Maria Karanasou

1.4K Followers


Published in

MLearning.ai

·Pinned

Explaining the predictions— Shapley Values with PySpark

Interpreting Isolation Forest’s predictions, and not only. The problem: how to interpret Isolation Forest’s predictions. More specifically, how to tell which features contribute more to the predictions. Since Isolation Forest is not a typical Decision Tree (see Isolation Forest characteristics here), after some research, I ended up with three possible solutions: 1) Train another similar algorithm that has feature importance on the same dataset…

Machine Learning

10 min read


Jan 5

Minimizing the soul-sucking part of working with Spark, one bug at a time

PySpark debugging pt. 2. 1. Invalid log directory. Error: invalid log directory /usr/local/spark/work/app-20201203195256-0001/2/. This can be a bit tricky: check your firewall rules and make sure all of your nodes have access to the storage being used, so that the master sees the workers and vice versa.

Pyspark

5 min read



Aug 6, 2021

Bot-hunting with Baskerville

Leverage Machine Learning to defend against DDoS. Manual identification and mitigation of DDoS attacks on websites is a difficult and time-consuming task with many challenges. This is where Baskerville comes in. Baskerville is an open-source Security Analytics Engine, a system to identify the attacks (currently) directed at Deflect-protected websites as they happen and give the infrastructure…

6 min read



May 13, 2021

Thanks for this wonderful resource!

Lucas Rodés-Guirao

Thank you for your kind words! And thank you so much for sharing such an amazing project! Watching... :)

1 min read



Apr 21, 2021

Hi Maria! It is very helpful.

Nandhini Devi Kaliaperumal

Very happy to know the article and code were useful to you. Of course you can use the code, thanks for asking!

1 min read



Apr 21, 2021

… a unique hash (e.g. MD5, SHA256, etc.) of the given URL. The hash can then be encoded for display. This encoding could be base36 ([a-z, 0–9]) or base62 ([A-Z, a-z, 0–9]). If we add + and /, we can use Base64 encoding. A reasonable question would be, “What should be the length of the short key? 6, 8, or 10 characters…

How Would You Design TinyURL and Instagram?

The Educative Team

Great reasoning, just a small question about the other allowed characters like `-`, `?`, `_`, `%`, `=`. Also, would you do anything differently if there were a requirement to include URLs in other languages?

1 min read



Apr 21, 2021

The best type of database to use would be a NoSQL database store like DynamoDB or Cassandra since we are storing billions of rows with no relationships between the objects.

How Would You Design TinyURL and Instagram?

The Educative Team

First of all, I really enjoyed your thorough analysis, excellent article, thanks! For the highlighted part, I of course agree about the NoSQL case, but the `no relationship` part is not exactly true, right? I mean, there is the UserID that links the two tables; it is just going to be handled differently.

1 min read



Published in

The Startup

·Nov 28, 2020

A Spark Attack

Time to check your Spark clusters. It is well known (or should be) that Spark is not secured by default. It is right there in the docs: “Security in Spark is OFF by default”. So you should be well aware that you’ll need to put in the effort to secure your cluster. And there are…

Apache Spark

3 min read



Published in

An Idea (by Ingenious Piece)

·Aug 24, 2020

COVID-19: Is G6PDd a major risk factor?

Research and awareness needed. TL;DR: It seems that more and more people are agreeing that G6PDd can be a risk factor for COVID-19, not only in terms of the medication that is used to combat the virus, but also regarding one’s susceptibility to the virus and the severity of its side-effects…

Covid-19

6 min read



Published in

Towards Data Science

·Mar 24, 2020

Isolation Forest and Pyspark part 2

Lessons learned. So, after a few runs with the PySpark ml implementation of Isolation Forest presented here, I stumbled upon a couple of things and I thought I’d write about them so that you don’t waste the time I wasted troubleshooting. Only Dense Vectors: In the previous article, I used VectorAssembler to gather the feature…

Anomaly Detection

2 min read



A mom and a Software Engineer who loves to learn new things & all about ML & Big Data. Buy me a coffee to help me keep going buymeacoffee.com/mkaranasou
