Rating: 4.1 (99 reviews on Udemy)

Scalable Data Analysis in Python with Dask

Build high-performance, distributed, and parallel applications in Dask
Course from Udemy
503 students enrolled
Language: English
Understand the concept of block algorithms and how Dask leverages them to load large datasets
Implement various examples using Dask Arrays, Bags, and DataFrames for efficient parallel computing
Combine Dask with existing Python packages such as NumPy and pandas
See how Dask works under the hood and the various built-in algorithms it has to offer
Leverage the power of Dask in a distributed setting and explore its various schedulers
Implement an end-to-end Machine Learning pipeline in a distributed setting using Dask and scikit-learn
Use Dask Arrays, Bags, and DataFrames for parallel and out-of-core (larger-than-memory) computations
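
To make the idea of a block algorithm concrete, here is a minimal sketch in plain Python. It is not Dask's API or the course's code; the helper names `iter_blocks` and `blockwise_mean` are hypothetical and chosen only for illustration. The point is that a mean can be computed from per-block sums and counts, so only one block needs to be in memory at a time.

```python
from typing import Iterable, Iterator, List


def iter_blocks(values: Iterable[float], block_size: int) -> Iterator[List[float]]:
    """Yield successive blocks of at most block_size items (hypothetical helper)."""
    block: List[float] = []
    for value in values:
        block.append(value)
        if len(block) == block_size:
            yield block
            block = []
    if block:  # emit the final, possibly shorter, block
        yield block


def blockwise_mean(values: Iterable[float], block_size: int = 1000) -> float:
    """Compute a mean while holding only one block in memory at a time."""
    total = 0.0
    count = 0
    for block in iter_blocks(values, block_size):
        # Each block contributes a partial sum and a partial count;
        # the partial results are combined at the end.
        total += sum(block)
        count += len(block)
    if count == 0:
        raise ValueError("cannot take the mean of an empty sequence")
    return total / count


# range() is lazy, so the full million values are never materialized at once.
result = blockwise_mean(range(1, 1_000_001), block_size=50_000)
print(result)
```

A library such as Dask applies the same split, compute, and combine pattern, but schedules the per-block work for you and can run blocks in parallel.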

Data analysts, machine learning professionals, and data scientists often use tools such as pandas, scikit-learn, and NumPy for data analysis on their personal computers. However, when they want to apply their analyses to larger datasets, these tools fail to scale beyond a single machine, and the analyst is forced to rewrite the computation.

If you work on big data and you’re using Pandas, you know you can end up waiting up to a whole minute for a simple average of a series. And that’s just for a couple of million rows!

In this course, you’ll learn to scale your data analysis. First, you will execute distributed data science projects, from data ingestion through data manipulation and visualization, using Dask. Then, you will explore the Dask framework. Afterward, you will see how Dask can be used with other common Python tools such as NumPy, pandas, matplotlib, scikit-learn, and more.

You’ll work with large datasets, performing exploratory data analysis to investigate them and draw out findings. You’ll learn by applying data analysis principles and different statistical techniques to the same massive datasets, across different systems, in one go.

Throughout the course, we’ll go over the various techniques, modules, and features that Dask has to offer. Finally, you’ll learn to use its machine learning offering, the Dask-ML package. You’ll also start using parallel processing in your data tasks on your own system, without moving to a distributed environment.
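
As a rough illustration of parallel processing on a single machine, the sketch below uses only Python's standard `concurrent.futures` module as a stand-in; it is not Dask code and not taken from the course. The function names (`split_into_partitions`, `count_even`, `parallel_count_even`) are hypothetical. The data is split into partitions, each partition is processed by a worker thread, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence


def split_into_partitions(items: Sequence[int], n_partitions: int) -> List[Sequence[int]]:
    """Split items into at most n_partitions contiguous slices (hypothetical helper)."""
    size = max(1, -(-len(items) // n_partitions))  # ceiling division
    return [items[i:i + size] for i in range(0, len(items), size)]


def count_even(partition: Sequence[int]) -> int:
    """Per-partition work: count the even numbers in one slice."""
    return sum(1 for x in partition if x % 2 == 0)


def parallel_count_even(items: Sequence[int], n_partitions: int = 4) -> int:
    """Map count_even over partitions in a local thread pool, then combine."""
    partitions = split_into_partitions(items, n_partitions)
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        partial_counts = list(pool.map(count_even, partitions))
    return sum(partial_counts)


total_even = parallel_count_even(list(range(100)), n_partitions=4)
print(total_even)
```

Note that threads in CPython do not speed up pure-Python CPU-bound work like this toy function because of the global interpreter lock; the sketch shows the partition, map, and combine structure rather than a performance gain.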

About the Author

Mohammed Kashif works as a data scientist at Nineleaps, India, dealing mostly with graph data analysis. Prior to this, he worked as a Python developer at Qualcomm. He completed his Master's degree in computer science at IIIT Delhi, with a specialization in data engineering. His areas of interest include recommender systems, NLP, and graph analytics.

In his spare time, he likes to answer questions on Stack Overflow and help other people debug their way out of misery. He is also an experienced teaching assistant with a demonstrated history of working in the higher-education industry.

