Every year a large amount of data is generated that needs to be stored and analyzed. Apache Spark allows you to process such big data. The real power and value proposition of Apache Spark lies in its speed and in the platform it provides for executing data science tasks. What sets Spark apart is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations, allowing data scientists to tackle the complexities that come with raw, unstructured data sets. Spark also aims to ease the transition from working on a single machine to working on a cluster, which makes data science tasks a lot more agile. So, if you're interested in learning big data processing and executing data science tasks efficiently, then go for this Learning Path.
Packt’s Video Learning Path is a series of individual video products put together in a logical and stepwise manner such that each video builds on the skills learned in the video before it.
The highlights of this Learning Path are:
Let's take a quick look at your learning journey. This Learning Path starts off by explaining the basics of the Spark API and its architecture in detail. You will then learn about data mining and data cleaning. You will also learn to analyze data by writing actual jobs.
Next, you will learn about Apache Pig, Hive, and HQL. You will then learn to use the machine learning toolkit available in Apache Spark. You will also build an anomaly detector, compose Spark ML stages into an ML pipeline, and then use collaborative filtering to create a recommendation engine.
Moving ahead, you will learn how to handle large volumes of unbounded streams of data and draw conclusions from them. You will get a quick tour of some common problems in event stream processing, such as sorting, watermarks, deduplication, and keeping state. You will also implement stream processing using Spark Streaming and analyze traffic on a web page in real time. Next, you will learn how to deal with critical aspects of working with the streaming API. Finally, you will learn to perform operations using the powerful library of graph algorithms in Spark.
By the end of this Learning Path, you will be able to carry out your data science tasks in a visual way that is comprehensive and appealing to business and other stakeholders.
Meet Your Expert(s):
This Learning Path brings together the best work of the following esteemed author(s) to ensure that your learning journey is smooth: