Extract, Transform, Load (ETL) is the process of extracting data from various sources, transforming it, and loading it into target data stores. ETL is an essential skill in data science for implementing pre-processing and post-processing steps. This workshop is designed for anyone who wants to improve their ETL skills.
The workshop focuses on the following data sources:
We start with basic file and directory I/O, including copying and deleting files and directories. Next, we explore how to read and write various file types such as text, CSV, JSON, and XML. In addition, we access remote data sources on websites and on servers that use the S3 protocol.
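As a rough illustration of the file topics above, the following sketch writes and reads back CSV, JSON, and XML files using only the Python standard library, then copies a file and deletes the working directory. The function name `file_io_demo`, the file names, and the sample rows are made up for this example and are not taken from the workshop material.

```python
import csv
import json
import shutil
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def file_io_demo():
    """Round-trip sample rows through CSV, JSON, and XML, then copy and delete."""
    workdir = Path(tempfile.mkdtemp())  # scratch directory for the demo
    # Hypothetical sample data; values are strings so they survive a CSV round trip.
    rows = [{"id": "1", "name": "alice"}, {"id": "2", "name": "bob"}]

    # CSV: write with a header row, then read back as dictionaries.
    csv_path = workdir / "people.csv"
    with csv_path.open("w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "name"])
        writer.writeheader()
        writer.writerows(rows)
    with csv_path.open(newline="") as f:
        csv_rows = [dict(row) for row in csv.DictReader(f)]

    # JSON: serialize the list of rows and parse it again.
    json_path = workdir / "people.json"
    json_path.write_text(json.dumps(rows))
    json_rows = json.loads(json_path.read_text())

    # XML: one <person> element per row, with the name as element text.
    root = ET.Element("people")
    for row in rows:
        ET.SubElement(root, "person", id=row["id"]).text = row["name"]
    xml_path = workdir / "people.xml"
    ET.ElementTree(root).write(str(xml_path))
    xml_names = [p.text for p in ET.parse(str(xml_path)).getroot().findall("person")]

    # Copy a file into a subdirectory, then delete the whole directory tree.
    backup_dir = workdir / "backup"
    backup_dir.mkdir()
    shutil.copy(str(csv_path), str(backup_dir / csv_path.name))
    copied = (backup_dir / csv_path.name).exists()
    shutil.rmtree(str(workdir))

    return csv_rows, json_rows, xml_names, copied, workdir.exists()
```

Calling `file_io_demo()` returns the rows read back from each format, whether the copy succeeded, and whether the working directory still exists after deletion.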
We learn how to work with relational databases (RDBMS) from Python, using database engines such as SQLite, MySQL, SQL Server, and PostgreSQL. We perform CRUD (Create, Read, Update, Delete) operations. We also read database tables into pandas and write pandas DataFrames back to database tables.
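The CRUD operations mentioned above can be sketched with SQLite, which ships with Python as the `sqlite3` module. This is a minimal example under assumptions: the `users` table, its columns, and the function name `crud_demo` are hypothetical, and an in-memory database is used so nothing is written to disk.

```python
import sqlite3


def crud_demo():
    """Run Create, Read, Update, and Delete against an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical table used only for this illustration.
        conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

        # Create: insert two rows using parameter placeholders.
        conn.executemany("INSERT INTO users (name) VALUES (?)", [("alice",), ("bob",)])

        # Read: fetch all rows in a stable order.
        after_insert = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()

        # Update: rename the user with id 2.
        conn.execute("UPDATE users SET name = ? WHERE id = ?", ("carol", 2))

        # Delete: remove the user with id 1.
        conn.execute("DELETE FROM users WHERE id = ?", (1,))
        conn.commit()

        final = conn.execute("SELECT id, name FROM users ORDER BY id").fetchall()
    finally:
        conn.close()
    return after_insert, final
```

With pandas installed, `pandas.read_sql_query` and `DataFrame.to_sql` can handle the round trip between a table and a DataFrame over the same kind of connection.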
We can also apply ETL to NoSQL database engines. We will work with MongoDB, Redis, and Apache Cassandra, and perform CRUD (Create, Read, Update, Delete) operations on each of them. We also read data from NoSQL databases into pandas and write pandas DataFrames back to NoSQL databases.
Last, we implement ETL programs in Python. Three case studies show how ETL works with Python.
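To give a feel for what a small ETL program looks like, here is a minimal sketch that is not one of the workshop's case studies. It extracts rows from CSV text, transforms them by trimming whitespace, dropping rows with missing values, and converting Fahrenheit to Celsius, then loads the result into an in-memory SQLite table. The sample data, the `weather` table, and all function names are assumptions made for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical raw input with untidy whitespace and one missing measurement.
RAW_CSV = """city,temp_f
Jakarta,86
 Bandung ,75.2
Surabaya,
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: clean city names, skip missing values, convert F to C."""
    cleaned = []
    for row in rows:
        temp = (row["temp_f"] or "").strip()
        if not temp:
            continue  # skip rows without a temperature
        celsius = round((float(temp) - 32) * 5 / 9, 1)
        cleaned.append((row["city"].strip(), celsius))
    return cleaned


def load(conn, records):
    """Load: insert the cleaned records into the target table."""
    conn.execute("CREATE TABLE IF NOT EXISTS weather (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO weather VALUES (?, ?)", records)
    conn.commit()


def run_etl(text=RAW_CSV):
    """Run the three stages in order and return the loaded rows."""
    conn = sqlite3.connect(":memory:")
    try:
        load(conn, transform(extract(text)))
        return conn.execute("SELECT city, temp_c FROM weather ORDER BY city").fetchall()
    finally:
        conn.close()
```

Keeping extract, transform, and load as separate functions means each stage can be swapped independently, for example replacing the CSV source with a JSON file or the SQLite target with another database.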
This workshop requires basic Python programming knowledge to follow all hands-on labs. Internet access is needed to install additional Python libraries.