This course runs for 2 days.
The class will run daily from 10 AM ET to 6 PM ET.
Class Location: Virtual Live Classroom (live, instructor-led).
When the computing power of a single machine becomes a constraint, you can leverage the Apache Spark platform's massively parallel processing capabilities through PySpark, the Python API for Spark. Along with introducing PySpark, this course covers the Spark Shell for interactively exploring and manipulating data. Spark SQL is introduced as a uniform programming API for working with structured data. The course ends with pandas for data manipulation and analysis and seaborn for data visualization.
Objectives
Audience
Chapter 1. Introduction to Apache Spark
Chapter 2. The Spark Shell
Chapter 3. Introduction to Spark SQL
Chapter 4. Practical Introduction to Pandas
Chapter 5. Data Visualization with seaborn in Python
Chapter 6. (Optional) Quick Introduction to Python for Data Engineers
Lab Exercises
Prerequisites: Knowledge of SQL and familiarity with Python (or the ability to learn the basics of a new language).