PySpark for Data Science - I : Fundamentals
Section 1: Introduction to PySpark
- What is PySpark
- Download Resources
- Why should every data engineer and data scientist learn PySpark?
- How does PySpark work?
Section 2: The Spark Session and Spark DataFrames
- What is a Spark session and how do you create one?
- Configuring a SparkSession
- How to create a PySpark DataFrame from various sources
- How to use SQL on a PySpark DataFrame
Section 3: PySpark Data Wrangling Techniques
- Selecting Columns
- Dropping Columns
- Renaming Columns
- Selecting and Filtering Rows
- How to sort a PySpark DataFrame
Section 4: Aggregation and custom methods
- Data Aggregation
- Extending Spark using UDFs
- Exercise: PySpark UDF
- Pandas UDF Custom Functions
- Pandas UDAF: User-Defined Aggregate Functions
Section 5: Joins and Pivoting
- PySpark Joins Part 1
- PySpark Joins Part 2
- How to define the schema of a DataFrame
- PySpark Union
- Pivoting
- How to unpivot a PySpark DataFrame