PySpark for Data Science - I: Fundamentals


Section 1: Introduction to PySpark

  1. What is PySpark
  2. Download Resources
  3. Why should every data engineer and data scientist learn PySpark?
  4. How does PySpark work?

Section 2: The Spark Session and Spark DataFrames

  1. What is a SparkSession and how do you create one?
  2. Configuring a SparkSession
  3. How to create a PySpark DataFrame from various sources
  4. How to use SQL on a PySpark DataFrame

Section 3: PySpark Data Wrangling Techniques

  1. Selecting Columns
  2. Dropping Columns
  3. Renaming Columns
  4. Selecting and Filtering Rows
  5. Sorting a PySpark DataFrame

Section 4: Aggregation and Custom Methods

  1. Data Aggregation
  2. Extending Spark Using UDFs
  3. Exercise: PySpark UDF
  4. Pandas UDFs: Custom Functions
  5. Pandas UDAFs: User-Defined Aggregate Functions

Section 5: Joins and Pivoting

  1. PySpark Joins Part 1
  2. PySpark Joins Part 2
  3. How to define the schema of a DataFrame
  4. PySpark Union
  5. Pivoting
  6. How to unpivot a PySpark DataFrame