PySpark for Data Science - IV: Machine Learning

Section 1: Linear Regression

  1. Why PySpark for ML when we have scikit-learn?
  2. Download Resources
  3. Import libraries and init SparkSession
  4. Using VectorAssembler to prepare data
  5. Build the linear regression model
  6. Model Summary
  7. Make predictions and evaluate the model
  8. Analyze feature importance
  9. Improve the model (optional)
  10. Save and load the model (optional)

Section 2: Logistic Regression

  1. Setup and Load Dataset
  2. Prepare the data
  3. Building the Logistic Regression model
  4. Evaluating the model on test data

Section 3: Ridge Regression

  1. Setup and Load
  2. Prepare the data
  3. Creating a Ridge Regression model
  4. Hyperparameter tuning
  5. Inspect the model coefficients and intercept
  6. Evaluating the model

Section 4: LASSO Regression

  1. Import required libraries and initialize SparkSession
  2. Prepare the data
  3. Build the LASSO regression model
  4. Hyperparameter tuning
  5. Inspect the model coefficients and intercept
  6. Evaluating the model