PySpark for Data Science - IV: Machine Learning
Section 1: Linear Regression
- Why PySpark for ML when we have scikit-learn?
- Download Resources
- Import libraries and init SparkSession
- Using VectorAssembler to prepare data
- Build the linear regression model
- Model Summary
- Make predictions and evaluate the model
- Analyze feature importance
- Improve the model (optional)
- Save and load the model (optional)
Section 2: Logistic Regression
- Set up and load the dataset
- Prepare the data
- Building the Logistic Regression model
- Evaluating the model on test data
Section 3: Ridge Regression
- Set up and load the dataset
- Prepare the data
- Creating a Ridge Regression model
- Hyperparameter tuning
- Inspect the model coefficients and intercept
- Evaluating the model
Section 4: LASSO Regression
- Import required libraries and initialize SparkSession
- Prepare the data
- Build the LASSO Regression model
- Hyperparameter tuning
- Inspect the model coefficients and intercept
- Evaluating the model