PySpark for Data Science - V: ML Pipelines
Section 1: Recap of Decision Trees
- Reading a Decision Tree
- Download Resources
- How a Decision Tree Works
Section 2: Build Decision Trees in PySpark
- Import required libraries and initialize SparkSession
- Load the dataset
- Prepare the data
- Building the DecisionTreeClassifier model
- Evaluating the model on test data
- Feature Importance
- Improve the model (optional)
Section 3: Tuning the Tree with Pipelines
- Creating a Pipeline & Hyperparameter Tuning
Section 4: Self Assessment
- Random Forest Approach
- Gradient Boosting
- Compare results across the 4 approaches
Section 5: XGBoost model using PySpark
- The problem with XGBoost for PySpark
- Install XGBoost in PySpark
- Run XGBoost in PySpark