Update Model Selection authored by Asif Khan
# Model Evaluation and Selection
The Netflix Dataset Model Application was evaluated using **Linear Regression** and **Lasso Regression**.
The **Netflix Dataset Model Application** was evaluated using **Logistic Regression** and **Random Forest Classifier**.
## **Evaluation Metrics**
1. **Loss Functions**:
   - R² (Coefficient of Determination) → Score Function
2. **Cross-Validation Score**:
   - See [Scikit-learn: Cross Validation](https://scikit-learn.org/stable/modules/cross_validation.html)
## Evaluation Metrics
## **Comparison Results**
The return values of **R² Score** and **Cross-Validation Score** were analyzed for both models:
- **Linear Regression** generally performed better in terms of both metrics.
- **Lasso Regression**, while effective, showed slightly lower performance due to its regularization, which reduces overfitting but may compromise predictive accuracy for this dataset.
- **Classification Metrics:**
  - **Accuracy (Score Function):** Measures the proportion of correctly predicted instances out of all instances.
  - **Classification Report:** Includes Precision, Recall, and F1-Score for each genre, providing a detailed performance overview.
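
The metrics above can be written out by hand to show what each one measures. The following is a minimal sketch in plain Python. The function names `accuracy` and `per_class_report` and the genre labels are hypothetical and used for illustration only. They are not the project's code or data.

```python
def accuracy(y_true, y_pred):
    """Proportion of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1)} for every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    report = {}
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1)
    return report


# Hypothetical labels for illustration only; not taken from the Netflix dataset.
y_true = ["Drama", "Comedy", "Drama", "Horror", "Comedy", "Drama"]
y_pred = ["Drama", "Drama", "Drama", "Horror", "Comedy", "Comedy"]

acc = accuracy(y_true, y_pred)
report = per_class_report(y_true, y_pred)
for genre, (p, r, f1) in report.items():
    print(f"{genre}: precision={p:.2f} recall={r:.2f} f1={f1:.2f}")
```

In practice, Scikit-learn's `accuracy_score` and `classification_report` in `sklearn.metrics` compute the same quantities, so a hand-written version like this is mainly useful for checking one's understanding.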
## **Conclusion**
Based on the evaluation, **Linear Regression** is the better model for predicting the release year of Netflix titles using the dataset, as it consistently outperforms Lasso Regression in terms of R² Score and Cross-Validation Score.
- **Cross-Validation Score:**
  - Utilized Scikit-learn's cross-validation to assess model performance across multiple data splits, ensuring reliability and robustness of the results.
  - Reference: [Scikit-learn Cross Validation](https://scikit-learn.org/stable/modules/cross_validation.html)
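
To make "multiple data splits" concrete, here is a minimal sketch of k-fold splitting in plain Python. The helper name `k_fold_indices` and the sample counts are hypothetical. The sketch assumes contiguous, unshuffled folds; the project itself relies on Scikit-learn for this step, and its exact splitting settings are not stated here.

```python
def k_fold_indices(n_samples, n_splits):
    """Yield (train_idx, test_idx) pairs so each sample is tested exactly once."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so fold sizes differ by at most 1.
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        test_idx = indices[start:start + size]
        train_idx = indices[:start] + indices[start + size:]
        yield train_idx, test_idx
        start += size


# Hypothetical sizes for illustration only.
folds = list(k_fold_indices(n_samples=10, n_splits=3))
for train_idx, test_idx in folds:
    print("train:", train_idx, "test:", test_idx)
```

A model is trained on each `train_idx` and scored on the matching `test_idx`; the mean and spread of those per-fold scores form the cross-validation result.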
---
## Comparison Results
The performance of **Logistic Regression** and **Random Forest Classifier** was analyzed based on their **Accuracy Score** and **Cross-Validation Score**:
- **Logistic Regression:**
  - **Accuracy Score:** Achieved higher accuracy on the validation set compared to Random Forest.
  - **Cross-Validation Score:** Demonstrated consistent and superior performance across all cross-validation folds, indicating robust generalization capabilities.
- **Random Forest Classifier:**
  - **Accuracy Score:** While effective, it showed slightly lower accuracy than Logistic Regression in this specific application.
  - **Cross-Validation Score:** Exhibited more variability across folds, suggesting potential overfitting issues despite its ensemble nature.
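
A comparison of this kind could be run roughly as follows. This is a sketch under stated assumptions, not the project's actual pipeline: the data is synthetic (generated with `make_classification`), and the split ratio, fold count, and hyperparameters are placeholders. Its output says nothing about the results reported above.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic stand-in for the real features and genre labels (assumption).
X, y = make_classification(
    n_samples=300, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Placeholder hyperparameters; the project's settings are not documented here.
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Random Forest Classifier": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    # Accuracy on each of 5 folds of the training data.
    cv_scores = cross_val_score(model, X_train, y_train, cv=5)
    model.fit(X_train, y_train)
    results[name] = {
        "val_accuracy": model.score(X_val, y_val),  # accuracy on held-out data
        "cv_mean": cv_scores.mean(),
        "cv_std": cv_scores.std(),  # spread across folds
    }

for name, r in results.items():
    print(
        f"{name}: val acc={r['val_accuracy']:.3f}, "
        f"cv={r['cv_mean']:.3f} +/- {r['cv_std']:.3f}"
    )
```

Reporting the standard deviation alongside the mean makes the "variability across folds" comparison explicit rather than a visual impression.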
## Conclusion
Based on the evaluation, **Logistic Regression** is the better model for predicting the genre of Netflix titles using the dataset. It consistently outperforms the Random Forest Classifier in terms of both **Accuracy Score** and **Cross-Validation Score**. Additionally, Logistic Regression offers greater interpretability and computational efficiency, making it more suitable for integration into the application. While Random Forest remains a powerful classifier, the simplicity and reliability of Logistic Regression align more closely with the project's objectives of delivering accurate and understandable genre predictions.