Changes

Mykyta Kostohryz · 1077b030
--- a/Model-training.md
+++ b/Model-training.md
@@ -18,7 +18,16 @@
 - **Random Under-Sampling (Optional):**
  - After SMOTE, random under-sampling can be applied to reduce the size of the majority class, ensuring better balance between classes.

-### Step 3: Model Selection and Training
+### Step 3: Dimensionality Reduction and Feature Selection
+- **Principal Component Analysis (PCA):**
+  - PCA is applied to reduce the dimensionality of the feature space while preserving most of the variance in the data.
+  - This can reduce overfitting, speed up model training, and improve model performance when features are highly correlated.
+  - The number of components can be selected based on the amount of variance to retain (e.g., 95% variance retention).
+- **Recursive Feature Elimination (RFE):**
+  - RFE selects features by repeatedly fitting a model, ranking the features by importance, and removing the weakest ones until the desired number remains.
+  - This method helps improve model interpretability and can reduce overfitting by removing redundant or irrelevant features.
+
+### Step 4: Model Selection and Training
 - **Models Trained:**
  1. **Support Vector Machine (SVM):** Effective for high-dimensional spaces and performs well for binary classification tasks.
  2. **Random Forest (RF):** A robust ensemble method that aggregates the predictions of multiple decision trees, ensuring better generalization.
@@ -38,7 +47,7 @@
    - **kNN:** Number of neighbors (n_neighbors) and distance metric.
    - **Ensemble Model:** Hyperparameters for both SVM and Random Forest models are optimized separately before being combined in the ensemble method.

-### Step 4: Model Evaluation
+### Step 5: Model Evaluation
 - After training, models are evaluated using multiple metrics:
  - **Accuracy:** Overall correctness of the model.
  - **Classification Report:** Provides precision, recall, F1-score for each class.
@@ -49,15 +58,11 @@
 - **Evaluation Tools:**
  - **Precision-Recall and ROC Curves:** Plots showing the trade-off between precision and recall, and the model’s performance across different classification thresholds.

-### Step 5: Model Deployment
+### Step 6: Model Deployment
 - **Saving the Model:**
  - The trained models are serialized and saved using `joblib` for later deployment.
  - Saved models are stored in a directory (`./runs/`) for easy access during predictions.

-### Step 6: Performance Monitoring and Updates
- **Monitoring:** The models' performance is monitored over time, with periodic evaluations on new data.
- **Model Updating:** Based on new data or performance degradation, models can be retrained and updated with improved features or hyperparameters.
-
 ---

 This pipeline ensures that the models are trained on clean, balanced data with optimized hyperparameters, while providing robust evaluation and monitoring mechanisms to guarantee the model’s long-term performance.
\ No newline at end of file
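
Note on the new Step 3: the "95% variance retention" rule for PCA can be sketched in a few lines. This is a minimal illustration rather than the project's implementation. The helper name `components_for_variance` and the sample ratios are hypothetical, and the input is assumed to be the per-component explained-variance ratios in descending order, as PCA implementations typically report them.

```python
from itertools import accumulate


def components_for_variance(explained_variance_ratio, threshold=0.95):
    """Return the smallest number of leading components whose cumulative
    explained-variance ratio reaches `threshold`."""
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in (0, 1]")
    running_totals = accumulate(explained_variance_ratio)
    for count, cumulative in enumerate(running_totals, start=1):
        if cumulative >= threshold:
            return count
    # Rounding can leave the total just under the threshold; keep everything.
    return len(explained_variance_ratio)


# Hypothetical ratios for five components (illustrative values only).
ratios = [0.60, 0.25, 0.08, 0.05, 0.02]
print(components_for_variance(ratios))        # 4
print(components_for_variance(ratios, 0.80))  # 2
```

If the pipeline uses scikit-learn (an assumption, since the page only names `joblib`), passing a fraction such as `PCA(n_components=0.95)` applies a comparable rule internally, so a helper like this is mainly useful for inspecting or documenting the choice.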