Assistance_Systems

Khan, Asif, 22300224

Netflix Content Analysis

https://mygit.th-deg.de/assistance_systems/gitlab-profile

https://mygit.th-deg.de/assistance_systems/gitlab-profile/-/wikis/home

Project Description

Netflix Dataset Analysis and Prediction is a system designed to explore, analyze, and build predictive models on the Netflix catalog dataset. The application provides insights into content trends, uses machine learning models to predict the primary genre of a title, and offers conversational recommendations through a Rasa chatbot.

Installation

Follow the steps below to set up the project locally:

Clone the Repository:

git clone https://mygit.th-deg.de/assistance_systems/gitlab-profile.git

cd gitlab-profile

Create a Virtual Environment: It's recommended to use a virtual environment to manage dependencies.

python3 -m venv venv

source venv/bin/activate  # On Windows: venv\Scripts\activate

Install Dependencies: Install the required Python packages using pip.

pip install -r requirements.txt

Verify Installation: Ensure all packages are installed correctly.

pip freeze

Data

The Netflix dataset was sourced from Kaggle. It contains details about movies and TV shows available on Netflix, including:

show_id - Unique ID of each content item
type - Movie or TV Show
title - Title of the movie or TV show
director - Director
cast - Cast of the movie or TV show
country - Country of the content
date_added - Date the content was added to Netflix
release_year - Year of release
rating - Age rating
duration - Duration of the movie or TV show
listed_in - Genre
description - Description of the movie or TV show
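The duration and listed_in columns are stored as text, so they usually need parsing before analysis. The following is a minimal sketch, assuming the common Kaggle formats such as "90 min", "2 Seasons", and comma-separated genre lists. The helper names parse_duration and split_genres are hypothetical and are not taken from the project code.

```python
import re
from typing import List, Optional, Tuple


def parse_duration(raw: Optional[str]) -> Tuple[Optional[int], Optional[str]]:
    """Split a duration string into (amount, unit); unit is 'min' or 'season'."""
    if not raw:
        return None, None
    match = re.match(r"\s*(\d+)\s*(min|season)", raw, flags=re.IGNORECASE)
    if match is None:
        return None, None
    return int(match.group(1)), match.group(2).lower()


def split_genres(raw: Optional[str]) -> List[str]:
    """Turn a comma-separated listed_in value into a clean list of genres."""
    if not raw:
        return []
    return [genre.strip() for genre in raw.split(",") if genre.strip()]


print(parse_duration("90 min"))
print(parse_duration("2 Seasons"))
print(split_genres("Dramas, International Movies"))
```

Keeping the unit next to the amount avoids mixing movie minutes with TV show seasons in the same numeric column.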

Dependencies

Python: 3.9.6
Streamlit: 1.41.1
Scikit-learn: 1.1.3
Rasa: 3.6.20
Flask: Latest version
Seaborn: Latest version

Basic Usage

After downloading the project files into a project folder, follow these steps:

Prerequisite: All required dependencies are installed (pip install -r requirements.txt).

  1. Navigate to the rasa directory by running:

    cd rasa

  2. After changing the directory, train the Rasa model by running:

    rasa train

  3. Start the Rasa server with the following command:

    rasa run

  4. In a separate terminal, change to the rasa directory (cd rasa) and start the Rasa actions server:

    rasa run actions

  5. In another terminal, start the Streamlit application from the root directory of the project:

    streamlit run main.py

The web page shows the main pages in the navigation bar on the left. The entries are:

about_me
add_and_apply_model
algorithm_selection
augmentation
chatbot
data_metrics
feature_engineering
model_application
model_training
preprocessing
viszualization

Implementation of the Requests

Data Handling

  1. Data Loading: The Netflix dataset is loaded in main.py using pandas for efficient data manipulation and exploration.

  2. Data Preprocessing: Implemented in preprocessing.py to clean the dataset by handling missing values, encoding categorical variables (e.g., genres, directors), and scaling numerical features (e.g., duration, release year).

  3. Data Augmentation: Managed in augmentation.py, adding synthetic data to enhance the dataset for better model performance and robustness.
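The preprocessing step described above can be sketched as follows. This is an illustration under stated assumptions, not the code in preprocessing.py: the sample rows are made up, the duration_min column name is hypothetical, and the choice of median imputation, one-hot encoding, and standard scaling is one reasonable option among several.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Tiny hypothetical sample standing in for the Netflix CSV.
df = pd.DataFrame(
    {
        "director": ["A. Director", None, "B. Director", "A. Director"],
        "duration": ["90 min", "120 min", None, "100 min"],
        "release_year": [2015, 2018, 2020, 2019],
    }
)

# Handle missing values: label unknown directors, fill durations with the median.
df["director"] = df["director"].fillna("Unknown")
df["duration_min"] = df["duration"].str.extract(r"(\d+)", expand=False).astype(float)
df["duration_min"] = df["duration_min"].fillna(df["duration_min"].median())

# Encode the categorical column and scale the numerical ones.
preprocessor = ColumnTransformer(
    [
        ("director", OneHotEncoder(handle_unknown="ignore"), ["director"]),
        ("numeric", StandardScaler(), ["duration_min", "release_year"]),
    ]
)
features = preprocessor.fit_transform(df)
print(features.shape)
```

Setting handle_unknown="ignore" lets the fitted encoder accept directors it has not seen before, which matters when users enter custom values in the interface.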

Machine Learning Models

  1. Logistic Regression: Used in add_and_apply_model.py to predict the primary genre of Netflix titles based on features like director, duration, and release year. This model is lightweight and interpretable, making it ideal for quick predictions.

  2. Random Forest Classifier: Utilized in model_training.py for more complex, non-linear predictions. It handles categorical and numerical data effectively, improving genre prediction accuracy.

Model Evaluation: Model performance is evaluated using metrics such as accuracy, precision, recall, and F1-score in model_training.py and add_and_apply_model.py. Confusion matrices and classification reports provide insights into performance.
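A minimal sketch of training both model types and computing the metrics named above. It assumes synthetic stand-in data with a made-up two-class genre label, so the numbers it prints say nothing about the real dataset, and the hyperparameters are illustrative defaults rather than the project's settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data: duration and release year as features,
# and a made-up binary genre label. Not the real Netflix data.
rng = np.random.default_rng(42)
duration = rng.normal(loc=100, scale=20, size=200)
release_year = rng.integers(1990, 2022, size=200)
X = np.column_stack([duration, release_year])
y = np.where(duration > 100, "Drama", "Comedy")

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

models = {
    # Scaling helps logistic regression converge; tree ensembles do not need it.
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression()),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    results[name] = accuracy_score(y_test, y_pred)
    print(name, "accuracy:", round(results[name], 3))
    print(confusion_matrix(y_test, y_pred, labels=["Comedy", "Drama"]))
    print(classification_report(y_test, y_pred, zero_division=0))
```

Evaluating both models on the same held-out split keeps the comparison between them fair.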

User Interface

Streamlit Pages:

  1. about_me.py: Displays project information and developer details.
  2. add_and_apply_model.py: Implements logistic regression and Random Forest for genre prediction and custom predictions.
  3. algorithm_selection.py: Allows users to choose between different machine learning models.
  4. augmentation.py: Adds synthetic data for enhanced model performance.
  5. chatbot.py: Integrates the Rasa chatbot for conversational interactions.
  6. data_metrics.py: Provides data insights and key metrics.
  7. feature_engineering.py: Handles feature extraction and transformations.
  8. model_application.py: Applies the trained models to the dataset.
  9. model_training.py: Trains logistic regression and Random Forest models.
  10. preprocessing.py: Cleans and preprocesses the dataset.
  11. viszualization.py: Visualizes trends, distributions, and model predictions.

Input Widgets:

Integrated into pages like algorithm_selection.py and add_and_apply_model.py to capture user inputs, such as director name, duration, and release year, for custom predictions or visualizations.

Chatbot Integration

Rasa Implementation: Built using domain.yml, data/nlu.yml, data/stories.yml, and actions/actions.py. The chatbot provides conversational recommendations, explains model predictions, and guides users through the platform.

REST API Communication: Handled in chatbot.py, which connects the Streamlit interface to the Rasa server for real-time interactions.
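A minimal sketch of how a client can talk to Rasa's REST channel, which is one way the connection in chatbot.py could work. It assumes the server started by rasa run listens on the default port 5005 and that the rest channel is enabled in credentials.yml. The function names build_payload, extract_replies, and ask_bot are hypothetical.

```python
import json
from typing import Dict, List
from urllib import request

# Assumed default: `rasa run` listens on port 5005 and the REST channel
# is enabled in credentials.yml.
RASA_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_payload(sender_id: str, message: str) -> bytes:
    """Encode a user message in the JSON shape the REST channel expects."""
    return json.dumps({"sender": sender_id, "message": message}).encode("utf-8")


def extract_replies(response_body: List[Dict]) -> List[str]:
    """Collect the text of every bot message, skipping entries without text."""
    return [item["text"] for item in response_body if "text" in item]


def ask_bot(
    sender_id: str, message: str, url: str = RASA_URL, timeout: float = 10.0
) -> List[str]:
    """Send one message to Rasa and return the bot's text replies."""
    req = request.Request(
        url,
        data=build_payload(sender_id, message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return extract_replies(json.loads(resp.read().decode("utf-8")))
```

With the Rasa server running, a Streamlit page could call ask_bot("streamlit-user", user_text) and write each returned reply to the chat area. Reusing the same sender ID across calls lets Rasa keep the conversation state for that user.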

Dialog Management: Structured in Rasa's story files, allowing users to ask for recommendations, clarify predictions, or navigate functionalities in a conversational manner.

Visualizations

Data Insights: Implemented in data_metrics.py and viszualization.py, showcasing trends like genre distributions, release year patterns, and duration analysis through bar charts, line plots, and heatmaps.
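A genre distribution chart of the kind mentioned above could be built roughly like this. The sketch assumes made-up listed_in values and plain Matplotlib; the project's pages may use Seaborn or different chart settings.

```python
from collections import Counter

import matplotlib

matplotlib.use("Agg")  # render off-screen; a Streamlit page would call st.pyplot(fig)
import matplotlib.pyplot as plt

# Hypothetical listed_in values standing in for the real column.
listed_in = [
    "Dramas, International Movies",
    "Comedies",
    "Dramas, Comedies",
    "Documentaries",
]

# A title can belong to several genres, so count each genre separately.
genre_counts = Counter(
    genre.strip() for entry in listed_in for genre in entry.split(",")
)
genres, counts = zip(*genre_counts.most_common())

fig, ax = plt.subplots(figsize=(6, 3))
ax.bar(genres, counts)
ax.set_xlabel("Genre")
ax.set_ylabel("Number of titles")
ax.set_title("Genre distribution (sample data)")
ax.tick_params(axis="x", rotation=45)
fig.tight_layout()
```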

Model Predictions: Plotted in add_and_apply_model.py and model_training.py using bar charts, confusion matrices, and scatter plots to display predicted vs. actual genres.

Custom Predictions: Highlighted in add_and_apply_model.py using interactive graphs, including legends to distinguish between predicted and actual genres.

In summary, the implementation combines Logistic Regression for interpretability with Random Forest for stronger predictive performance, and exposes both through a Streamlit interface and a Rasa-powered chatbot.

Use Cases

Use cases video: https://drive.google.com/file/d/1C8KOnIdXIKIuEapnbXtiCTU6mKuw3n4u/view?usp=share_link

Wiki Reference

The Wiki serves as a detailed documentation repository containing project-specific information.

User Persona: Detailed in the "User Persona" chapter of the Wiki.

Use Cases: Defined in the "Use Cases" chapter, specifying user interactions and functional expectations.

Work done

All work by Asif Khan.