Initial Setup and Core Functionality Implementation for Project Apero

Alex Rudaev requested to merge github/fork/HlexNC/main into main

Summary

This pull request establishes the foundational structure and implements the core functionalities for Project Apero, an advanced recommendation system integrated with a chatbot. The changes encompass the configuration of Docker environments, refinement of the .gitignore file, comprehensive documentation, and the development of essential components for data handling, machine learning modeling, and user interface.

Key Changes

  1. Git Configuration

    • .gitignore Update: Streamlined the .gitignore file by removing numerous patterns and retaining essential exclusions related to virtual environments, configuration files, and sensitive data. Added specific ignores for Streamlit secrets and model outputs.
  2. Docker Setup

    • Dockerfiles: Introduced Dockerfile for the Streamlit application and Dockerfile.train for the training environment, ensuring consistent and reproducible deployments.
    • Docker Compose: Added docker-compose.yml to orchestrate multi-container setups, including services for Rasa, the action server, Streamlit app, and Duckling, facilitating seamless integration and communication between components.
  3. Documentation

    • README.md: Overhauled the README to include a project banner, detailed table of contents, comprehensive sections on introduction, features, installation, usage, chatbot integration, data handling, modeling, Docker setup, project structure, licensing, and contact information.
    • Additional Documentation: Created Project_Outline.md, Project_Roadmap.md, Requirements_Specification.md, and System_Design.md within the docs/ directory to provide in-depth insights into project planning, requirements, and architectural design.
  4. Rasa Chatbot Integration

    • Configuration Files: Added config.yml, domain.yml, and credentials.yml to configure Rasa’s NLU pipeline, domain definitions, and credentials for various platforms.
    • Custom Actions: Developed custom action classes in actions/actions.py to handle recommendation generation and data analysis display, enhancing the chatbot’s functionality.
    • Action Server Setup: Included actions/Dockerfile and related setup to containerize the Rasa action server, ensuring it operates seamlessly within the Docker ecosystem.
  5. Streamlit Application Development

    • Main Application: Implemented src/app.py to set up the Streamlit frontend with navigation for Home, Data Analysis, Recommendations, and Chatbot sections.
    • Chatbot Module: Created src/chatbot/rasa_chatbot.py to manage interactions between the Streamlit app and the Rasa chatbot, enabling real-time user assistance.
    • Data Handling Modules: Added src/data/data_loader.py and src/data/data_preprocessor.py to facilitate data loading and preprocessing tasks, ensuring the dataset is clean and ready for analysis and modeling.
  6. Machine Learning Modeling

    • Recommendation Model: Developed src/models/recommendation_model.py to structure the recommendation engine around scikit-learn’s Random Forest and Support Vector Machine classifiers, laying the groundwork for personalized suggestions based on user inputs.
    • Model Training Script: Included train_and_rename.sh to automate the training process, handle model validation, and manage model file naming conventions post-training.
  7. Data Integration

    • Dataset Inclusion: Added data/raw/healthcare-dataset-stroke-data.csv as the primary dataset for training and evaluation, providing a realistic basis for recommendation and prediction tasks.
    • Directory Export Script: Implemented dir_to_json.py to convert a directory structure into JSON, which helps when inspecting and documenting the project layout.
  8. Additional Enhancements

    • Secrets Example: Provided .streamlit/secrets.toml.example to guide the configuration of sensitive credentials required for Streamlit.
    • Banner Image: Added banner.png to visually represent the project in the README.
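
For reviewers unfamiliar with how a frontend talks to Rasa, the chatbot module described under item 5 can be pictured roughly as follows. This is a minimal sketch, not the code in src/chatbot/rasa_chatbot.py: it assumes the REST channel is enabled in credentials.yml and that Rasa listens on its default port 5005. The names RASA_WEBHOOK_URL, build_payload, extract_texts, and send_message are hypothetical.

```python
import json
import urllib.request

# Assumption: Rasa's REST channel is enabled and reachable at this URL
# (5005 is Rasa's default port; the project's compose file may map it differently).
RASA_WEBHOOK_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_payload(sender_id: str, message: str) -> bytes:
    """Encode one user turn in the JSON shape the REST channel expects."""
    return json.dumps({"sender": sender_id, "message": message}).encode("utf-8")


def extract_texts(responses: list) -> list:
    """Keep only text replies; the bot may also send images or buttons."""
    return [item["text"] for item in responses if "text" in item]


def send_message(sender_id: str, message: str,
                 url: str = RASA_WEBHOOK_URL, timeout: float = 10.0) -> list:
    """POST a user message to Rasa and return the bot's text replies."""
    request = urllib.request.Request(
        url,
        data=build_payload(sender_id, message),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_texts(json.loads(response.read().decode("utf-8")))
```

In a Streamlit page, a call such as send_message("demo-user", "hello") would return a list of reply strings to render. Inside the Docker network the host would be the Rasa service name rather than localhost.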

Impact

These updates lay a solid foundation for Project Apero, enabling efficient development, deployment, and scalability. The Docker configurations ensure environment consistency, while the comprehensive documentation facilitates clear communication and collaboration among team members. The integration of Rasa enhances user interaction capabilities, and the structured approach to data handling and modeling sets the stage for robust recommendation functionalities.

Testing

  • Docker Containers: Built and launched all Docker services with docker-compose.yml and confirmed that the services start and can communicate with each other.
  • Rasa Chatbot: Verified that the chatbot responds accurately to predefined intents and integrates seamlessly with the Streamlit frontend.
  • Streamlit Application: Launched the Streamlit app locally, navigating through all sections to confirm the interface and functionalities operate as expected.
  • Data Processing: Ran the data loading and preprocessing scripts against the included dataset to confirm that it loads cleanly and is ready for modeling.
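
The container check above can be repeated with a small smoke test once the stack is running. The sketch below only verifies that each service accepts TCP connections; it does not exercise the chatbot or the app. The port numbers are each tool's usual default (Rasa 5005, action server 5055, Duckling 8000, Streamlit 8501) and are assumptions, not values read from this project's docker-compose.yml. The function names are hypothetical.

```python
import socket

# Assumed host ports: tool defaults, not values taken from docker-compose.yml.
DEFAULT_SERVICES = {
    "rasa": 5005,
    "action_server": 5055,
    "duckling": 8000,
    "streamlit": 8501,
}


def is_port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def check_services(services: dict, host: str = "localhost") -> dict:
    """Map each service name to whether its port accepts connections."""
    return {name: is_port_open(host, port) for name, port in services.items()}
```

Calling check_services(DEFAULT_SERVICES) after docker compose up returns a dictionary of booleans; any False entry points at a service that did not start or whose port mapping differs from the assumed default.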

Additional Notes

  • Placeholder Tasks: Several TODOs have been identified within the codebase to guide ongoing development, such as implementing specific data preprocessing steps and refining model training processes.
  • Security Considerations: Sensitive information, such as API keys and secrets, must be managed securely and never committed to version control. Use environment variables or a secrets store instead.
  • Future Enhancements: Plan to expand the chatbot’s capabilities, enhance the recommendation algorithms, and incorporate additional data sources to enrich the system’s functionality.
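
As one way to follow the security note above, secrets can be read from environment variables at startup so that nothing sensitive lives in the repository. This is a sketch under assumptions: the helper get_secret, the exception MissingSecretError, and the variable name APERO_API_KEY are hypothetical and do not appear in this merge request.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not configured."""


def get_secret(name: str, env=os.environ) -> str:
    """Read a required secret from the environment, failing loudly if unset."""
    value = env.get(name, "").strip()
    if not value:
        # Report only the variable name, never a secret value.
        raise MissingSecretError(f"Set the {name} environment variable")
    return value
```

A call such as get_secret("APERO_API_KEY") would then raise at startup if the variable is missing, rather than failing later with a less obvious error. In Docker Compose, such variables can be supplied through an env file that is listed in .gitignore.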

Reviewer Notes

Before testing the application, install all dependencies listed in requirements.txt and make sure Docker is configured on your local machine. For issues or suggested enhancements, refer to the documentation in the docs/ directory or contact the project maintainers.
