Initial Setup and Core Functionality Implementation for Project Apero
Summary
This pull request establishes the foundational structure and implements the core functionality of Project Apero, a recommendation system integrated with a chatbot. The changes cover the configuration of Docker environments, a refined `.gitignore` file, comprehensive documentation, and the essential components for data handling, machine learning modeling, and the user interface.
Key Changes
Git Configuration
- .gitignore Update: Streamlined the `.gitignore` file by removing numerous patterns while retaining the essential exclusions for virtual environments, configuration files, and sensitive data. Added specific ignores for Streamlit secrets and model outputs.
Docker Setup
- Dockerfiles: Introduced a `Dockerfile` for the Streamlit application and `Dockerfile.train` for the training environment, ensuring consistent and reproducible deployments.
- Docker Compose: Added `docker-compose.yml` to orchestrate the multi-container setup, with services for Rasa, the action server, the Streamlit app, and Duckling, so that the components can integrate and communicate with one another.
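In Compose, `depends_on` controls start order but, without a health check, does not wait for a service to be ready, so the Streamlit app may try to reach Rasa or Duckling before they accept connections. A minimal readiness check might look like the sketch below. The host names and ports in `SERVICES` are assumptions (the default ports of Rasa, the Rasa SDK action server, Duckling, and Streamlit); the real values depend on `docker-compose.yml`, and `wait_for_port` / `wait_for_services` are hypothetical helpers, not part of this PR.

```python
import socket
import time

# Hypothetical (host, port) pairs: service names and ports are assumptions based on
# each tool's default port, not values taken from docker-compose.yml.
SERVICES: dict[str, tuple[str, int]] = {
    "rasa": ("rasa", 5005),
    "action_server": ("action_server", 5055),
    "duckling": ("duckling", 8000),
    "streamlit": ("streamlit", 8501),
}


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A completed TCP handshake means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused connection, unresolvable host name, or connect timeout: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


def wait_for_services(services: dict[str, tuple[str, int]], timeout: float = 30.0) -> dict[str, bool]:
    """Check each service in turn and report which ones became reachable."""
    return {name: wait_for_port(host, port, timeout) for name, (host, port) in services.items()}
```

At startup, `wait_for_services(SERVICES, timeout=60.0)` returns a name-to-bool map that could be logged before the first request goes out. An open port only shows that something is listening; Rasa may still be loading its model, so an HTTP-level check would be stricter.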
Documentation
- README.md: Overhauled the README to include a project banner, a detailed table of contents, and sections on the introduction, features, installation, usage, chatbot integration, data handling, modeling, Docker setup, project structure, licensing, and contact information.
- Additional Documentation: Created `Project_Outline.md`, `Project_Roadmap.md`, `Requirements_Specification.md`, and `System_Design.md` in the `docs/` directory, covering project planning, requirements, and architectural design in depth.
Rasa Chatbot Integration
- Configuration Files: Added `config.yml`, `domain.yml`, and `credentials.yml` to configure Rasa’s NLU pipeline, the domain definitions, and the credentials for the various messaging channels.
- Custom Actions: Developed custom action classes in `actions/actions.py` that generate recommendations and display data analysis results from within the chatbot.
- Action Server Setup: Included `actions/Dockerfile` and the related setup to containerize the Rasa action server so that it runs alongside the other services in Docker.
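When the REST channel is enabled in `credentials.yml` (a `rest:` entry), Rasa accepts user messages as JSON POSTs to `/webhooks/rest/webhook` and replies with a JSON list of bot messages. The sketch below shows how a client such as `src/chatbot/rasa_chatbot.py` could call that endpoint using only the standard library; it is not the module's actual code. The URL assumes Rasa's default port 5005 on `localhost` (inside Compose it would more likely use the Rasa service name), and the function names are hypothetical.

```python
import json
import urllib.request

# Assumption: Rasa's HTTP server on its default port with the REST channel enabled.
RASA_REST_URL = "http://localhost:5005/webhooks/rest/webhook"


def build_rasa_request(sender_id: str, message: str, url: str = RASA_REST_URL) -> urllib.request.Request:
    """Build the POST request body the Rasa REST channel expects."""
    payload = json.dumps({"sender": sender_id, "message": message}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_texts(response_body: bytes) -> list[str]:
    """Keep only the text replies; other message types (images, buttons) are skipped."""
    messages = json.loads(response_body)
    return [m["text"] for m in messages if "text" in m]


def send_to_rasa(sender_id: str, message: str, url: str = RASA_REST_URL, timeout: float = 10.0) -> list[str]:
    """Send one user message to Rasa and return the bot's text replies."""
    request = build_rasa_request(sender_id, message, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return extract_texts(response.read())
```

A Streamlit session could pass a stable per-user `sender_id`, since Rasa keys its conversation tracker on the sender, so each user keeps a separate dialogue state.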
Streamlit Application Development
- Main Application: Implemented `src/app.py`, which sets up the Streamlit frontend with navigation for the Home, Data Analysis, Recommendations, and Chatbot sections.
- Chatbot Module: Created `src/chatbot/rasa_chatbot.py` to manage interactions between the Streamlit app and the Rasa chatbot, enabling real-time user assistance.
- Data Handling Modules: Added `src/data/data_loader.py` and `src/data/data_preprocessor.py` for data loading and preprocessing, ensuring the dataset is clean and ready for analysis and modeling.
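As an illustration of the loading and preprocessing steps, here is a minimal pandas sketch. It assumes the CSV follows the schema of the widely used public stroke-prediction dataset (`id`, `gender`, `age`, `hypertension`, `heart_disease`, `ever_married`, `work_type`, `Residence_type`, `avg_glucose_level`, `bmi`, `smoking_status`, `stroke`) with missing BMI values written as `N/A`. The function names and the choice of median imputation plus one-hot encoding are assumptions, not necessarily what `data_preprocessor.py` does.

```python
import pandas as pd

# Assumed schema: column names of the public stroke-prediction CSV.
NUMERIC_COLS = ["age", "avg_glucose_level", "bmi"]
CATEGORICAL_COLS = ["gender", "ever_married", "work_type", "Residence_type", "smoking_status"]
TARGET_COL = "stroke"


def load_data(path) -> pd.DataFrame:
    """Read the raw CSV. 'N/A' is listed explicitly, though pandas already treats it as missing."""
    return pd.read_csv(path, na_values=["N/A"])


def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop the id column, fill numeric gaps with the median, and one-hot encode categoricals."""
    out = df.drop(columns=["id"], errors="ignore").copy()
    for col in NUMERIC_COLS:
        # The median is less sensitive to outliers than the mean.
        out[col] = out[col].fillna(out[col].median())
    return pd.get_dummies(out, columns=CATEGORICAL_COLS, dtype=int)
```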
Machine Learning Modeling
- Recommendation Model: Developed `src/models/recommendation_model.py`, which structures the recommendation engine around Scikit-Learn’s Random Forest and Support Vector Machine classifiers and lays the groundwork for personalized suggestions based on user inputs.
- Model Training Script: Included `train_and_rename.sh` to automate training, run model validation, and apply the model file naming conventions after training.
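One way to put the two classifier families side by side is to fit both and keep whichever scores better on a held-out split. The sketch below assumes a numeric feature matrix (for example, the output of a preprocessing step) and a binary target. The hyperparameters are placeholders, and F1 is used instead of accuracy on the assumption that positive cases are rare. It is not the contents of `recommendation_model.py`, and the function names are hypothetical.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def build_candidates(random_state: int = 42) -> dict:
    """The two classifier families named above; hyperparameters are placeholders."""
    return {
        "random_forest": RandomForestClassifier(
            n_estimators=200, class_weight="balanced", random_state=random_state
        ),
        # RBF-kernel SVMs are scale-sensitive, so standardize inside the pipeline.
        "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", class_weight="balanced")),
    }


def train_and_select(X, y, random_state: int = 42):
    """Fit each candidate, score it on a stratified hold-out split, and return the best one."""
    X_train, X_val, y_train, y_val = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=random_state
    )
    scores, fitted = {}, {}
    for name, model in build_candidates(random_state).items():
        model.fit(X_train, y_train)
        scores[name] = f1_score(y_val, model.predict(X_val))
        fitted[name] = model
    best = max(scores, key=scores.get)
    return best, fitted[best], scores
```

Keeping `StandardScaler` inside the SVM pipeline means the scaler is fit only on the training split, so no validation statistics leak into training; random forests do not need scaling.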
Data Integration
- Dataset Inclusion: Added `data/raw/healthcare-dataset-stroke-data.csv` as the primary dataset for training and evaluation, providing a realistic basis for the recommendation and prediction tasks.
- Data Management Script: Implemented `dir_to_json.py` to convert directory structures into JSON, aiding data management and processing workflows.
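A directory-to-JSON conversion of the kind `dir_to_json.py` performs can be sketched with `pathlib`. The output schema here (`name`, `type`, `children`) and the function names are hypothetical choices, not necessarily those of the actual script.

```python
import json
from pathlib import Path


def dir_to_dict(path) -> dict:
    """Recursively describe a directory as nested dicts; files become leaf entries."""
    path = Path(path)
    if path.is_dir():
        return {
            "name": path.name,
            "type": "directory",
            # Sort entries by name so the output is deterministic across platforms.
            "children": [dir_to_dict(child) for child in sorted(path.iterdir(), key=lambda p: p.name)],
        }
    return {"name": path.name, "type": "file"}


def dir_to_json(path, indent: int = 2) -> str:
    """Serialize the directory tree rooted at `path` as a JSON string."""
    return json.dumps(dir_to_dict(path), indent=indent)
```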
Additional Enhancements
- Secrets Example: Provided `.streamlit/secrets.toml.example` as a template for configuring the sensitive credentials the Streamlit app requires.
- Banner Image: Added `banner.png` to visually represent the project in the README.
Impact
These updates lay a solid foundation for Project Apero, enabling efficient development, deployment, and scalability. The Docker configurations ensure environment consistency, while the comprehensive documentation facilitates clear communication and collaboration among team members. The integration of Rasa enhances user interaction capabilities, and the structured approach to data handling and modeling sets the stage for robust recommendation functionalities.
Testing
- Docker Containers: Built and launched all Docker services with `docker-compose.yml` and confirmed that the services communicate and function correctly.
- Rasa Chatbot: Verified that the chatbot responds accurately to predefined intents and integrates with the Streamlit frontend.
- Streamlit Application: Launched the Streamlit app locally, navigating through all sections to confirm the interface and functionalities operate as expected.
- Data Processing: Tested data loading and preprocessing scripts with the included dataset to ensure data integrity and readiness for modeling.
Additional Notes
- Placeholder Tasks: Several TODOs have been identified within the codebase to guide ongoing development, such as implementing specific data preprocessing steps and refining model training processes.
- Security Considerations: Ensure that sensitive information, such as API keys and secrets, is managed securely and never exposed in version control. Use environment variables and secure storage practices.
- Future Enhancements: Plan to expand the chatbot’s capabilities, enhance the recommendation algorithms, and incorporate additional data sources to enrich the system’s functionality.
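Following the security note above, a small helper can fail fast when a required secret is missing instead of falling back to a hard-coded value. This is a hedged sketch: the helper, the exception class, and any variable names passed to it are hypothetical. Inside the Streamlit app, `st.secrets` (backed by `.streamlit/secrets.toml`) is the natural alternative.

```python
import os
from typing import Optional


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def get_secret(name: str, default: Optional[str] = None) -> str:
    """Read a secret from the environment rather than from a file committed to the repo."""
    value = os.environ.get(name, default)
    # Treat an empty string like an unset variable so misconfiguration is caught early.
    if value is None or value == "":
        raise MissingSecretError(f"Required environment variable {name!r} is not set")
    return value
```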
Reviewer Notes:
Please ensure all dependencies are installed as specified in `requirements.txt` and that Docker is properly configured on your local machine before testing the application. For any issues or further enhancements, refer to the documentation in the `docs/` directory or contact the project maintainers.