Changes

Adham Beshr · b86854f4
--- a/Data.md
+++ b/Data.md
@@ -2,6 +2,7 @@
 title: Data
 ---

+
 https://mygit.th-deg.de/ab11885/watch-wise/-/raw/main/Images/Dataset_sample_-_2.png?ref_type=heads

 ## Data Chapter
@@ -65,6 +66,44 @@ https://mygit.th-deg.de/ab11885/watch-wise/-/raw/main/Images/Dataset_sample_-_2.
 #### 5.1. Generating Fake Data
 - A synthetic dataset was created by adding **25% fake data** using the **Faker** library to simulate movie attributes like genre, rating, and votes. The fake data was introduced to test model performance and robustness. 

+```python
+import random
+
+import pandas as pd
+from faker import Faker
+
+fake = Faker()
+
+
+def generate_fake_data(real_data, fake_percentage=0.25):
+    """Return synthetic movie rows amounting to `fake_percentage` of `real_data`."""
+    data = []
+    # Draw genres from the real data so fake rows reuse existing categories
+    genres = real_data['Genre'].dropna().unique()
+    num_samples = int(len(real_data) * fake_percentage)
+
+    for _ in range(num_samples):
+        title = fake.bs().title()
+        genre = random.choice(genres)
+        description = fake.sentence(nb_words=12)
+        director = fake.name()
+        actors = fake.name() + ', ' + fake.name()
+        year = random.randint(2000, 2023)
+        runtime = random.randint(80, 180)            # minutes
+        rating = round(random.uniform(1, 10), 1)
+        votes = random.randint(50000, 1000000)
+        revenue = round(random.uniform(10, 500), 2)  # millions
+        metascore = random.randint(0, 100)
+
+        data.append([title, genre, description, director, actors, year,
+                     runtime, rating, votes, revenue, metascore])
+
+    columns = ['Title', 'Genre', 'Description', 'Director', 'Actors', 'Year',
+               'Runtime (Minutes)', 'Rating', 'Votes', 'Revenue (Millions)', 'Metascore']
+    fake_df = pd.DataFrame(data, columns=columns)
+    return fake_df
+```
+
 #### 5.2. Impact of Fake Data
 - The addition of fake data was evaluated by comparing the performance of machine learning models with and without fake data. The performance drop or improvement was analyzed based on the **Mean Squared Error (MSE)** and **R²** scores.

@@ -73,5 +112,3 @@ https://mygit.th-deg.de/ab11885/watch-wise/-/raw/main/Images/Dataset_sample_-_2.
 - The split was done using the `train_test_split()` function from `sklearn.model_selection`.

 --- 
--- 
---
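Section 5.2 compares models trained with and without the fake rows using MSE and R², and the data is split with `train_test_split()`. The sketch below shows one way such a comparison could be set up. It rests on assumptions the source does not state: a `LinearRegression` model, purely numeric features, randomly generated stand-in data instead of the actual movie dataset, and a test set built from real rows only. `compare_with_fake` is a hypothetical helper name, not part of the project.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score
from sklearn.model_selection import train_test_split


def compare_with_fake(X_real, y_real, X_fake, y_fake, test_size=0.2, seed=42):
    """Hypothetical helper: fit one model on real rows only and one on real + fake rows.

    Both models are scored on the same held-out test set of real rows
    (an assumption; the source does not say how its test set was built).
    """
    X_train, X_test, y_train, y_test = train_test_split(
        X_real, y_real, test_size=test_size, random_state=seed)

    train_sets = {
        "real_only": (X_train, y_train),
        # Fake rows are appended to the training split only
        "with_fake": (np.vstack([X_train, X_fake]), np.concatenate([y_train, y_fake])),
    }
    results = {}
    for name, (X_tr, y_tr) in train_sets.items():
        model = LinearRegression().fit(X_tr, y_tr)
        pred = model.predict(X_test)
        results[name] = {"mse": mean_squared_error(y_test, pred),
                         "r2": r2_score(y_test, pred)}
    return results


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical stand-in data: 3 numeric features with a linear target
    X_real = rng.normal(size=(400, 3))
    y_real = X_real @ np.array([0.5, 1.0, -0.3]) + rng.normal(scale=0.1, size=400)
    # 25% fake rows: features and target drawn independently, so they carry no signal
    n_fake = int(len(X_real) * 0.25)
    X_fake = rng.normal(size=(n_fake, 3))
    y_fake = rng.uniform(y_real.min(), y_real.max(), size=n_fake)

    for name, m in compare_with_fake(X_real, y_real, X_fake, y_fake).items():
        print(f"{name}: MSE={m['mse']:.4f}  R^2={m['r2']:.4f}")
```

Scoring both models on the same real-only test rows keeps the comparison fair: if fake rows also landed in the test set, a change in MSE or R² could reflect noisier test data rather than a change in the model itself.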