diff --git a/Thesis_Docs/Nikkhah_Nasab-Aida-Mastersthesis.pdf b/Thesis_Docs/Nikkhah_Nasab-Aida-Mastersthesis.pdf
index f7c8fb9e63c42f7acb9d64f4454ed8375e141e7a..e8570438bc539248da206371f4ac97954f26f166 100644
Binary files a/Thesis_Docs/Nikkhah_Nasab-Aida-Mastersthesis.pdf and b/Thesis_Docs/Nikkhah_Nasab-Aida-Mastersthesis.pdf differ
diff --git a/Thesis_Docs/main.tex b/Thesis_Docs/main.tex
index 59511c5b453fbe5733dc4da88c46ad9c84d874ee..0fc966e263bc8d980969fc434a57695d8dcda520 100644
--- a/Thesis_Docs/main.tex
+++ b/Thesis_Docs/main.tex
@@ -279,7 +279,14 @@ Haffey et al. (2018) focused on modeling, analyzing, and characterizing periodic
 Recent research has focused on various aspects of enterprise security and malicious activity detection. Oprea et al. (2018) introduced MADE, a security analytics framework designed to enhance threat detection in enterprise environments \cite{oprea2018made}. The framework leverages advanced analytics to detect potential threats by analyzing large volumes of security data, enabling organizations to respond more effectively to cyber incidents. Ukrop et al. (2019) investigated the perception of IT professionals regarding the trustworthiness of TLS certificates, highlighting challenges in assessing certificate legitimacy and its implications for secure communications \cite{ukrop2019will}. In a related study, Vissers et al. (2017) explored the ecosystem of malicious domain registrations within the .eu top-level domain (TLD), providing insights into the strategies used by attackers to exploit domain registration systems for malicious purposes \cite{vissers2017exploring}. Together, these works contribute to the broader understanding of security challenges in modern networks and propose solutions to improve detection and mitigation strategies.
 
 \chapter{Methodology}
-The BAYWATCH framework is a comprehensive methodology designed to identify stealthy beaconing behavior in large-scale enterprise networks. Beaconing, a common behavior in malware-infected hosts, involves periodic communication with a command and control (C\&C) infrastructure. Detecting such behavior is challenging due to the presence of legitimate periodic traffic (e.g., software updates, email polling) and the various strategies employed by malware authors to evade detection. The BAYWATCH framework addresses these challenges through an 8-step filtering approach, which iteratively refines and eliminates legitimate traffic to pinpoint malicious beaconing cases. This chapter provides a detailed explanation of each step in the BAYWATCH methodology.
+This chapter provides a detailed overview of the event log dataset, together with the exploratory data analysis and the preprocessing steps required to make the data suitable for this study. It also describes the artificial data generation process used in this thesis and explains the BAYWATCH framework, including the phases and steps involved in detecting beaconing behavior. The chapter concludes with a description of the evaluation metrics used to assess the performance of the BAYWATCH framework.
+
+\section{Data Strategy Rationale}
Real data represents actual network traffic, capturing the authentic, complex, and often noisy behavior of users in an enterprise environment. It reflects genuine usage patterns that naturally emerge during normal operations, including legitimate periodic traffic, gaps caused by network delays or device outages, and the inherent variability introduced by diverse applications. For example, noise can arise from network retransmissions and latency issues, while missing data often occurs when devices temporarily go offline or when privacy-driven filtering removes records. These challenges arise organically and can obscure malicious beaconing behavior by blending it with benign periodic activities.
+
In contrast, artificial data is generated under controlled conditions. Beaconing behavior is simulated with predetermined parameters (e.g., specific beacon frequencies and controlled jitter ranges), establishing a ``ground truth'' scenario where variations can be deliberately introduced. This controlled simulation replicates how challenges like noise and irregular intervals might manifest under different network conditions. By systematically varying these parameters, the sensitivity and robustness of the BAYWATCH framework can be precisely assessed, and the impact of specific challenges on detection accuracy can be thoroughly evaluated.
+
+The combination of both data sources enables validation of the framework in authentic operational conditions (ensuring external validity) while also leveraging controlled simulations to isolate and address specific challenges (ensuring internal validity). This dual approach ensures that the detection mechanism is both practical for real-world deployment and resilient against a broad spectrum of adversarial scenarios.
 
 \section{Real Data Source}
 
@@ -297,30 +304,20 @@ The dataset is structured as a collection of JSON files, with each file containi
 
 The structure of the JSON files is defined by a JSON Schema, which ensures consistency and reliability across all entries. Below is an example of the schema used for the dataset:
 
-\begin{lstlisting}[language=json]
-{
-  "$schema": "http://json-schema.org/draft-07/schema#",
-  "type": "object",
-  "properties": {
-    "logdate": {
-      "type": "string",
-      "format": "date-time"
-    },
-    "url_hostname": {
-      "type": "string"
-    },
-    "user": {
-      "type": "string"
-    }
-  },
-  "required": ["logdate", "url_hostname"]
-}
+\begin{lstlisting}[language=json, basicstyle=\small\ttfamily, backgroundcolor=\color{white}]
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "type": "object",
+  "properties": {
+    "logdate":      { "type": "string", "format": "date-time" },
+    "url_hostname": { "type": "string" },
+    "user":         { "type": "string" }
+  },
+  "required": ["logdate", "url_hostname"]
+}
 \end{lstlisting}
 
 The structured format of the JSON files ensures that each entry is consistent and comprehensive, providing a reliable record of user activities for analysis.
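
To make this validation concrete, the following minimal sketch, which assumes the Python \texttt{jsonschema} package and an illustrative log entry, checks a single record against the schema above before it is accepted for analysis:

\begin{lstlisting}[language=Python, basicstyle=\small\ttfamily]
import json
from jsonschema import ValidationError, validate

# Schema mirroring the listing above.
SCHEMA = {
    "$schema": "http://json-schema.org/draft-07/schema#",
    "type": "object",
    "properties": {
        "logdate": {"type": "string", "format": "date-time"},
        "url_hostname": {"type": "string"},
        "user": {"type": "string"},
    },
    "required": ["logdate", "url_hostname"],
}

def is_valid_entry(raw_line: str) -> bool:
    """Return True if a single JSON log line conforms to the schema."""
    try:
        validate(instance=json.loads(raw_line), schema=SCHEMA)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

# Illustrative entry; the field values are made up.
print(is_valid_entry('{"logdate": "2023-08-01T09:15:00Z", "url_hostname": "example.com"}'))
\end{lstlisting}

Note that \texttt{jsonschema} treats the \texttt{format} keyword as an annotation unless a format checker is supplied, so the timestamp layout is assumed to be enforced elsewhere in the pipeline.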
 
 \subsection{Data Collection and Scale}
-The dataset was collected over the course of a single day, specifically a typical Tuesday workday, generating nearly 73 gigabytes of information. This large-scale data collection captures the following details:
The dataset was collected over the course of a single day, a typical workday (Tuesday, August 1, 2023), and amounts to nearly 73 gigabytes of information. This large-scale data collection captures the following details:
 \begin{itemize}
     \item \textbf{Host Information}: The IP addresses of the user devices, enabling the tracking of individual hosts and their activities.
     \item \textbf{Timestamps}: Precise date and time of each user interaction, enabling temporal analysis of browsing patterns.
@@ -335,7 +332,7 @@ To manage and analyze the dataset effectively, a sophisticated data management s
 
 \begin{enumerate}
     \item \textbf{Data Import}: The dataset is imported into InfluxDB using custom Python scripts. These scripts automate the creation of a dedicated "bucket" within InfluxDB, ensuring that the data is organized and stored efficiently.
-    \item \textbf{Schema Implementation}: A predefined schema is applied to enforce data integrity and consistency. This schema ensures that all entries adhere to the same format and standards, facilitating smoother data processing and analysis.
+    \item \textbf{Schema Implementation}: A predefined schema is applied to enforce data integrity and consistency, ensuring that all entries adhere to the same format and standards and facilitating smoother data processing and analysis. Entries that do not conform to the predefined format are rejected, and a validation message is generated to indicate that the data has not been imported into the database (a minimal sketch of this import and validation workflow is shown after this list).
     \item \textbf{Initial Data Analysis}: The dataset is analyzed to understand its behavior, including:
     \begin{itemize}
         \item Observing overall data trends throughout the day.
@@ -346,13 +343,19 @@ To manage and analyze the dataset effectively, a sophisticated data management s
 \end{enumerate}
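
The following sketch illustrates the import and validation workflow described above, assuming the \texttt{influxdb-client} and \texttt{jsonschema} Python packages, a local InfluxDB 2.x instance, and hypothetical bucket, organization, token, and measurement names; the actual import scripts may differ in detail:

\begin{lstlisting}[language=Python, basicstyle=\small\ttfamily]
import json
from datetime import datetime, timezone

from influxdb_client import InfluxDBClient, Point
from influxdb_client.client.write_api import SYNCHRONOUS
from jsonschema import ValidationError, validate

BUCKET = "proxy_logs"   # hypothetical bucket name
ORG = "example-org"     # hypothetical organization

client = InfluxDBClient(url="http://localhost:8086", token="my-token", org=ORG)
client.buckets_api().create_bucket(bucket_name=BUCKET, org=ORG)  # assumes the bucket does not exist yet
write_api = client.write_api(write_options=SYNCHRONOUS)

def import_file(path: str, schema: dict) -> None:
    """Validate each JSON log entry and write only conforming entries to InfluxDB."""
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            try:
                entry = json.loads(line)
                validate(instance=entry, schema=schema)   # schema enforcement (step 2)
            except (json.JSONDecodeError, ValidationError) as err:
                print(f"Entry rejected, not imported: {err}")
                continue
            ts = datetime.strptime(entry["logdate"], "%Y-%m-%dT%H:%M:%SZ")  # assumed timestamp layout
            point = (
                Point("web_request")                      # hypothetical measurement name
                .tag("user", entry.get("user", "unknown"))
                .field("url_hostname", entry["url_hostname"])
                .time(ts.replace(tzinfo=timezone.utc))
            )
            write_api.write(bucket=BUCKET, org=ORG, record=point)
\end{lstlisting}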
 
 \subsection{Challenges with Real-World Data}
-The real-world dataset presents several challenges that must be addressed to ensure accurate and reliable analysis:
+
+Analyzing real-world network traffic presents several challenges that must be addressed to ensure accurate detection and analysis of beaconing behavior.
+
 \begin{itemize}
-    \item \textbf{Noise and Variability}: Real-world network traffic is inherently noisy, with random variations in connection timing due to network delays, retransmissions, and other factors. This noise can obscure periodic patterns and complicate the detection of beaconing behavior.
-    \item \textbf{Missing Data}: Devices may go offline or move out of the observation range, resulting in gaps in the data. These gaps can disrupt the detection of periodic behavior and require careful handling during analysis.
-    \item \textbf{Legitimate Periodic Traffic}: Many legitimate applications (e.g., software updates, email polling) exhibit periodic behavior that resembles beaconing. Distinguishing between legitimate and malicious periodic traffic is a key challenge in real-world data analysis.
+    \item \textbf{Noise and Variability:} Real-world network traffic is inherently noisy, with random variations in connection timing due to network delays, retransmissions, and other factors. To mitigate this, preprocessing techniques such as smoothing filters and statistical normalization are applied to reduce variability while preserving essential patterns. Additionally, robust anomaly detection methods help differentiate between normal fluctuations and meaningful periodic signals.
+    
+    \item \textbf{Missing Data:} Devices may go offline or move out of the observation range, resulting in gaps in the data. To handle this, interpolation techniques are used to estimate missing values where appropriate, and robust analytical models are employed that can tolerate incomplete datasets. Additionally, missing data points are flagged to prevent misleading conclusions during the analysis process (a brief sketch of this gap-handling step is shown after this list).
+    
+    \item \textbf{Legitimate Periodic Traffic:} Many legitimate applications (e.g., software updates, email polling) exhibit periodic behavior that resembles beaconing. To distinguish between benign and malicious periodic traffic, a combination of behavioral profiling, anomaly detection, and contextual analysis is used. This involves analyzing additional metadata such as destination IP addresses, communication frequency, and protocol usage to identify deviations from expected legitimate behavior.
 \end{itemize}
 
+By implementing these strategies, the impact of real-world data challenges is minimized, leading to more reliable and accurate results in beaconing detection.
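
As an illustration of the gap-handling strategy above, the following minimal sketch, assuming \texttt{pandas} and an illustrative per-host time series (the one-minute binning and the interpolation limit are assumptions, not the exact configuration used in this thesis), flags empty bins as missing and interpolates only short gaps:

\begin{lstlisting}[language=Python, basicstyle=\small\ttfamily]
import pandas as pd

# Hypothetical connection log for one host/URL pair; "logdate" as in the dataset.
events = pd.DataFrame({
    "logdate": pd.to_datetime([
        "2023-08-01 09:00", "2023-08-01 09:01", "2023-08-01 09:02",
        "2023-08-01 09:05", "2023-08-01 09:06",   # 09:03-09:04 missing (outage)
    ]),
})

# Count connections per one-minute bin; bins without traffic sum to zero.
counts = (events.set_index("logdate")
                .assign(n=1)["n"]
                .resample("1min").sum())

# Treat empty bins as missing rather than "no activity", flag them, and
# interpolate only short gaps (here up to three minutes).
counts = counts.astype(float).replace(0.0, float("nan"))
was_missing = counts.isna()
filled = counts.interpolate(limit=3)

print(pd.DataFrame({"connections": filled, "was_missing": was_missing}))
\end{lstlisting}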
+
 \section{Artificial Data Source}
 In addition to analyzing real-world network traffic, the BAYWATCH framework was evaluated using artificial data to test its robustness and accuracy under controlled conditions. The artificial data was designed to simulate various types of beaconing behavior, including different periodicities, noise levels, and evasion techniques commonly employed by malware authors. A key feature of the artificial data is the introduction of jitter, which simulates random variations in the timing of beaconing events. This section describes the process of generating the artificial data, the specific jitter ranges used, and the structure of the data.
 
@@ -367,31 +370,47 @@ The artificial data was generated to mimic the structure of real-world network t
     \item \textbf{Is Artificial}: A tag (labeled as "yes") was added to distinguish the artificial data from real-world data. This tag ensures that the artificial data can be easily identified and separated during analysis.
 \end{itemize}
 
-\subsection{Jitter Ranges}
-Jitter is a parameter in simulating real-world beaconing behavior, as it introduces randomness into the timing of beaconing events. To evaluate the robustness of the BAYWATCH framework, the following jitter ranges were used:
+\subsection{Jitter and Beacon Frequency Variations}
+
+In simulating real-world beaconing behavior, two critical parameters are varied: beacon frequency and jitter. Beacon frequency refers to the regular interval at which a beacon signal is transmitted, while jitter introduces randomness into these intervals to mimic natural network variations or deliberate obfuscation tactics.
+
+\textbf{Beacon Frequency Intervals:}
+
+The following beacon frequency intervals were utilized in the simulations:
 
 \[
-\text{Jitter ranges: } [2, 5, 10, 30, 60] \text{ seconds}
+\text{Intervals: } [10, 20, 30, 40, 50, 60, 120, 300] \text{ seconds}
 \]
 
-Each jitter range represents a different level of perturbation in the beaconing behavior:
-\begin{itemize}
-    \item \textbf{2 seconds}: Minimal jitter, simulating near-ideal conditions with very little variation in beacon timing.
-    \item \textbf{5 seconds}: Low jitter, simulating slight variations in beacon timing due to minor network delays.
-    \item \textbf{10 seconds}: Moderate jitter, simulating more noticeable variations in beacon timing.
-    \item \textbf{30 seconds}: High jitter, simulating significant variations in beacon timing, potentially due to network congestion or intentional evasion techniques.
-    \item \textbf{60 seconds}: Very high jitter, simulating extreme variations in beacon timing, which may occur in highly unstable network conditions.
-\end{itemize}
+Each interval represents a different rate of beacon transmission, ranging from high-frequency (every 10 seconds) to low-frequency (every 5 minutes). This variation allows for the assessment of the detection framework's sensitivity across a spectrum of beaconing behaviors.
+
+\textbf{Jitter Ranges:}
+
+To introduce variability into the beaconing intervals, the following jitter ranges were applied:
+
+\[
+\text{Jitter ranges: } [2, 5, 10, 120, 150] \text{ seconds}
+\]
+
+These jitter values add random variation to each beacon interval, simulating conditions that range from minimal timing fluctuations to substantial deviations. For instance, a jitter of 2 seconds introduces only slight randomness, while a jitter of 150 seconds represents significant variability that may render the beaconing behavior undetectable.
+
+\textbf{Combined Impact of Frequency and Jitter:}
+
+The interplay between beacon frequency and jitter is important: the larger the jitter relative to the beacon interval, the harder it becomes to distinguish periodic behavior from random traffic. By systematically varying both parameters, the robustness of the BAYWATCH framework is evaluated under diverse conditions, ensuring its effectiveness in detecting beaconing behaviors across a range of real-world scenarios. The corresponding results are presented in the implementation chapter.
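
A minimal sketch of how such artificial beacon timestamps can be generated for a single (interval, jitter) pair is shown below; the function name, the symmetric application of the jitter, and the start time are assumptions of this sketch rather than details of the actual generation scripts:

\begin{lstlisting}[language=Python, basicstyle=\small\ttfamily]
import random
from datetime import datetime, timedelta

def generate_beacon_times(start: datetime, duration_s: int,
                          interval_s: int, jitter_s: int) -> list:
    """Simulate beacon events: one event per nominal interval, shifted by a
    uniform random offset in [-jitter_s, +jitter_s] seconds."""
    times = []
    t = start
    while t < start + timedelta(seconds=duration_s):
        offset = random.uniform(-jitter_s, jitter_s)
        times.append(t + timedelta(seconds=offset))
        t += timedelta(seconds=interval_s)
    return sorted(times)

# Example: a 300-second beacon with 150 seconds of jitter over one simulated day.
events = generate_beacon_times(datetime(2023, 8, 1), 24 * 3600,
                               interval_s=300, jitter_s=150)
print(len(events), events[:3])
\end{lstlisting}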
 
 \subsection{Scenarios Tested}
-The artificial data was used to test the BAYWATCH framework under the following scenarios:
 
-\begin{itemize}
-    \item \textbf{Low Jitter}: Beaconing behavior with minimal jitter (e.g., 2 seconds). This scenario tests the framework's ability to detect periodic behavior in near-ideal conditions.
-    \item \textbf{Moderate Jitter}: Beaconing behavior with moderate jitter (e.g., 10 seconds). This scenario tests the framework's robustness to typical real-world perturbations.
-    \item \textbf{High Jitter}: Beaconing behavior with significant jitter (e.g., 30 seconds). This scenario tests the framework's ability to handle more extreme variations in beacon timing.
-    \item \textbf{Very High Jitter}: Beaconing behavior with very high jitter (e.g., 60 seconds). This scenario tests the framework's performance under highly unstable network conditions.
-\end{itemize}
+To evaluate the robustness and effectiveness of the BAYWATCH framework, a comprehensive set of scenarios was designed by varying both the beacon frequency intervals and the jitter ranges. This dual-parameter variation ensures that the framework is tested under conditions that closely mimic real-world network behavior. Systematically combining the frequency intervals with the jitter ranges yields multiple test scenarios, for instance:
+
+\begin{itemize}
+    \item \textbf{High-Frequency, Low-Jitter Scenario}: Beacon interval of 10 seconds with a jitter of 2 seconds.
+    \item \textbf{Moderate-Frequency, Moderate-Jitter Scenario}: Beacon interval of 60 seconds with a jitter of 10 seconds.
+    \item \textbf{Low-Frequency, High-Jitter Scenario}: Beacon interval of 300 seconds with a jitter of 150 seconds (half of the interval).
+\end{itemize}
+
+Each combination presents unique challenges, allowing for a thorough assessment of the framework's capability to detect beaconing behavior under varying conditions.
+
+Different malware families and legitimate applications may exhibit varying beacon frequencies, so testing across a spectrum of intervals ensures that the framework can accurately detect both rapid and infrequent beaconing. Likewise, real-world network conditions introduce randomness into communication timing; incorporating different jitter levels assesses the framework's resilience to timing variations and its ability to distinguish regular from irregular patterns. Through this multifaceted testing approach, the BAYWATCH framework's robustness and adaptability to diverse network behaviors are thoroughly evaluated.
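
The scenario grid itself can be enumerated directly from the two parameter lists; the brief sketch below assumes that the full cross product of intervals and jitter values is tested and that the labels are purely illustrative:

\begin{lstlisting}[language=Python, basicstyle=\small\ttfamily]
from itertools import product

INTERVALS_S = [10, 20, 30, 40, 50, 60, 120, 300]  # beacon frequency intervals
JITTERS_S = [2, 5, 10, 120, 150]                   # jitter ranges

# Every (interval, jitter) combination defines one test scenario.
scenarios = [
    {"interval_s": i, "jitter_s": j, "label": f"interval={i}s, jitter={j}s"}
    for i, j in product(INTERVALS_S, JITTERS_S)
]

print(len(scenarios))          # 8 intervals x 5 jitter values = 40 scenarios
print(scenarios[0]["label"])   # interval=10s, jitter=2s
\end{lstlisting}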
 
 \subsection{Integration with Real-World Data}
 The artificial data was used in conjunction with real-world network traffic to provide a comprehensive evaluation of the BAYWATCH framework. While the real-world data provides insights into the framework's performance in a production environment, the artificial data allows for controlled testing of specific scenarios and edge cases. The is\_Artificial tag ensures that the artificial data can be easily distinguished from real-world data during analysis. This combination ensures that the framework is both robust to real-world perturbations and accurate in detecting malicious beaconing behavior.
@@ -469,7 +488,7 @@ Understanding the interaction patterns of hosts within the network is for identi
     \label{fig:ip}
 \end{figure}
 
-Figure \ref{fig:ip} illustrates the distribution of hosts (IP addresses) based on the number of unique URLs they contacted. The X-axis represents the number of unique URLs, ranging from 1 to 10, while the Y-axis shows the count of hosts in each category. The visualization reveals that the majority of hosts interact with only a small number of unique URLs. Specifically, approximately 17,500 hosts contacted exactly two unique URLs, while around 15,000 hosts contacted only one unique URL. As the number of unique URLs increases, the number of hosts decreases significantly. However, there are still many URLs that are connected to other hosts.
+Figure \ref{fig:ip} illustrates the distribution of hosts (IP addresses) based on the number of unique URLs they contacted. The X-axis represents the number of unique URLs, ranging from 1 to 15, while the Y-axis shows the count of hosts in each category. The visualization reveals that the majority of hosts interact with only a small number of unique URLs. Specifically, approximately 17,500 hosts contacted exactly two unique URLs, while around 15,000 hosts contacted only one unique URL. As the number of unique URLs increases, the number of hosts decreases significantly; nevertheless, a noticeable tail of hosts contacts a larger number of unique URLs.
 
 This pattern suggests that network activity is highly concentrated on a small set of destinations, with most hosts accessing only a few key resources. For example, hosts that contact only one or two unique URLs are likely accessing essential services, such as internal tools, authentication servers, or frequently used websites. In contrast, hosts that contact a larger number of unique URLs may represent more diverse or specialized activity, such as administrators, developers, or automated systems performing a wide range of tasks. This rationale underscores the importance of using whitelists to filter out known legitimate traffic, allowing the focus to be on identifying truly suspicious activities.
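
For reference, the distribution underlying Figure \ref{fig:ip} can be reproduced from the raw log entries with a short \texttt{pandas} aggregation; the column name for the host IP address and the sample values below are hypothetical:

\begin{lstlisting}[language=Python, basicstyle=\small\ttfamily]
import pandas as pd

# Hypothetical frame of log entries, one row per request.
logs = pd.DataFrame({
    "host_ip": ["10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.3"],
    "url_hostname": ["a.com", "b.com", "a.com", "a.com", "a.com"],
})

# Number of distinct URLs contacted by each host ...
urls_per_host = logs.groupby("host_ip")["url_hostname"].nunique()

# ... and how many hosts fall into each unique-URL-count bucket (X-axis of the figure).
hosts_per_bucket = urls_per_host.value_counts().sort_index()
print(hosts_per_bucket)
\end{lstlisting}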