# MasterThesis

Folder Preprocessing:
 - Shows how the real data is preprocessed:
	- the 10 downloaded CSE-CIC-IDS2018 datasets are loaded and then combined
	- unnecessary columns are dropped
	- preprocessing steps are carried out
		- dropping NaN, inf, -inf, and negative values
		- converting the time to UNIX format
	- aggregating the classes and saving them back into separate datasets
		- "BruteForce": ["FTP-BruteForce", "SSH-Bruteforce", "Brute Force -Web", "Brute Force -XSS"],
		- "DoS": ["DoS attacks-GoldenEye", "DoS attacks-Slowloris", "DoS attacks-Hulk", "DoS attacks-SlowHTTPTest", "DDoS attacks-LOIC-HTTP", "DDOS attack-HOIC", "DDOS attack-LOIC-UDP"],
		- "Infiltration": ["Infilteration"],
		- "Bot": ["Bot"],
		- "Benign": ["Benign"]
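
The cleaning and aggregation steps above can be sketched roughly as follows. This is a minimal illustration, not the code from the notebooks: the function names, the column names `Label` and `Timestamp`, and the day-first timestamp format are assumptions, and dropping whole rows (rather than columns) is also assumed. Only the class groups are taken from the list above.

```python
import numpy as np
import pandas as pd

# Class groups copied from the list above (aggregated class -> original labels).
CLASS_GROUPS = {
    "BruteForce": ["FTP-BruteForce", "SSH-Bruteforce", "Brute Force -Web", "Brute Force -XSS"],
    "DoS": ["DoS attacks-GoldenEye", "DoS attacks-Slowloris", "DoS attacks-Hulk",
            "DoS attacks-SlowHTTPTest", "DDoS attacks-LOIC-HTTP", "DDOS attack-HOIC",
            "DDOS attack-LOIC-UDP"],
    "Infiltration": ["Infilteration"],
    "Bot": ["Bot"],
    "Benign": ["Benign"],
}
# Inverted lookup: original label -> aggregated class.
LABEL_TO_CLASS = {label: cls for cls, labels in CLASS_GROUPS.items() for label in labels}


def clean_and_aggregate(df, label_col="Label", time_col="Timestamp"):
    """Hypothetical helper: clean one combined table and aggregate its labels."""
    # Treat inf and -inf as missing, then drop every row with a missing value.
    df = df.replace([np.inf, -np.inf], np.nan).dropna()
    # Drop rows that contain a negative value in any numeric column.
    numeric = df.select_dtypes(include="number")
    df = df[(numeric >= 0).all(axis=1)].copy()
    # Convert timestamps to UNIX seconds (the day-first format is an assumption).
    ts = pd.to_datetime(df[time_col], format="%d/%m/%Y %H:%M:%S")
    df[time_col] = (ts - pd.Timestamp("1970-01-01")) // pd.Timedelta(seconds=1)
    # Replace each original label with its aggregated class
    # (labels missing from CLASS_GROUPS become NaN).
    df[label_col] = df[label_col].map(LABEL_TO_CLASS)
    return df


def split_by_class(df, label_col="Label"):
    """Hypothetical helper: one DataFrame per aggregated class."""
    return {cls: part.reset_index(drop=True) for cls, part in df.groupby(label_col)}
```

Each DataFrame returned by `split_by_class` could then be written out with `to_csv`, one file per class.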
 
 
 
Folder Models:
 - Contains 4 notebooks:
	- the SDV (GitHub repo name) notebook contains the CTGAN, CopulaGAN, and TVAE models; more info at: https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers
	- the Synthcity (GitHub repo name) notebook contains the RTVAE and ADSGAN models; more info at: https://github.com/vanderschaarlab/synthcity
	- the TabFairGAN notebook contains the TabFairGAN model; more info at: https://github.com/amirarsalan90/TabFairGAN
	- Combining_everthing contains the code that combines the synthetically generated datasets into 1 dataset
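
As a rough idea of what combining the synthetic datasets can look like, the sketch below stacks several tables with `pandas.concat`. It is an illustration under assumptions, not the notebook's code: the function name `combine_synthetic`, the optional shuffling, and the seed are hypothetical, and all input tables are assumed to share the same columns.

```python
import pandas as pd


def combine_synthetic(frames, shuffle=True, seed=0):
    """Hypothetical helper: stack synthetic tables into one dataset."""
    # Stack the tables row-wise and renumber the index from 0.
    combined = pd.concat(frames, ignore_index=True)
    if shuffle:
        # Shuffle so rows from one source are not grouped together.
        combined = combined.sample(frac=1, random_state=seed).reset_index(drop=True)
    return combined
```

For example, `combine_synthetic([dos_df, bot_df, benign_df])` would return one table whose row count is the sum of the inputs; the variable names here are placeholders.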
 
 
Folder Classifiers:
 - Contains the code for both the Random Forest and XGBoost classifiers, along with the preprocessing steps
 
 
- Each cell in the notebooks is explained separately.
- SDMetrics (used for evaluation) can be found at: https://docs.sdv.dev/sdmetrics/
- table-evaluator (used for evaluation) can be found at: https://github.com/Baukebrenninkmeijer/table-evaluator