# MasterThesis

Folder Preprocessing:
- Shows, step by step, how the real data is preprocessed
- The 10 downloaded datasets of CSE-CIC-IDS2018 are loaded, then combined
- Unnecessary columns are dropped
- Preprocessing steps are carried out:
  - dropping NaN, inf, -inf, and negative values
  - converting time to UNIX format
  - aggregating the classes and saving them back into separate datasets:
    - "BruteForce": ["FTP-BruteForce", "SSH-Bruteforce", "Brute Force -Web", "Brute Force -XSS"],
    - "DoS": ["DoS attacks-GoldenEye", "DoS attacks-Slowloris", "DoS attacks-Hulk", "DoS attacks-SlowHTTPTest", "DDoS attacks-LOIC-HTTP", "DDOS attack-HOIC", "DDOS attack-LOIC-UDP"],
    - "Infiltration": ["Infilteration"],
    - "Bot": ["Bot"],
    - "Benign": ["Benign"]

Folder Models:
- Contains 4 notebooks
- The SDV (GitHub repo name) notebook has the CTGAN, CopulaGAN, and TVAE models; more info at: https://docs.sdv.dev/sdv/single-table-data/modeling/synthesizers
- The Synthcity (GitHub repo name) notebook contains the RTVAE and ADSGAN models; more info at: https://github.com/vanderschaarlab/synthcity
- The TabFairGAN notebook has the TabFairGAN models; more info at: https://github.com/amirarsalan90/TabFairGAN
- Combining_everthing contains the code that combines the synthetically generated datasets into 1 dataset

Folder Classifiers:
- Contains the code for both the Random Forest and XGBoost classifiers, plus the preprocessing steps
- Each cell in the notebook is explained separately
- SDMetrics (used for evaluation) can be found at: https://docs.sdv.dev/sdmetrics/
- table-evaluator (used for evaluation) can be found at: https://github.com/Baukebrenninkmeijer/table-evaluator
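
As an illustration of the steps listed under Folder Preprocessing, here is a minimal plain-Python sketch; it is not the notebook code. It filters out rows with NaN, inf, -inf, or negative values, converts the time to UNIX format, and groups rows by aggregated class using the mapping above. The column names `Timestamp`, `Label`, and `Flow Duration`, the `dd/mm/YYYY HH:MM:SS` time format, the UTC time zone, and the helper names (`is_clean`, `to_unix`, `preprocess`) are all assumptions made for this example, and the sample rows are made up.

```python
import math
from collections import defaultdict
from datetime import datetime, timezone

# Class grouping as listed in this README.
CLASS_GROUPS = {
    "BruteForce": ["FTP-BruteForce", "SSH-Bruteforce",
                   "Brute Force -Web", "Brute Force -XSS"],
    "DoS": ["DoS attacks-GoldenEye", "DoS attacks-Slowloris",
            "DoS attacks-Hulk", "DoS attacks-SlowHTTPTest",
            "DDoS attacks-LOIC-HTTP", "DDOS attack-HOIC",
            "DDOS attack-LOIC-UDP"],
    "Infiltration": ["Infilteration"],
    "Bot": ["Bot"],
    "Benign": ["Benign"],
}

# Reverse lookup: original label -> aggregated class name.
LABEL_TO_CLASS = {
    label: group for group, labels in CLASS_GROUPS.items() for label in labels
}

# Assumed timestamp layout; adjust to match the actual files.
TIME_FORMAT = "%d/%m/%Y %H:%M:%S"


def is_clean(row, numeric_keys):
    """Return True if every numeric field is finite and non-negative."""
    for key in numeric_keys:
        value = row[key]
        if value is None or not math.isfinite(value) or value < 0:
            return False
    return True


def to_unix(timestamp):
    """Parse a timestamp string (assumed to be UTC) into UNIX seconds."""
    parsed = datetime.strptime(timestamp, TIME_FORMAT)
    return int(parsed.replace(tzinfo=timezone.utc).timestamp())


def preprocess(rows, numeric_keys, time_key="Timestamp", label_key="Label"):
    """Drop unclean rows, convert time, and group rows by aggregated class."""
    grouped = defaultdict(list)
    for row in rows:
        if not is_clean(row, numeric_keys):
            continue
        group = LABEL_TO_CLASS.get(row[label_key])
        if group is None:
            continue  # label not covered by the mapping
        cleaned = dict(row)
        cleaned[time_key] = to_unix(row[time_key])
        cleaned[label_key] = group
        grouped[group].append(cleaned)
    return dict(grouped)


# Made-up rows, for illustration only.
rows = [
    {"Timestamp": "14/02/2018 08:31:01", "Flow Duration": 112.0, "Label": "FTP-BruteForce"},
    {"Timestamp": "14/02/2018 08:31:02", "Flow Duration": float("inf"), "Label": "Benign"},
    {"Timestamp": "14/02/2018 08:31:03", "Flow Duration": -1.0, "Label": "Bot"},
    {"Timestamp": "16/02/2018 10:12:14", "Flow Duration": 35.0, "Label": "DoS attacks-Hulk"},
]
result = preprocess(rows, numeric_keys=["Flow Duration"])
for group, group_rows in result.items():
    print(group, len(group_rows))
```

In this sketch, each key of `result` corresponds to one aggregated class, so each value could be written to its own file to mirror the "saving the classes back into separate datasets" step.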