| ... | ... | @@ -159,10 +159,10 @@ Volume data is particularly important in financial markets as it reflects the tr |
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
### Ensure no negative values
|
|
|
|
#### Ensure no negative values
|
|
|
|
|
|
|
|
Purpose:
|
|
|
|
Base Volume with Noise: Volume values are generated from a normal distribution with a mean and standard deviation calculated from the original dataset. The scale is set to 20% of the original standard deviation to create realistic variations.
|
|
|
|
|
|
|
|
Purpose: Base Volume with Noise: Volume values are drawn from a normal distribution centered on the mean of the original volume data. The scale is set to 20% of the original standard deviation to create realistic variations.
|
|
|
|
Spikes: About 5% of the synthetic volume data is randomly selected to have artificially high values (spikes), simulating sudden surges in market activity (e.g., after news releases or major events). This introduces further variation and realism in the data.
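The volume step described above (base noise, spikes, and the non-negativity check) can be sketched roughly as follows. This is a minimal illustration rather than the project's actual code: the function name `synth_volume`, the seed, and in particular the spike multiplier range are assumptions, since the text only states the 20% scale and the 5% spike share.

```python
import numpy as np

def synth_volume(original, n_rows, noise_frac=0.20, spike_frac=0.05,
                 spike_range=(2.0, 5.0), seed=0):
    """Return (volume, spike_idx) for n_rows synthetic volume values."""
    rng = np.random.default_rng(seed)
    original = np.asarray(original, dtype=float)
    mean, std = original.mean(), original.std()

    # Base volume: normal noise centered on the original mean,
    # with scale equal to 20% of the original standard deviation.
    volume = rng.normal(loc=mean, scale=noise_frac * std, size=n_rows)

    # Ensure no negative values before the spikes are applied.
    volume = np.clip(volume, 0.0, None)

    # Spikes: about 5% of rows are multiplied by a random factor.
    # The factor range (2x to 5x) is an assumption made for this sketch.
    n_spikes = int(round(spike_frac * n_rows))
    spike_idx = rng.choice(n_rows, size=n_spikes, replace=False)
    volume[spike_idx] *= rng.uniform(*spike_range, size=n_spikes)
    return volume, spike_idx

volume, spike_idx = synth_volume([900.0, 1000.0, 1100.0], n_rows=1000)
```

Clipping before the spikes are applied keeps every spiked row non-negative as well, because each multiplier is positive.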
|
|
|
|
Step 5: Handle Missing or Infinite Values
|
|
|
|
After generating the synthetic data, any missing or invalid values (such as NaN or inf) are replaced with the respective feature's mean from the original dataset to ensure data integrity.
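A minimal sketch of this cleanup step, assuming the original and synthetic data are pandas DataFrames with the same numeric columns. The function name `fill_invalid` and the column names in the usage lines are hypothetical.

```python
import numpy as np
import pandas as pd

def fill_invalid(synthetic: pd.DataFrame, original: pd.DataFrame) -> pd.DataFrame:
    """Replace NaN and +/-inf in `synthetic` with per-column means of `original`."""
    # Treat infinities as missing so one fill handles both cases.
    cleaned = synthetic.replace([np.inf, -np.inf], np.nan)
    # fillna with a Series fills each column using the value under the same label.
    return cleaned.fillna(original.mean(numeric_only=True))

# Hypothetical column names, for illustration only.
original = pd.DataFrame({"close": [1.0, 2.0, 3.0], "volume": [10.0, 20.0, 30.0]})
synthetic = pd.DataFrame({"close": [np.nan, 2.5], "volume": [np.inf, 15.0]})
cleaned = fill_invalid(synthetic, original)
```

The fill uses the means of the original data rather than of the synthetic data, as the text describes, so invalid synthetic values cannot distort the replacement.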
|
| ... | ... | @@ -184,6 +184,5 @@ Finally, the synthetic data is concatenated with the original dataset to create |
|
|
|
|
|
|
|
|
|
|
|
Purpose: This step combines the original data and synthetic data, creating a larger, augmented dataset that will be used for model training.
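The concatenation might look roughly like the sketch below, assuming both datasets are pandas DataFrames with matching columns. The `is_synthetic` flag is not part of the described process. It is an optional, hypothetical addition that makes it easy to keep synthetic rows out of validation and test splits.

```python
import pandas as pd

def augment(original: pd.DataFrame, synthetic: pd.DataFrame) -> pd.DataFrame:
    """Stack original and synthetic rows into one augmented dataset."""
    # Hypothetical flag column marking which rows were generated.
    original = original.assign(is_synthetic=False)
    synthetic = synthetic.assign(is_synthetic=True)
    # ignore_index=True gives the result a fresh 0..n-1 index.
    return pd.concat([original, synthetic], ignore_index=True)

# Hypothetical column names, for illustration only.
original = pd.DataFrame({"close": [1.0, 2.0, 3.0], "volume": [10.0, 20.0, 30.0]})
synthetic = pd.DataFrame({"close": [1.5, 2.5], "volume": [12.0, 18.0]})
augmented = augment(original, synthetic)
```

Drop the flag column before training if the model should see only the original features.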
|
|
|
|
Conclusion
|
|
|
|
The synthetic data generation process introduces additional rows into the dataset, enriching the feature space and providing the model with more examples to learn from. By carefully generating realistic noise for each feature and simulating market behavior through volume spikes, this approach helps the model generalize better to unseen data. However, it’s crucial to ensure that the synthetic data aligns well with the real-world data distribution and is handled correctly during training to avoid introducing biases or overfitting.
|
|
|
|
|
|
|
|
#### Conclusion
|
|
|
|
The synthetic data generation process introduces additional rows into the dataset, enriching the feature space and providing the model with more examples to learn from. By carefully generating realistic noise for each feature and simulating market behavior through volume spikes, this approach is intended to help the model generalize better to unseen data. It is crucial, however, to ensure that the synthetic data aligns well with the real-world data distribution. In practice, after running multiple prediction models, the real dataset was found to perform better. The detailed analysis of the prediction models can be found under model selections; please take a look. |
|
|
\ No newline at end of file |