
Signal to Noise Ratio in Synthetic Data Set



    Hey guys, new to the boards.

    I have created a naive Bayes classifier that trains on synthetic data, i.e. data generated in MATLAB from an empirically derived distribution. The aim of my project is to eventually validate the classifier against a real data set, but before I do that I need to validate it against synthetic data. I am currently using an 80/20 split: I generate a data set, train the classifier on 80% of it, and then validate its performance on the remaining 20%.
    The outcome in this instance is dichotomous, i.e. disease vs. no disease. The synthetic data will be used to mimic nerve conduction velocity tests such as SNAPs, motor velocities, latencies, etc. of the distal median and ulnar nerves.
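    (Not the poster's MATLAB code, but for concreteness, the 80/20 workflow above could be sketched in Python like this. The feature distributions, sample size, and class labels are all made-up placeholders standing in for the empirically derived distribution.)

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

# Hypothetical synthetic "test results": two features per subject,
# drawn from different Gaussians for disease (1) vs. no disease (0).
n = 1000
X_disease = rng.normal(loc=[40.0, 3.5], scale=[5.0, 0.5], size=(n // 2, 2))
X_healthy = rng.normal(loc=[55.0, 2.5], scale=[5.0, 0.5], size=(n // 2, 2))
X = np.vstack([X_disease, X_healthy])
y = np.array([1] * (n // 2) + [0] * (n // 2))

# 80/20 split: train on 80%, validate on the held-out 20%.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, random_state=0, stratify=y
)

clf = GaussianNB().fit(X_train, y_train)
acc = accuracy_score(y_val, clf.predict(X_val))
print(f"validation accuracy: {acc:.3f}")
```

    The `stratify=y` argument keeps the disease/no-disease proportions the same in both halves of the split, which matters if the real prevalence is far from 50/50.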

    In my quest to create a synthetic data set that accurately mimics a real phenomenon, I must introduce noise into the data set. Herein lies my dilemma: what level of noise is appropriate to mimic a nerve conduction velocity test? The only solution I have come up with is running the simulation multiple times with varying amounts of added noise, and then noting at what point the signal-to-noise ratio is high enough to achieve acceptable sensitivity and specificity in the classifier.
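    (A minimal sketch of that noise sweep, again in Python rather than the poster's MATLAB, and with an assumed one-dimensional "velocity" signal model and made-up class means; only the sweep structure itself is the point.)

```python
import numpy as np
from sklearn.naive_bayes import GaussianNB
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)

def make_data(noise_sd, n=2000):
    """Two-class 1-D 'conduction velocity' data with additive Gaussian noise."""
    half = n // 2
    clean = np.concatenate([rng.normal(40.0, 2.0, half),   # disease
                            rng.normal(55.0, 2.0, half)])  # no disease
    y = np.array([1] * half + [0] * half)
    X = (clean + rng.normal(0.0, noise_sd, n)).reshape(-1, 1)
    return X, y

results = []
for noise_sd in [1.0, 5.0, 10.0, 20.0]:
    X, y = make_data(noise_sd)
    X_tr, X_va, y_tr, y_va = train_test_split(
        X, y, test_size=0.2, random_state=0, stratify=y)
    pred = GaussianNB().fit(X_tr, y_tr).predict(X_va)
    tp = np.sum((pred == 1) & (y_va == 1))
    tn = np.sum((pred == 0) & (y_va == 0))
    sens = tp / np.sum(y_va == 1)   # sensitivity = TP / (TP + FN)
    spec = tn / np.sum(y_va == 0)   # specificity = TN / (TN + FP)
    results.append((noise_sd, sens, spec))
    print(f"noise sd {noise_sd:5.1f}: sensitivity {sens:.3f}, specificity {spec:.3f}")
```

    Plotting sensitivity and specificity against the noise level (or the equivalent SNR) would then show the point at which classifier performance drops below whatever threshold is clinically acceptable.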

    FYI, noise in my code is expressed in decibels, i.e. SNR(dB) = 10·log10(signal power / noise power), so a signal-to-noise ratio of 100/1 would be 20 dB.
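    (A one-liner sanity check of that conversion, assuming SNR is defined as a power ratio; for an amplitude ratio the factor would be 20 rather than 10.)

```python
import math

def snr_db(power_ratio):
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    return 10 * math.log10(power_ratio)

print(snr_db(100))  # a 100/1 power ratio is 20 dB
```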
    Any help would be greatly appreciated.