Using the NO_NA file, I'm getting 98% accuracy, which seems odd given your roughly 50% score. Any guess as to what the problem might be, and why the model would be over-fitting?
I also get this warning:
UserWarning: Numerical issues were encountered when centering the data and might not be solved. Dataset may contain too large values. You may need to prescale your features.
I do, however, prescale the dataset using
preprocessing.scale(X)
as you suggest.
Also, if you could elaborate a bit more on what exactly constitutes over-fitting in the context of this tutorial, I would appreciate it.
So, 3 questions in total:
1. What could cause 98% accuracy? (Over-fitting, supposedly.)
2. What could cause the "Numerical issues" warning?
3. What exactly is over-fitting in this context?
Thanks,
-T
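On question 2, a hedged note: as far as I can tell, the warning is raised from inside the scaling call itself when a feature's values are so large that centering loses floating-point precision, so calling scale does not make it go away. A minimal sketch for spotting such columns before scaling, using only the standard library. The `large_columns` helper, the `1e9` threshold, and the sample rows are all made up for illustration and are not part of the tutorial.

```python
def large_columns(rows, threshold=1e9):
    """Return the indices of columns whose largest absolute value exceeds threshold.

    The threshold is an arbitrary assumption; pick one that suits your features.
    """
    return [
        i
        for i, col in enumerate(zip(*rows))  # zip(*rows) walks the data column by column
        if max(abs(v) for v in col) > threshold
    ]


# Made-up rows: column 0 is a small ratio, column 1 looks like a dollar amount.
rows = [
    [0.5, 3.2e11],
    [1.7, 8.9e12],
    [0.9, 4.1e10],
]

suspects = large_columns(rows)
print(suspects)  # indices of the columns worth rescaling or inspecting first
```

Columns flagged this way are candidates for a unit change (for example, billions instead of dollars) or a log transform before standardizing.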
Are you measuring accuracy on the same data you trained on? Training and testing should obviously use the same dataset, but not the exact same samples: the samples you test on have to be held out from training.
-Harrison 7 years ago
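To make the point about held-out samples concrete, here is a minimal sketch, not the tutorial's code. It assumes a toy 1-nearest-neighbor classifier and purely random features and labels, so there is nothing real to learn. Scoring on the training samples still looks perfect, because each sample is its own nearest neighbor, while scoring on held-out samples falls back to roughly chance. That gap is what over-fitting looks like in practice. All names here (`nearest_label`, `accuracy`, the 200/200 split) are made up for illustration.

```python
import random


def nearest_label(point, train_X, train_y):
    """Predict the label of the training sample closest to point (1-nearest-neighbor)."""
    best = min(
        range(len(train_X)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(point, train_X[i])),
    )
    return train_y[best]


def accuracy(X, y, train_X, train_y):
    """Fraction of samples in (X, y) that the 1-NN model labels correctly."""
    hits = sum(nearest_label(p, train_X, train_y) == label for p, label in zip(X, y))
    return hits / len(X)


rng = random.Random(0)  # fixed seed so the run is repeatable

# Random features and random 0/1 labels: the labels carry no real signal.
X = [[rng.random() for _ in range(5)] for _ in range(400)]
y = [rng.randint(0, 1) for _ in range(400)]

# Hold out the second half; the model never sees it during "training".
train_X, test_X = X[:200], X[200:]
train_y, test_y = y[:200], y[200:]

same_data_acc = accuracy(train_X, train_y, train_X, train_y)  # scored on the training samples
held_out_acc = accuracy(test_X, test_y, train_X, train_y)     # scored on unseen samples

print("scored on training samples:", same_data_acc)
print("scored on held-out samples:", held_out_acc)
```

If a suspiciously high score drops sharply once the test samples are held out like this, the original number was measuring memorization rather than prediction.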
I did.
Regardless, I'm now stuck on part 22: the Yahoo Finance website has changed, and I think they scramble the page source on purpose to prevent outside parsing. I'm trying to find a workaround now.
Thanks for everything man!
-t0mgs 7 years ago