Question on Python Programming for Finance Part 12 (Machine Learning)
I'm not clear on what you are explaining with respect to model accuracy, and was hoping you could clarify a bit. I've listened to the tutorial chapter several times and read your accompanying text.
You have:
[Code]
do_ml('XOM')
do_ml('AAPL')
do_ml('ABT')
[/Code]
[Code]
Output:
Data spread: Counter({'1': 1713, '-1': 1456, '0': 1108})
accuracy: 0.375700934579
predicted class counts: Counter({0: 404, -1: 393, 1: 273})

Data spread: Counter({'1': 2098, '-1': 1830, '0': 349})
accuracy: 0.4
predicted class counts: Counter({-1: 644, 1: 339, 0: 87})

Data spread: Counter({'1': 1690, '-1': 1483, '0': 1104})
accuracy: 0.33738317757
predicted class counts: Counter({-1: 383, 0: 372, 1: 315})
[/Code]
"So all of these are better than 33%, but the training data wasn't perfectly balanced either. For example, we can look at the first one:
[Code]
Data spread: Counter({'1': 1713, '-1': 1456, '0': 1108})
accuracy: 0.375700934579
predicted class counts: Counter({0: 404, -1: 393, 1: 273})
[/Code]
In this case, what if the model ONLY predicted "buy"? That would have been 1,713 correct out of 4,277 (about 40%), which is actually a better score than we got. What about the other two? The second one, AAPL, is 49% accurate if it just predicts buy, at least on the training data. ABT is about 40% accurate (1,690 / 4,277) if it just predicts buy on the training data.
So, while we're doing better than 33%, it's currently unclear whether this model is better than just saying "buy" on everything. In actual trading, this all can change. For example, this model is penalized if it calls something a buy, expecting a 2% rise within 7 days, but that rise doesn't happen until day 8, even though the algorithm keeps calling it a buy or hold along the way. In actual trading, this would still be fine. The same caveat applies if this model turned out to be highly accurate: actually trading a model can be a completely different thing."
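The "just predict buy" comparison can be computed directly from the printed counts. This is a minimal sketch that copies the Data spread and accuracy values from the output in this post; `results` and `majority_baseline` are hypothetical names, not part of the tutorial code. Like the quoted text, it uses the whole Data spread as the baseline population, while the model's accuracy was measured on the held-out test set, so the comparison is approximate.

```python
from collections import Counter

# Label counts ("Data spread") and test accuracies copied from the do_ml()
# output in this post; the labels are strings, exactly as printed there.
results = {
    "XOM": (Counter({'1': 1713, '-1': 1456, '0': 1108}), 0.375700934579),
    "AAPL": (Counter({'1': 2098, '-1': 1830, '0': 349}), 0.4),
    "ABT": (Counter({'1': 1690, '-1': 1483, '0': 1104}), 0.33738317757),
}


def majority_baseline(spread):
    """Return (label, accuracy) for always predicting the most common label."""
    label, count = spread.most_common(1)[0]
    return label, count / sum(spread.values())


for ticker, (spread, model_acc) in results.items():
    label, base_acc = majority_baseline(spread)
    verdict = "beats" if model_acc > base_acc else "does not beat"
    print(f"{ticker}: always '{label}' -> {base_acc:.3f}, "
          f"model -> {model_acc:.3f} ({verdict} the baseline)")
```

With these numbers none of the three models beats its always-buy baseline. If you'd rather fit the baseline alongside the real model, scikit-learn's `DummyClassifier(strategy="most_frequent")` computes the same kind of majority-class score.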
Questions:
1. Are you saying, in regard to XOM, that if we didn't use a model and just said "Buy" all the time, we'd be right about 40% of the time (according to the training data), so a model with an accuracy of about 38% isn't really a good model, because we could just say "Buy" and do at least as well? I'm not really clear on what you are explaining here.
2. I understand why you go into balancing the classes more carefully in the video. However, from what I understand about imbalanced classes and training classifiers, you don't usually run into issues unless you have severe class imbalance (like 80% in one class vs 20% in another). These classes don't seem that unbalanced. Are you just being super vigilant here, checking whether slightly imbalanced classes in the training set are influencing the outcome for BAC in the video?
3. Is the takeaway after 12 lessons that we have a potentially good classifier? The high .30s / low .40s seem to suggest that we do better than random guessing (assuming we have a roughly, but not precisely, 1/3, 1/3, 1/3 split in the training data). I understand that regardless of the accuracy of the classifier it is necessary to rigorously backtest a trading strategy (as we will no doubt discuss in subsequent chapters). However, based on questions 1 and 2, I'm not clear on your view of the quality of the classifier designed in chapters 1 through 12. Could you clarify?
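The imbalance raised in question 2 can be quantified by looking at each class's share of the Data spread. A minimal sketch, using the counts printed above; `class_shares` and `imbalance_report` are hypothetical helpers, and the 0.6 threshold borrows the 60/40 rule of thumb mentioned in this thread, which is an assumption when applied to three classes.

```python
from collections import Counter

# Label counts ("Data spread") copied from the do_ml() output in this post.
spreads = {
    "XOM": Counter({'1': 1713, '-1': 1456, '0': 1108}),
    "AAPL": Counter({'1': 2098, '-1': 1830, '0': 349}),
    "ABT": Counter({'1': 1690, '-1': 1483, '0': 1104}),
}


def class_shares(spread):
    """Fraction of samples carrying each label."""
    total = sum(spread.values())
    return {label: count / total for label, count in spread.items()}


def imbalance_report(spread, max_share=0.6):
    """Summarize class shares and flag a spread whose largest class exceeds max_share.

    max_share=0.6 borrows the 60/40 rule of thumb mentioned in this thread;
    applying it to a three-class problem is an assumption.
    """
    shares = class_shares(spread)
    largest, smallest = max(shares.values()), min(shares.values())
    return {
        "shares": {label: round(s, 3) for label, s in shares.items()},
        "largest": round(largest, 3),
        "smallest": round(smallest, 3),
        "flagged": largest > max_share,
    }


for ticker, spread in spreads.items():
    print(ticker, imbalance_report(spread))
```

None of the three spreads puts more than 60% in one class, which matches the read that they aren't badly imbalanced. A largest-class check alone misses a thin minority class, though: AAPL's '0' (hold) label is only about 8% of its data, so the smallest share is worth watching too.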
Thanks in advance for your clarification and for all the great lessons!
1. Yes, your classifier should beat the accuracy of a classifier that simply predicts the most common class; otherwise you're not actually doing any better than that naive guess in reality.
2. After our tweaking, the classes aren't too imbalanced, no, but we purposefully tweaked parameters to make this so. In my experience, anything more imbalanced than 60/40 can cause trouble, though not always. This is just something you need to check for and be cognizant of.
3. It's an okay start, and it can easily be better than random. A slight edge can make you very wealthy; just ask a casino. The problem with trading is that there's usually a lot of friction on each trade, be it latency, trade fees, etc.
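The casino point, and how friction can erase it, comes down to expected-value arithmetic. A minimal sketch with entirely hypothetical numbers (a 52% hit rate on symmetric 1% moves and a 0.05% round-trip cost); `expected_net_return` is a made-up helper, not part of the tutorial.

```python
def expected_net_return(win_rate, avg_gain, avg_loss, cost_per_trade):
    """Expected return per trade after costs; all values are fractions (0.01 == 1%)."""
    gross_edge = win_rate * avg_gain - (1 - win_rate) * avg_loss
    return gross_edge - cost_per_trade


# Hypothetical inputs, not taken from the tutorial: a 52% hit rate on
# symmetric 1% moves, with and without a 0.05% round-trip trading cost.
no_friction = expected_net_return(0.52, 0.01, 0.01, 0.0)
with_friction = expected_net_return(0.52, 0.01, 0.01, 0.0005)

print(f"edge before costs: {no_friction:.4%} per trade")
print(f"edge after costs:  {with_friction:.4%} per trade")
```

Under these assumed inputs, a 0.04% gross edge per trade turns into a small expected loss once costs are subtracted: a slight edge only pays if friction stays smaller than it.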
The main issue with trading, and with anything to do with statistics, is that you generally have no idea how wrong you are, and simply don't know what you don't know. It's really easy to come up with a model that looks great on paper by the metrics you're looking at, but is still filled with statistical fallacies and issues that will sink you in reality.
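One partial way to put a number on "how wrong you are" is a confidence interval around the test accuracy. This is a minimal sketch using the normal-approximation (Wald) interval; it assumes each test prediction is an independent trial, which is doubtful for daily price data with overlapping 7-day label windows, so the real uncertainty is probably wider. `accuracy_interval`, `N_TEST`, and `runs` are hypothetical names, and the test-set size of 1,070 is inferred from the predicted class counts in the output.

```python
import math


def accuracy_interval(accuracy, n, z=1.96):
    """Normal-approximation (Wald) interval for an accuracy measured on n samples."""
    se = math.sqrt(accuracy * (1 - accuracy) / n)
    return accuracy - z * se, accuracy + z * se


# Each "predicted class counts" line in the output sums to 1,070, which is
# taken here as the test-set size. Baselines are the always-buy shares of
# the full Data spread, so the comparison is only approximate.
N_TEST = 1070
runs = [
    ("XOM", 0.375700934579, 1713 / 4277),
    ("AAPL", 0.4, 2098 / 4277),
    ("ABT", 0.33738317757, 1690 / 4277),
]

for ticker, acc, baseline in runs:
    low, high = accuracy_interval(acc, N_TEST)
    inside = low <= baseline <= high
    print(f"{ticker}: accuracy {acc:.3f}, 95% interval [{low:.3f}, {high:.3f}], "
          f"always-buy {baseline:.3f}, baseline inside interval: {inside}")
```

With these inputs, XOM's always-buy baseline falls inside the model's interval, so the two are hard to tell apart statistically, while for AAPL and ABT the baseline sits above the interval's upper end.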
-Harrison 8 years ago