Now we can search our documents for investment suggestions! We need to be careful here about whether N/A data is included or excluded, and to treat it the same way in both the training data and the current data. We also might as well train against all of our historical data.
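The tutorial's code handles N/A values with replace() and scales the training data and the forward sample separately with preprocessing.scale, so each set is centered on its own mean. If you want both sets treated identically, one option is sketched here. It assumes both are pandas DataFrames that share the same feature columns. The helper names clean_features and fit_and_transform are hypothetical, and fitting scikit-learn's StandardScaler on the historical data only is an alternative to the tutorial's approach, not a copy of it.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import StandardScaler

def clean_features(df, features):
    """Hypothetical helper: apply one N/A policy to any frame of features."""
    # Coerce non-numeric entries such as "N/A" to NaN, then fill with 0,
    # matching the tutorial's "missing means 0" choice.
    numeric = df[features].apply(pd.to_numeric, errors="coerce")
    return numeric.fillna(0).to_numpy(dtype=float)

def fit_and_transform(train_df, current_df, features):
    """Hypothetical helper: scale both sets with statistics from the training set."""
    X_train = clean_features(train_df, features)
    X_current = clean_features(current_df, features)
    # Fit on historical data only, then reuse the same scaler for current data.
    scaler = StandardScaler().fit(X_train)
    return scaler.transform(X_train), scaler.transform(X_current)

# Toy frames (invented values) using two of the tutorial's feature names.
train = pd.DataFrame({"DE Ratio": [1.0, "N/A", 3.0], "Beta": [0.5, 1.5, "N/A"]})
current = pd.DataFrame({"DE Ratio": ["N/A", 2.0], "Beta": [1.0, 1.0]})
Xt, Xc = fit_and_transform(train, current, ["DE Ratio", "Beta"])
print(Xt)
print(Xc)
```

Because the scaler is shared, an N/A in the current data lands on exactly the same scaled value as an N/A in the training data.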
Depending on when you are doing this tutorial, you may find many suggestions, few suggestions, or none at all! If so, you can adjust your standards to look for either a larger or a smaller pool. For a smaller pool, raise your standards: look for companies similar to those that outperformed the market by, say, 10%.
You could also lower your standards to get a good handful of companies to invest in.
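To make the raised standard concrete, here is a minimal sketch of relabeling rows with a margin. It assumes the data has the stock_p_change and sp500_p_change percentage columns used in the tutorial's code, and it reads "outperformed by 10%" as a gap of 10 percentage points. label_status is a hypothetical helper; a margin of 0 treats any positive gap as outperformance.

```python
import pandas as pd

def label_status(df, margin=0.0):
    """Hypothetical helper: label a row "outperform" only if the stock's
    percent change beat the S&P 500's by more than `margin` points."""
    difference = df["stock_p_change"] - df["sp500_p_change"]
    return difference.gt(margin).map({True: "outperform", False: "underperform"})

# Invented example rows: the stock beat the index by 15, 2 and -7 points.
df = pd.DataFrame({"stock_p_change": [25.0, 12.0, 3.0],
                   "sp500_p_change": [10.0, 10.0, 10.0]})
print(label_status(df).tolist())             # plain outperformance
print(label_status(df, margin=10).tolist())  # stricter 10-point standard
```

Retraining on labels built with a larger margin should shrink the pool of suggestions, while a smaller (or negative) margin widens it.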
Consider, for example, the roughly 500 companies in the S&P 500. The numbers may vary slightly, but chances are that about 250 companies will "outperform" the market and about 250 will "underperform." The S&P 500 index is basically an average (a market-cap-weighted one) of its member companies, so it stands to reason that somewhere around half of them will outperform it. Thus, you could make it your objective to pick only a small handful of hopefully significant outperformers, or you could widen your net in an attempt to pick ~250 companies that will all outperform.
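Another way to choose between a small handful and a wide net, without relabeling, is to rank candidates by how confidently the classifier calls each one an outperformer. This sketch uses scikit-learn's SVC.decision_function, the signed distance from the separating hyperplane, and assumes labels are 0/1 with 1 meaning outperform, as in the tutorial's code. top_k_picks, the tickers, and the toy data are hypothetical.

```python
import numpy as np
from sklearn import svm

def top_k_picks(clf, X, tickers, k=None):
    """Hypothetical helper: rank candidates by SVM decision score.
    k=None keeps every predicted outperformer (the wide net);
    a small k keeps only the most confident handful."""
    scores = clf.decision_function(X)   # > 0 means predicted class 1
    order = np.argsort(scores)[::-1]    # most confident first
    picks = [tickers[i] for i in order if scores[i] > 0]
    return picks if k is None else picks[:k]

# Toy one-feature training data: negative values underperform.
X_train = np.array([[-2.0], [-1.0], [1.0], [2.0]])
y_train = [0, 0, 1, 1]
clf = svm.SVC(kernel="linear", C=1.0).fit(X_train, y_train)

X_new = np.array([[0.5], [3.0], [-1.5], [1.5]])
tickers = ["AAA", "BBB", "CCC", "DDD"]
print(top_k_picks(clf, X_new, tickers))       # wide net
print(top_k_picks(clf, X_new, tickers, k=1))  # small handful
```

A larger score from a linear SVM is not a calibrated probability of outperforming; it only orders the candidates.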
Selecting fewer companies is more of a high-risk, high-yield strategy. Selecting 250 companies carries lower risk and likely lower reward. People often think of high risk, high yield as preferable, since it "makes more money," but that ignores the "high risk" part. Over the long term, a high-risk, high-yield approach rarely pays out better than a lower-risk one, yet it still carries more risk.
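The risk half of that trade-off can be illustrated with a rough simulation. Every stock here draws its return independently from the same normal distribution with made-up parameters (mean 8%, standard deviation 30%), so the simulation says nothing about yield; it only shows how the spread of an equal-weighted portfolio's return shrinks as more stocks are held. Real returns are correlated, which limits how far diversification can reduce risk. portfolio_spread is a hypothetical helper.

```python
import numpy as np

def portfolio_spread(n_stocks, n_trials=2000, seed=0):
    """Hypothetical helper: standard deviation of an equal-weighted
    portfolio's return across simulated trials (independent normal
    returns, an illustrative assumption only)."""
    rng = np.random.default_rng(seed)
    # Each row is one trial; each column is one stock's simulated % return.
    returns = rng.normal(loc=8.0, scale=30.0, size=(n_trials, n_stocks))
    portfolio_returns = returns.mean(axis=1)
    return portfolio_returns.std()

for n in (5, 50, 250):
    print(n, round(portfolio_spread(n), 2))
```

Under these assumptions the spread falls roughly with the square root of the number of holdings, which is why a handful of picks swings far more than 250.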
import numpy as np
import matplotlib.pyplot as plt
from sklearn import svm, preprocessing
import pandas as pd
from matplotlib import style
import statistics

style.use("ggplot")

FEATURES = ['DE Ratio', 'Trailing P/E', 'Price/Sales', 'Price/Book',
            'Profit Margin', 'Operating Margin', 'Return on Assets',
            'Return on Equity', 'Revenue Per Share', 'Market Cap',
            'Enterprise Value', 'Forward P/E', 'PEG Ratio',
            'Enterprise Value/Revenue', 'Enterprise Value/EBITDA',
            'Revenue', 'Gross Profit', 'EBITDA', 'Net Income Avl to Common ',
            'Diluted EPS', 'Earnings Growth', 'Revenue Growth', 'Total Cash',
            'Total Cash Per Share', 'Total Debt', 'Current Ratio',
            'Book Value Per Share', 'Cash Flow', 'Beta', 'Held by Insiders',
            'Held by Institutions', 'Shares Short (as of', 'Short Ratio',
            'Short % of Float', 'Shares Short (prior ']


def Build_Data_Set():
    # pd.DataFrame.from_csv was removed in pandas 1.0;
    # read_csv with index_col=0 is the equivalent call.
    data_df = pd.read_csv("key_stats_acc_perf_WITH_NA.csv", index_col=0)
    #data_df = data_df[:100]

    # Shuffle the rows so the held-out sample is not always the same one.
    data_df = data_df.reindex(np.random.permutation(data_df.index))
    # read_csv already parses "N/A" and "NaN" as NaN, so fillna(0) is what
    # actually turns missing values into 0.
    data_df = data_df.replace("NaN", 0).replace("N/A", 0).fillna(0)

    X = np.array(data_df[FEATURES].values)
    y = (data_df["Status"]
         .replace("underperform", 0)
         .replace("outperform", 1)
         .values.tolist())

    X = preprocessing.scale(X)

    # Percent changes of the stock and of the S&P 500, used for returns.
    Z = np.array(data_df[["stock_p_change", "sp500_p_change"]])

    return X, y, Z


def Analysis():
    test_size = 1  # hold out only one sample: train on nearly all history

    invest_amount = 10000
    total_invests = 0
    if_market = 0
    if_strat = 0

    X, y, Z = Build_Data_Set()
    print(len(X))

    clf = svm.SVC(kernel="linear", C=1.0)
    clf.fit(X[:-test_size], y[:-test_size])

    correct_count = 0

    for x in range(1, test_size + 1):
        # predict() expects a 2-D array; X[[-x]] keeps a single row as 2-D.
        prediction = clf.predict(X[[-x]])[0]
        if prediction == y[-x]:
            correct_count += 1

        if prediction == 1:
            invest_return = invest_amount + (invest_amount * (Z[-x][0] / 100))
            market_return = invest_amount + (invest_amount * (Z[-x][1] / 100))

            total_invests += 1
            if_market += market_return
            if_strat += invest_return

    # Apply the trained model to current (forward) data.
    data_df = pd.read_csv("forward_sample_WITH_NA.csv", index_col=0)
    data_df = data_df.replace("N/A", 0).replace("NaN", 0).fillna(0)

    X = np.array(data_df[FEATURES].values)
    # Note: this scales the forward sample with its own mean and std.
    X = preprocessing.scale(X)
    Z = data_df["Ticker"].values.tolist()

    invest_list = []

    for i in range(len(X)):
        p = clf.predict(X[[i]])[0]
        if p == 1:
            print(Z[i])
            invest_list.append(Z[i])

    print(len(invest_list))
    print(invest_list)


Analysis()
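Analysis() accumulates correct_count, total_invests, if_strat, and if_market but never reports them. Here is a minimal sketch of turning those totals into an accuracy figure and percent returns; summarize is a hypothetical helper and the price changes in the usage lines are invented. With test_size = 1 the accuracy rests on a single held-out sample, so raising test_size gives a more meaningful number at the cost of training on slightly less data.

```python
def summarize(correct_count, test_size, total_invests, invest_amount,
              if_strat, if_market):
    """Hypothetical helper: convert Analysis()-style running totals into
    (accuracy %, strategy return %, market return %)."""
    accuracy = correct_count / test_size * 100
    if total_invests == 0:
        return accuracy, None, None  # nothing was bought, so no returns
    invested = total_invests * invest_amount
    strat_pct = (if_strat - invested) / invested * 100
    market_pct = (if_market - invested) / invested * 100
    return accuracy, strat_pct, market_pct

# Invented numbers: two picks, using the same return formula as Analysis().
invest_amount = 10000
stock_changes = [20.0, 0.0]   # made-up % changes of the picked stocks
sp500_changes = [10.0, 5.0]   # made-up S&P 500 % changes, same periods
if_strat = sum(invest_amount + invest_amount * (s / 100) for s in stock_changes)
if_market = sum(invest_amount + invest_amount * (m / 100) for m in sp500_changes)

print(summarize(correct_count=3, test_size=4, total_invests=2,
                invest_amount=invest_amount,
                if_strat=if_strat, if_market=if_market))
```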