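I wrote some code to practice machine learning, but I get an error I don't understand, because the names I'm selecting are columns of the Quandl table.
Here is my code: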
```python
import pandas as pd
import math
import quandl
import numpy as np
from sklearn import preprocessing, svm, model_selection  # preprocessing is used to clean or scale the data before machine learning
from sklearn.model_selection import train_test_split, cross_validate
from sklearn.linear_model import LinearRegression
df=quandl.get("EOD/NKE", authtoken="jcfsm6-47Pe1hgxDqjDU")
df=df[['ADJ_OPEN','ADJ_HIGH','ADJ_LOW','ADJ_CLOSE','ADJ_VOLUME']]
df['HL_PCT']=(df['ADJ_HIGH']-df['ADJ_LOW'])/df['ADJ_CLOSE']*100.0
df['PCT_Change']=(df['ADJ_CLOSE']-df['ADJ_OPEN'])/df['ADJ_OPEN']*100.0
df=df[['ADJ_CLOSE','HL_PCT','PCT_Change','ADJ_VOLUME']]
print(df.head())
forecast_col='ADJ_CLOSE'
df.fillna(value=-99999, inplace=True)
forecast_out=int(math.ceil(0.01*len(df)))
df['label']=df[forecast_col].shift(-forecast_out)
df.dropna(inplace=True)  # drop rows containing NaN (Not a Number), e.g. the last forecast_out rows, which have no label
# By convention in machine learning, X denotes the features and y the label.
X=np.array(df.drop(columns=['label']))
X=preprocessing.scale(X)
y=np.array(df['label'])
# Hold out 20% of the data for testing and train on the remaining 80% (test_size=0.2).
X_train, X_test, y_train, y_test=train_test_split(X,y,test_size=0.2)
# Define the model (support vector regression)
clf=svm.SVR(gamma='auto')
# Train the model
clf.fit(X_train, y_train)
# Test the model; for a regressor, score() returns the R^2 coefficient of determination
confidence=clf.score(X_test, y_test)
print(confidence)
```
When I run it with the command `python3 my.py`, I get this error message:
```
KeyError: "None of [Index(['ADJ_OPEN', 'ADJ_HIGH', 'ADJ_LOW', 'ADJ_CLOSE', 'ADJ_VOLUME'], dtype='object')] are in the [columns]"
```
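This KeyError means that none of the five labels passed to `df[[...]]` exist in `df.columns`; pandas column lookup is exact and case-sensitive. One plausible cause, which you should verify with `print(df.columns)`, is that the EOD dataset returns mixed-case names such as `Adj_Close` rather than `ADJ_CLOSE`. The sketch below assumes that naming and uses a small hand-built DataFrame in place of the `quandl.get` result (no network call); it lists the missing labels first and then upper-cases the real names so the script's selection can succeed.

```python
import pandas as pd

# Hypothetical stand-in for quandl.get("EOD/NKE", ...). The mixed-case names are an
# assumption; check what your data actually contains with print(df.columns).
df = pd.DataFrame({
    "Adj_Open": [100.0, 101.0],
    "Adj_High": [102.0, 103.0],
    "Adj_Low": [99.0, 100.5],
    "Adj_Close": [101.0, 102.5],
    "Adj_Volume": [1000, 1200],
})

wanted = ["ADJ_OPEN", "ADJ_HIGH", "ADJ_LOW", "ADJ_CLOSE", "ADJ_VOLUME"]

# Report which requested labels are absent before indexing, instead of hitting a KeyError.
missing = [c for c in wanted if c not in df.columns]
print("missing:", missing)

# Upper-case the real column names so they match the names used in the script.
df.columns = df.columns.str.upper()
df = df[wanted]
print(list(df.columns))
```

If `print(df.columns)` shows different names altogether, the simpler fix is to use those exact names in the script rather than renaming.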
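Once the column names match, the remaining steps can be checked in isolation. The following is a minimal sketch of the same label-shift, scaling, split and SVR steps, run on a synthetic random-walk price table instead of Quandl data (the synthetic data and `random_state=0` are assumptions added for reproducibility). It highlights that `train_test_split` returns `X_train, X_test, y_train, y_test` in that order, and that `SVR.score` reports R² rather than a classification accuracy.

```python
import math

import numpy as np
import pandas as pd
from sklearn import preprocessing, svm
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the price table (an assumption; no Quandl call is made).
rng = np.random.default_rng(0)
n = 300
close = 100 + np.cumsum(rng.normal(0, 1, n))
df = pd.DataFrame({
    "ADJ_CLOSE": close,
    "HL_PCT": rng.uniform(0.5, 3.0, n),
    "PCT_Change": rng.normal(0, 1, n),
    "ADJ_VOLUME": rng.integers(1_000, 10_000, n),
})

# Label = the close price forecast_out rows ahead; the last forecast_out rows become NaN.
forecast_out = int(math.ceil(0.01 * len(df)))
df["label"] = df["ADJ_CLOSE"].shift(-forecast_out)
df = df.dropna()

X = preprocessing.scale(np.array(df.drop(columns=["label"])))
y = np.array(df["label"])

# train_test_split returns the arrays in the order X_train, X_test, y_train, y_test.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

clf = svm.SVR(gamma="auto")
clf.fit(X_train, y_train)

# For a regressor, score() is the R^2 coefficient of determination on the test set.
confidence = clf.score(X_test, y_test)
print(forecast_out, X_train.shape, X_test.shape, confidence)
```

With real data, the synthetic frame would be replaced by the columns selected from Quandl; `random_state` only makes the split repeatable and can be dropped.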