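I have been teaching myself machine learning from a book, and this is my first attempt at an "off the rails" algorithm of my own. After preparing the data, I used an imported split function and then tried to make some predictions. However, even though I manually verified that every feature has the same number of entries, I get this error: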
Traceback (most recent call last):
  File "main.py", line 89, in <module>
    xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [4, 103]
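The thing is, I used print statements to verify that each feature contains exactly 103 entries, so I don't understand why the error reports inconsistent sample counts. Any help would be appreciated. If I solve the problem before someone answers, I will update the post.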
from yahoo_historical import Fetcher
import pandas as pd
from IPython.display import display

# Fetch AAPL historical prices between the two dates.
data_Range = Fetcher("AAPL", [2019, 1, 1], [2019, 6, 1])
data = data_Range.getHistorical()

# slopes[i] is the change in the daily high from day i to day i + 1.
slopes = []
volumes = data['Volume'][1:]
highes = data['High']
for index in range(len(highes) - 1):
    slopes.append(highes[index + 1] - highes[index])
# rLocale marks each slope as a local maximum (1), a local minimum (-1),
# or neither (0), compared with its neighbours.
rLocale = []
for index in range(len(slopes)):
    # need to implement base cases
    if index == 0:
        if slopes[index] > slopes[index + 1]:
            rLocale.append(1)
        else:
            rLocale.append(-1)
    elif index == len(slopes) - 1:
        if slopes[index] > slopes[index - 1]:
            rLocale.append(1)
        else:
            rLocale.append(-1)
    else:
        behind = slopes[index - 1]
        current = slopes[index]
        infront = slopes[index + 1]
        if current > behind and current > infront:
            rLocale.append(1)
        if (current > behind and current < infront) or (current < behind and current > infront):
            rLocale.append(0)
        if current < behind and current < infront:
            rLocale.append(-1)
netGood = []
for index in range(1, len(highes)):
    if highes[index] >= highes[index - 1]:
        netGood.append(1)
    else:
        netGood.append(-1)
highes = highes[:-1]
new_data = [slopes, rLocale, highes, volumes]
print(len(new_data[0]))
print(len(new_data[1]))
print(len(new_data[2]))
print(len(new_data[3]))
print(len(netGood))
print('---------------------------')
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
from sklearn.model_selection import train_test_split as tts
xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)
clf.fit(new_data, netGood)
print(clf.predict(new_data))
Console log:
103
103
103
103
103
---------------------------
Answer 0 (score: 1)
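new_data needs to be a sequence of observations, with one row per sample. Right now it is a sequence of features: a list of 4 lists holding 103 values each. train_test_split treats the first dimension as the sample axis, so it sees 4 samples in new_data against 103 labels in netGood, which is where the [4, 103] in the error comes from. Transposing new_data fixes this: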
import numpy as np
new_data = np.transpose(new_data)
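
To see the effect of the transpose without downloading any stock data, here is a minimal sketch that uses random numbers as stand-ins for the four feature lists and for netGood. The names features, labels and X, and the synthetic data itself, are assumptions made for illustration and are not part of the original code.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Synthetic stand-ins (assumption): 4 feature lists of 103 values each,
# mirroring the layout of [slopes, rLocale, highes, volumes].
rng = np.random.default_rng(0)
n_samples = 103
features = [rng.normal(size=n_samples).tolist() for _ in range(4)]
labels = rng.choice([-1, 1], size=n_samples).tolist()

# A list of feature lists becomes a (n_features, n_samples) array.
as_features = np.asarray(features)
print(as_features.shape)  # (4, 103)

# scikit-learn expects (n_samples, n_features), so transpose.
X = as_features.T
print(X.shape)  # (103, 4)

# With matching first dimensions the split no longer raises ValueError.
x_train, x_test, y_train, y_test = train_test_split(X, labels, random_state=0)

# Fit on the training rows only and predict on the held-out rows.
clf = KNeighborsClassifier(n_neighbors=3)
clf.fit(x_train, y_train)
predictions = clf.predict(x_test)
print(predictions.shape)
```

Note that the question's code fits and predicts on the full new_data, so the split it computes is never used. Fitting on the training part and predicting on the test part, as in the sketch, is probably closer to what was intended.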