ValueError: Found input variables with inconsistent numbers of samples: [4, 103]

Time: 2019-06-05 01:13:18

Tags: python scikit-learn

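I have been trying to teach myself machine learning from a book, and this is my first attempt at going "off script" with an algorithm. After preparing the data, I used the imported split function and then tried to make some predictions. However, even though I manually verified that every feature has the same number of entries, I get this error: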

Traceback (most recent call last):
  File "main.py", line 89, in <module>
    xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/model_selection/_split.py", line 2096, in train_test_split
    arrays = indexable(*arrays)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 230, in indexable
    check_consistent_length(*result)
  File "/home/runner/.local/lib/python3.6/site-packages/sklearn/utils/validation.py", line 205, in check_consistent_length
    " samples: %r" % [int(l) for l in lengths])
ValueError: Found input variables with inconsistent numbers of samples: [4, 103]

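The thing is, I used print statements to verify that every feature contains exactly 103 entries, so I don't understand why the error claims the inputs are inconsistent. Any help would be greatly appreciated. If I solve the problem before someone answers, I will update the post with the answer.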

from yahoo_historical import Fetcher
import pandas as pd
from IPython.display import display

data_Range = Fetcher("AAPL", [2019, 1, 1], [2019, 6, 1])

data = data_Range.getHistorical()

slopes = []

volumes = data['Volume'][1:]
highes = data['High']

for index in range(len(highes) - 1):
  slopes.append(highes[index + 1] - highes[index])

rLocale = []

for index in range(len(slopes)):

  #need to implement base cases
  if index == 0:
    if slopes[index] > slopes[index + 1]:
      rLocale.append(1)
    else:
      rLocale.append(-1)

  elif index == len(slopes) - 1:
    if slopes[index] > slopes[index - 1]:
      rLocale.append(1)
    else:
      rLocale.append(-1)

  else:
    behind = slopes[index - 1]
    current = slopes[index]
    infront = slopes[index + 1]

    if current > behind and current > infront:
      rLocale.append(1)
    if (current > behind and current < infront) or (current < behind and current > infront):
      rLocale.append(0)
    if current < behind and current < infront:
      rLocale.append(-1)


netGood = []

for index in range(1, len(highes)):
  if highes[index] >= highes[index - 1]:
    netGood.append(1)
  else:
    netGood.append(-1)

highes = highes[:-1]

new_data = [slopes, rLocale, highes, volumes]
print(len(new_data[0]))
print(len(new_data[1]))
print(len(new_data[2]))
print(len(new_data[3]))
print(len(netGood))

print('---------------------------')

from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)

from sklearn.model_selection import train_test_split as tts
xTrain, xTest, yTrain, yTest = tts(new_data, netGood, random_state=0)

clf.fit(new_data, netGood)
print(clf.predict(new_data))

Console log:

103
103
103
103
103
---------------------------

1 answer:

Answer 0 (score: 1)

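new_data needs to be a sequence of observations, with one entry per sample. Right now it is a sequence of features, so its outer length is 4 while netGood has 103 entries. Simply transposing it solves the problem: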

import numpy as np
new_data = np.transpose(new_data)
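
To see why the message reports [4, 103], it helps to look at the length of the outer container. The following is a minimal sketch using only plain Python lists. The feature values are made up (the real ones come from the Fetcher data), and the names r_locale, highs, labels, and rows are illustrative rather than taken from the question. scikit-learn treats the length of the first axis as the number of samples, so a list of 4 feature lists looks like 4 samples. Here zip(*new_data) plays the role of np.transpose, producing one row per observation.

```python
# Made-up stand-ins for the four feature lists and the label list (assumption:
# the real values come from the downloaded price data, not shown here).
n_samples = 103
slopes = [0.5 * i for i in range(n_samples)]
r_locale = [1 if i % 2 == 0 else -1 for i in range(n_samples)]
highs = [150.0 + i for i in range(n_samples)]
volumes = [1000 + 10 * i for i in range(n_samples)]
labels = [1 if i % 3 == 0 else -1 for i in range(n_samples)]

# A list of features: the outer length is the number of features, not samples.
new_data = [slopes, r_locale, highs, volumes]
print(len(new_data), len(labels))  # 4 103 -> the mismatch in the error message

# Transpose so that each row holds one observation's four feature values.
rows = [list(row) for row in zip(*new_data)]
print(len(rows), len(rows[0]))  # 103 4
```

With the data arranged this way (or as the array returned by np.transpose(new_data)), the first dimension matches len(netGood), so the length check in train_test_split should pass.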