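Reading an .arff file and trying to ignore the header

Time: 2019-01-20 00:08:11

Tags: python machine-learning scipy classification arff

I am new to Python and need some help with my code. I am reading an .arff file in a Jupyter notebook using Python 2.7. I would like to know which argument I need to pass to arff.loadarff, or whether there is another approach, so that the data header is ignored.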

train, meta = arff.loadarff(open('train.arff', 'r'))

After reading the file I do some math on the data, but I get the error shown below.

I hope someone can help me figure this out.

train,meta = arff.loadarff(open('train.arff', 'r'))
train = pd.DataFrame(train)
print(train)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-192-3b2868d1fd43> in <module>()
----> 1 ne = getNeighbors(X_train, y_train, X_test, k = 3)
      2 print(ne)

<ipython-input-191-75b4da86d04e> in getNeighbors(X_train, y_train, X_test, k)
      6             for (trainpoint,y_train_label) in zip(X_train,y_train):
      7                 # calculate the distance and append it to a distances_label with the associated label.
----> 8                 distances_label.append((distance(testpoint, trainpoint), y_train_label))
      9             k_neighbors_with_labels += [sorted(distances_label)[0:k]] # sort the distances and taken the first k neighbors
     10         return k_neighbors_with_labels

<ipython-input-186-22e861402349> in distance(testpoint, trainpoint)
      2 def distance(testpoint, trainpoint):
      3     # distance between testpoint and trainpoint.
----> 4     dist = np.sqrt(np.sum(np.power(float(testpoint)-float(trainpoint), 2)))
      5     return dis
      6 

ValueError: could not convert string to float: sepal_length

1 answer:

Answer 0 (score: 0)

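Your distance function assumes that testpoint and trainpoint are numeric arrays.

That is not the case here.

You are working with pandas DataFrames, which are not plain arrays. Iterating over a DataFrame directly (as zip(X_train, y_train) does) yields its column labels rather than its rows, which is why float() receives the string sepal_length.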
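A minimal sketch of the point above, assuming a small hand-built DataFrame in place of the one loaded from train.arff (the column names other than sepal_length, the values, and the label column are made up for illustration). It shows that iterating over a DataFrame gives column labels, and one possible fix: convert the feature columns to a NumPy array before looping, and let distance work on whole rows instead of calling float() on them.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the DataFrame built from train.arff.
train = pd.DataFrame({
    "sepal_length": [5.1, 4.9, 6.3],
    "sepal_width": [3.5, 3.0, 3.3],
    "label": ["setosa", "setosa", "virginica"],
})

# Iterating over a DataFrame yields its column labels, not its rows.
# This is where the string "sepal_length" in the error comes from.
columns_seen = list(train)

# Convert to plain NumPy arrays so that iteration goes row by row.
X_train = train[["sepal_length", "sepal_width"]].values.astype(float)
y_train = train["label"].values


def distance(testpoint, trainpoint):
    """Euclidean distance between two feature vectors."""
    testpoint = np.asarray(testpoint, dtype=float)
    trainpoint = np.asarray(trainpoint, dtype=float)
    return np.sqrt(np.sum(np.power(testpoint - trainpoint, 2)))


# Each element of X_train is now a numeric row, so distance() works.
d = distance(X_train[0], X_train[1])
print(columns_seen)
print(d)
```

With arrays like these, the loop in getNeighbors receives numeric rows for testpoint and trainpoint. Note also that the traceback shows distance ending in return dis, while the variable is named dist; that line would need to return dist once the conversion error is gone.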