Python IndexError: only integers

Time: 2016-12-01 02:46:19

Tags: python pandas numpy scikit-learn

I'm new to sklearn and can't get my data into the right format to make predictions and evaluate a confusion matrix. I'm following this Random Forest tutorial.

Here is my code:

from sklearn.ensemble import RandomForestClassifier
import numpy as np
import pandas as pd

dataframe = pd.read_csv('output.txt', sep='\t')
df = pd.DataFrame(dataframe)
df['is_train'] = np.random.uniform(0, 1, len(df)) <= .75
train, test = df[df['is_train']==True], df[df['is_train']==False]
features = df.columns[1:5]
clf = RandomForestClassifier(n_jobs=2)
y, _ = pd.factorize(train['event_count'])
clf.fit(train[features], y)

The line where I make the predictions raises this error:

preds = df['event_count'][clf.predict(test[features])]
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

1 answer:

Answer 0: (score: 0)

The problem looks like it is this df[:6]. A bare slice like that returns the first 6 rows of the DataFrame, not the columns.
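
To illustrate the row-versus-column point, here is a minimal sketch on a made-up DataFrame. The column names id, f1 and f2 and the label values are hypothetical; only event_count comes from the question. It also shows one possible way to turn integer codes from pd.factorize back into labels by indexing the uniques that factorize returns, which may be closer to what the tutorial intends than indexing the original column. The predicted codes are invented for the example, and this is not a confirmed diagnosis of the IndexError above.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the asker's data; every column name except
# 'event_count' is made up for illustration.
df = pd.DataFrame({
    'id': range(8),
    'f1': np.arange(8) * 1.0,
    'f2': np.arange(8) * 2.0,
    'event_count': ['low', 'high', 'low', 'mid', 'high', 'low', 'mid', 'high'],
})

# A bare slice on a DataFrame selects rows by position.
first_rows = df[:6]                  # 6 rows, all 4 columns

# To pick columns by position, slice df.columns or use .iloc.
feature_names = df.columns[1:3]      # the names 'f1' and 'f2'
feature_frame = df.iloc[:, 1:3]      # the same two columns as a DataFrame

# pd.factorize returns integer codes plus the unique labels they stand for.
codes, uniques = pd.factorize(df['event_count'])

# Integer codes (such as those a classifier fitted on `codes` would predict)
# can be mapped back to labels through `uniques`.
predicted_codes = np.array([0, 1, 2, 1])   # made-up predictions
predicted_labels = uniques.take(predicted_codes)
```

With labels decoded this way, something like pd.crosstab(actual_labels, predicted_labels) could then be used to build the confusion matrix, assuming both sequences have the same length.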