Splitting samples into training and test data in Python

Time: 2017-01-26 16:17:50

Tags: python-3.x

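I'm new to programming and am working on a machine learning problem in Python. I'm trying to split my dataset into training and test sets as shown in the code below, but I get the following error, which I haven't been able to resolve even after searching Google and other sites: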

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
#Load up the training dataset
df = pd.read_excel('Trainind data_2002.xls')
df.head()
df['training'] = np.random.uniform(0, 1, len(df)) <= .70
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
colclass = ['class']
train, test = df[df['training'] == True, df['training'] == False]
trainingMatrix = train.as_matrix(colsfeatures)
classMatrix = train.as_matrix(colclass)

rfc = RandomForestClassifier(n_estimators=100, n_jobs=2)
rfc.fit(traningMatrix, classMatrix)
testMatrix = test.as_matrix(colsfeatures)
result = rfc.predict(testMatrix)
test['predictions'] = result
test.head()

Error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Can anyone please help me? I would be very grateful.

1 answer:

Answer 0: (score: 0)

Have you tried train_test_split?

from sklearn.model_selection import train_test_split
# df is your data set; test_size is the fraction of rows held out for testing, e.g. 0.2
train, test = train_test_split(df, test_size=0.2)
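
As for the error itself: writing df[mask_a, mask_b] passes a single tuple of two Series as the key, and pandas tries to hash that tuple as a column label, which fails because a Series is not hashable. If you would rather keep your random-mask approach, a minimal sketch of the fix is to index the frame once per mask. The DataFrame below is synthetic stand-in data (only the column names come from the question), and classVector is a name introduced here for illustration. Note also that as_matrix() was removed in pandas 1.0, so the sketch uses to_numpy(), and that the question's fit call misspells trainingMatrix as traningMatrix.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the spreadsheet (hypothetical values; only the
# column names are taken from the question).
rng = np.random.default_rng(0)
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
df = pd.DataFrame(rng.random((100, len(colsfeatures))), columns=colsfeatures)
df['class'] = rng.integers(0, 3, len(df))

# Boolean mask: True for roughly 70% of the rows.
df['training'] = rng.uniform(0, 1, len(df)) <= 0.70

# Index the frame once per mask instead of passing a tuple of two masks.
# .copy() lets you add a 'predictions' column later without a
# SettingWithCopyWarning.
train = df[df['training']].copy()
test = df[~df['training']].copy()

# to_numpy() replaces the removed as_matrix(); selecting the single
# 'class' column gives a 1-D label array.
trainingMatrix = train[colsfeatures].to_numpy()
classVector = train['class'].to_numpy()
testMatrix = test[colsfeatures].to_numpy()
```

From here, rfc.fit(trainingMatrix, classVector) and rfc.predict(testMatrix) should work as in your original code.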