Question

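I am new to programming and I am working on a machine learning problem in Python. As the code shows, I am trying to split my dataset into training and test sets, but I get the following error, which I have not been able to get past even after searching Google and other sites: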

import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
# Load up the training dataset
df = pd.read_excel('Trainind data_2002.xls')
df.head()
# Flag roughly 70% of the rows as training rows
df['training'] = np.random.uniform(0, 1, len(df)) <= .70
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
colclass = ['class']
# This is the line that raises the error below
train, test = df[df['training'] == True, df['training'] == False]
# .values replaces DataFrame.as_matrix, which was removed in pandas 1.0
trainingMatrix = train[colsfeatures].values
classMatrix = train[colclass].values.ravel()  # 1-D labels, as fit() expects

rfc = RandomForestClassifier(n_estimators=100, n_jobs=2)
rfc.fit(trainingMatrix, classMatrix)
testMatrix = test[colsfeatures].values
result = rfc.predict(testMatrix)
test['predictions'] = result
test.head()

Error: TypeError: 'Series' objects are mutable, thus they cannot be hashed

Please, can anyone help me? I would be very grateful.
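A minimal sketch of what goes wrong on the `train, test = df[...]` line: the comma inside a single pair of brackets packs both boolean Series into one tuple, and pandas then tries to use that tuple as a column label, which requires hashing the Series. The sketch uses a small randomly generated DataFrame in place of the Excel file (hypothetical data) and shows the split done with two separate mask selections instead. Older pandas versions report the mistake as the `TypeError` above; newer versions may raise a different exception type, so the sketch catches it generically.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Excel data: 10 rows of random values.
rng = np.random.default_rng(0)
df = pd.DataFrame({"c2": rng.random(10), "class": rng.integers(0, 2, 10)})
df["training"] = rng.uniform(0, 1, len(df)) <= 0.70

# Buggy form: the comma builds ONE tuple key (mask, mask) inside one pair of
# brackets. Older pandas raises "TypeError: 'Series' objects are mutable...";
# newer versions may raise a different exception type for the same key.
error_raised = False
try:
    df[df["training"] == True, df["training"] == False]
except Exception:
    error_raised = True

# Fixed form: two separate selections, one per boolean mask.
mask = df["training"]
train, test = df[mask], df[~mask]
print(len(train), len(test))
```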

Answer 1

Have you tried train_test_split?

from sklearn.model_selection import train_test_split
# df is your dataset; test_size is the fraction held out for testing, e.g. 0.2
train, test = train_test_split(df, test_size=0.2)
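A minimal end-to-end sketch of how `train_test_split` could slot into the question's random-forest workflow. It assumes a randomly generated DataFrame in place of `'Trainind data_2002.xls'` (hypothetical data with the question's column names and three made-up classes) and uses `test_size=0.3` to mirror the original 70/30 split; `random_state` is added only to make the run reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the Excel file: random features, 3 made-up classes.
rng = np.random.default_rng(42)
colsfeatures = ['c2', 'c3', 'c4', 'c5', 'c7', 'ndvi', 'vi7']
df = pd.DataFrame(rng.random((100, len(colsfeatures))), columns=colsfeatures)
df['class'] = rng.integers(0, 3, len(df))

# Hold out 30% of the rows for testing; random_state fixes the shuffle.
train, test = train_test_split(df, test_size=0.3, random_state=0)

rfc = RandomForestClassifier(n_estimators=100, n_jobs=2, random_state=0)
rfc.fit(train[colsfeatures], train['class'])  # 1-D Series as the target

# Work on an explicit copy so adding a column does not warn about chained assignment.
test = test.copy()
test['predictions'] = rfc.predict(test[colsfeatures])
print(test.head())
```

Unlike the `np.random.uniform(...) <= .70` mask, which gives only roughly 70% training rows, `train_test_split` holds out a fixed number of rows derived from `test_size`, and its `stratify` parameter can keep class proportions equal across both sets.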
