I'm working on my first project with Python and scikit-learn. In the project I have to make predictions from the available data, and for that I want to use DecisionTreeClassifier. I load and clean the data and generate some trees. During generation, some datasets produce a tree while others fail. Looking more closely, I found that the datasets which can train a tree have fewer than 30 rows with 9 columns each; it also seems the trees cannot exceed a depth of 4.
Traceback (most recent call last):
File "/usr/local/bin/decisionTree/readAnPrepareData.py", line 57, in <module>
trainForest()
File "/usr/local/bin/decisionTree/readAnPrepareData.py", line 39, in trainForest
model.fit(X_train, Y)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 524, in fit
X_argsorted=X_argsorted)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 340, in build
recursive_partition(X, X_argsorted, y, sample_mask, 0, -1, False)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 306, in recursive_partition
depth + 1, node_id, True)
File "/usr/lib/pymodules/python2.7/sklearn/tree/tree.py", line 272, in recursive_partition
min_samples_leaf, max_features, criterion, random_state)
File "_tree.pyx", line 533, in sklearn.tree._tree._find_best_split (sklearn/tree/_tree.c:4812)
ValueError: ndarray is not Fortran contiguous
I am creating the tree like this:
model = DecisionTreeClassifier()
model.fit(X_train, Y)
What is causing this? Could it be some kind of overflow? That would be a very strange explanation, since this is only a small amount of data...
NumPy is at version 1.9.2, scikit-learn at 0.16.1.
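One workaround I'm considering, assuming the "not Fortran contiguous" error refers to the memory layout of X_train (e.g. after slicing or column selection): force the array into Fortran (column-major) order before fitting. This is a minimal sketch with random stand-in data of the same shape as mine (30 rows, 9 columns), not my real dataset:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for my real X_train / Y: 30 rows, 9 feature columns
rng = np.random.RandomState(0)
X_train = rng.rand(30, 9)
Y = rng.randint(0, 2, size=30)

# Assumption: the ValueError comes from X_train not being Fortran
# contiguous, so convert it to column-major order before fit.
X_train = np.asfortranarray(X_train, dtype=np.float32)

model = DecisionTreeClassifier()
model.fit(X_train, Y)
```

Would that be the right fix here, or does it just hide a deeper problem with how I slice the data?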