Error when fitting decision trees of different depths

Time: 2017-01-22 10:25:38

Tags: python

I am trying to compute the training and test error of decision trees of different depths.

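from sklearn import tree
# On older scikit-learn releases: from sklearn.cross_validation import train_test_split
from sklearn.model_selection import train_test_split

train_error = []
test_error = []
for i in range(3, 21):
    X_train, X_test, y_train, y_test = train_test_split(womendata, womeny, test_size=0.4, random_state=1)
    # i is used both as the maximum depth and as min_samples_split
    decitiontree = tree.DecisionTreeClassifier(criterion='gini', splitter='best', max_depth=i, class_weight='balanced', min_samples_split=i)
    clf = decitiontree.fit(X_train, y_train)
    train_error.append(1 - clf.score(X_train, y_train))
    test_error.append(1 - clf.score(X_test, y_test))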

In Python 3 I get this error:

Traceback (most recent call last):
  File "<stdin>", line 4, in <module>
  File "/usr/local/lib/python3.4/dist-packages/sklearn/tree/tree.py", line 154, in fit
    X = check_array(X, dtype=DTYPE, accept_sparse="csc")
  File "/usr/local/lib/python3.4/dist-packages/sklearn/utils/validation.py", line 398, in check_array
    _assert_all_finite(array)
  File "/usr/local/lib/python3.4/dist-packages/sklearn/utils/validation.py", line 54, in _assert_all_finite
    " or a value too large for %r." % X.dtype)

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

womendata and womeny have the same length, and there is no missing data in the set.

1 answer:

Answer 0 (score: 0)

Judging from the error, the data array you passed in contains invalid values.

  

ValueError: Input contains NaN, infinity or a value too large for dtype('float32').

Please check that your data is valid, meaning:

  1. There are no NaN values in womendata or womeny
  2. There are no Inf values in womendata or womeny
  3. All values lie within the range from float32 min to float32 max
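The third condition is easy to overlook. The traceback shows the tree's fit calling check_array with dtype=DTYPE, and the error message names float32, so a value that is finite as float64 can still overflow once it is cast. A minimal sketch with a made-up array (the values are illustrative and not taken from womendata):

```python
import numpy as np

# Hypothetical feature column: every value is finite in float64.
col = np.array([1.0, 2.5, 1e40], dtype=np.float64)
all_finite_64 = bool(np.isfinite(col).all())

# 1e40 exceeds the float32 maximum (about 3.4e38), so the cast yields inf.
# errstate silences the overflow warning that some NumPy versions emit here.
with np.errstate(over="ignore"):
    col32 = col.astype(np.float32)
all_finite_32 = bool(np.isfinite(col32).all())

print(all_finite_64, all_finite_32)
```

So data that looks complete and finite at first glance can still trigger this ValueError if a single entry is very large.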

    You can use the following code:

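    import numpy as np
    # The error message names float32, so check against the float32 range
    info = np.finfo(np.float32)
    
    for x in [womendata, womeny]:
        x = np.asarray(x, dtype=np.float64)  # assumes the data is numeric
        # `x is not np.nan` is an identity test and is always True; use np.isnan
        assert not np.isnan(x).any(), 'data contains NaN value'
        assert not np.isinf(x).any(), 'data contains infinity value'
        assert np.all(x <= info.max) and np.all(x >= info.min), 'not all values in range'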
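
If one of the assertions fails, it helps to know where the bad entries are. The following is only a sketch: find_invalid is a hypothetical helper (not part of NumPy or scikit-learn), it assumes the input is numeric and two-dimensional, and the sample array is made up to stand in for womendata.

```python
import numpy as np

def find_invalid(x, dtype=np.float32):
    """Return (row, column) positions of entries that are NaN, infinite,
    or outside the finite range of `dtype`. Assumes numeric 2-D input."""
    arr = np.asarray(x, dtype=np.float64)
    limits = np.finfo(dtype)
    lo, hi = float(limits.min), float(limits.max)
    # np.isfinite is False for NaN and +/-inf; the range test catches
    # finite float64 values that would overflow when cast to `dtype`.
    bad = ~np.isfinite(arr) | (arr > hi) | (arr < lo)
    return np.argwhere(bad)

# Made-up data standing in for womendata
sample = np.array([[1.0, 2.0],
                   [np.nan, 3.0],
                   [4.0, np.inf],
                   [1e40, 5.0]])
bad_positions = find_invalid(sample)
print(bad_positions)
```

Each row of the result is a (row, column) pair. If womendata is a pandas DataFrame with only numeric columns, passing it directly should work, and the second number is then the column position.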