Classifier fit raises ValueError: setting an array element with a sequence

Time: 2015-05-24 21:02:50

Tags: python machine-learning

I want to predict whether a user clicks on a link or not, using logistic regression. I have a lot of data to start with. With 23 examples I didn't get this exception, but when I try with 3 million rows I do get it.

The following is my code, adapted from the example on the scikit-learn website:

from sklearn import linear_model
from sklearn.metrics import accuracy_score

data = [line.strip() for line in open('dataforSVM.txt')]
pod=[];
listData=[];
y=[];
for i in range(0,len(data)):
    splitData=data[i].split(',' );
    tempPod=[];
    for j in range(0,len(splitData)-1):
        if isFloat(splitData[j]):
            tempPod.append(float(splitData[j]));
    y.append(float(splitData[j]));
    pod.append(tempPod)

X=pod;
Y=y;
h = .02  # step size in the mesh

logreg = linear_model.LogisticRegression(C=1.0, class_weight='auto', dual=False, fit_intercept=True,
          intercept_scaling=1, penalty='l2', random_state=None, tol=0.0001)

logreg.fit(X, Y)
Z=logreg.predict(X)
print Z

acc = accuracy_score(Y, Z)
print acc

I get this error:

Traceback (most recent call last):
  File "D:/Users/jures/Desktop/logisticRegression.py", line 45, in <module>
    logreg.fit(X, Y)
  File "C:\Python27\lib\site-packages\sklearn\svm\base.py", line 668, in fit
    X = atleast2d_or_csr(X, dtype=np.float64, order="C")
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 134, in atleast2d_or_csr
    "tocsr", force_all_finite)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 111, in _atleast2d_or_sparse
    force_all_finite=force_all_finite)
  File "C:\Python27\lib\site-packages\sklearn\utils\validation.py", line 91, in array2d
    X_2d = np.asarray(np.atleast_2d(X), dtype=dtype, order=order)
  File "C:\Python27\lib\site-packages\numpy\core\numeric.py", line 320, in asarray
    return array(a, dtype, copy=False, order=order)
ValueError: setting an array element with a sequence.

1 answer:

Answer 0 (score: 1)

Your problem can be reproduced by using the following content for your data file:

1,1,0
A,3,1
5,5,0

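Because of the if isFloat(splitData[j]) check, non-numeric values are silently dropped from X. With the file above, the row A,3,1 contributes only one feature while the other rows contribute two. You therefore end up with a nested list pod in which some rows have fewer entries than others, and NumPy cannot convert such a ragged list into a 2D float array, which raises the ValueError. You should clean up your data and then get rid of that if.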
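To make this concrete, here is a minimal sketch of one way to clean the data, assuming the last column is the label and that a row with any non-numeric field should be rejected as a whole. The names is_float and load_rows are hypothetical (is_float only stands in for the asker's isFloat, whose real definition is not shown), and whether to reject or repair bad rows depends on your data.

```python
def is_float(text):
    """Return True if text parses as a float (stand-in for the asker's isFloat)."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def load_rows(lines):
    """Split each line into features and a label, rejecting whole rows
    that contain any non-numeric field instead of dropping single values."""
    X, y, rejected = [], [], []
    for line_number, line in enumerate(lines, start=1):
        fields = line.strip().split(",")
        if len(fields) < 2 or not all(is_float(f) for f in fields):
            rejected.append(line_number)  # remember which lines were skipped
            continue
        X.append([float(f) for f in fields[:-1]])  # every column but the last
        y.append(float(fields[-1]))                # last column is the label
    return X, y, rejected


sample = ["1,1,0", "A,3,1", "5,5,0"]

# Filtering values one by one, as the original loop does, leaves the
# second row shorter than the others.
ragged = [[float(f) for f in line.split(",")[:-1] if is_float(f)] for line in sample]
print([len(row) for row in ragged])

# Rejecting whole rows keeps every remaining row the same length.
X, y, rejected = load_rows(sample)
print(X, y, rejected)
```

Printing the rejected line numbers is useful on a 3 million row file, because it tells you which lines to inspect instead of only failing later inside fit.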

Furthermore, your y looks wrong to me. The statement y.append(float(splitData[j])); runs after the inner for loop, so j still holds the last value that loop assigned. That loop does not stop at the last element of the row but at the second-to-last one, so y receives the second-to-last value and the last element of each data row (which is usually the label) is discarded. You probably want j+1 there, or simply splitData[-1].
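The loop-variable behaviour can be checked in isolation. This is only an illustrative sketch on a single made-up row, assuming the label sits in the last column; the variable names are not taken from the question.

```python
fields = "5,5,0".split(",")   # two features followed by the label

for j in range(0, len(fields) - 1):
    pass                      # the loop body is irrelevant here

# After the loop, j keeps the last index it visited: the second-to-last field.
print(j, fields[j])           # this is a feature, not the label

# Indexing from the end does not depend on the loop variable at all.
label = float(fields[-1])
features = [float(v) for v in fields[:-1]]
print(features, label)
```

Indexing with -1 also avoids relying on a j left over from a previous row when a line has only one field and the inner loop never runs.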