I am trying to run the following code. By the way, I am new to both Python and sklearn.
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
# data import and preparation
trainData = pd.read_csv('train.csv')
train = trainData.values
testData = pd.read_csv('test.csv')
test = testData.values
X = np.c_[train[:, 0], train[:, 2], train[:, 6:7], train[:, 9]]
X = np.nan_to_num(X)
y = train[:, 1]
Xtest = np.c_[test[:, 0:1], test[:, 5:6], test[:, 8]]
Xtest = np.nan_to_num(Xtest)
# model
lr = LogisticRegression()
lr.fit(X, y)
where y is an np.ndarray of 0s and 1s.
I get the following:
  File "C:\Anaconda3\lib\site-packages\sklearn\linear_model\logistic.py", line 1174, in fit
    check_classification_targets(y)
  File "C:\Anaconda3\lib\site-packages\sklearn\utils\multiclass.py", line 172, in check_classification_targets
    raise ValueError("Unknown label type: %r" % y_type)
ValueError: Unknown label type: 'unknown'
From the sklearn documentation:
y : array-like, shape (n_samples,)
    Target values (class labels in classification, real numbers in regression)
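To see what sklearn thinks y is, you can call sklearn.utils.multiclass.type_of_target, the function check_classification_targets uses before raising the error in the traceback. A minimal sketch, assuming a scikit-learn version that reports an object-dtype array of non-string values as 'unknown' (this holds for the releases I am aware of):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

# Same kind of values as in the question, once as dtype=object, once as int
y_obj = np.array([0.0, 1.0, 1.0, 0.0, 1.0, 0.0], dtype=object)
y_int = y_obj.astype(int)

print(type_of_target(y_obj))  # unknown -> rejected by LogisticRegression.fit
print(type_of_target(y_int))  # binary  -> accepted as class labels
```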
What am I doing wrong?
UPD:
y is array([0.0, 1.0, 1.0, ..., 0.0, 1.0, 0.0], dtype=object) and its shape is (891,).
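The dtype=object in the update is the key clue. A likely cause, assuming train.csv mixes numeric and text columns (the column slicing in the question suggests it does), is that DataFrame.values must pick one dtype for the whole frame and falls back to object; slicing a single column out of that array keeps the object dtype. A small sketch with a hypothetical frame standing in for train.csv (the column names are made up):

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for train.csv: a label column plus text and float columns
df = pd.DataFrame({
    "label": [0, 1, 1, 0],
    "name": ["a", "b", "c", "d"],
    "age": [22.0, 38.0, np.nan, 35.0],
})

values = df.values  # mixed column types -> one object-dtype array
y = values[:, 0]    # slicing a column keeps the object dtype

print(values.dtype)       # object
print(y.dtype)            # object
print(df["label"].dtype)  # int64: the column itself is numeric
```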
Answer 0 (score: 53)
Your y is of type object, so sklearn cannot recognize what kind of target it is. Add the line y=y.astype('int') right after the line y = train[:, 1].
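A minimal end-to-end sketch of this fix, using synthetic X and y (hypothetical stand-ins, since train.csv is not shown) with y stored as dtype=object as in the question's update. The exact error message differs between scikit-learn versions, but it is a ValueError in the versions I know of:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic data: 891 rows like the question, labels stored as dtype=object
rng = np.random.default_rng(0)
X = rng.normal(size=(891, 4))
y = (X[:, 0] > 0).astype(float).astype(object)  # 0.0 / 1.0 as Python objects

lr = LogisticRegression()
raised = False
try:
    lr.fit(X, y)
except ValueError as err:
    raised = True
    print("object labels rejected:", err)

# The fix from this answer: give y a real integer dtype before fitting
y = y.astype('int')
lr.fit(X, y)
print("classes:", lr.classes_)  # [0 1]
```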
Answer 1 (score: 0)
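In addition to Miriam's answer: I ran into a similar error, but in my case the individual elements of y_pred were of type np.int32, while the individual elements of y were of type int.
I solved it like this: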
for i, x in enumerate(y_pred):
    y_pred[i] = x.astype('int')
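If y_pred is a NumPy array (an assumption; the answer does not say), note that assigning converted elements back into an existing array stores them in that array's own dtype, so the loop above leaves y_pred.dtype unchanged. A swapped-in alternative, not the answerer's code, is to convert the whole array in one call with ndarray.astype and rebind the name. The arrays below are hypothetical examples:

```python
import numpy as np

# Element-wise assignment: values are cast back to the array's dtype (int32)
loop_pred = np.array([0, 1, 1, 0], dtype=np.int32)
for i, x in enumerate(loop_pred):
    loop_pred[i] = x.astype('int')
print(loop_pred.dtype)  # int32

# Whole-array conversion: astype returns a new array with the requested dtype
y_pred = np.array([0, 1, 1, 0], dtype=np.int32)
y = [0, 1, 1, 0]
y_pred = y_pred.astype(int)
y = np.asarray(y, dtype=int)
print(y_pred.dtype == y.dtype)  # True
```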