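I'm trying to evaluate my model with `train_test_split`. I defined the following function, which creates the output array (the `top` column) on the table based on the function's input: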
```python
def top_sh(num):
    ### Get the top(num) in Shanghai data and arrange
    #### input and output variables accordingly
    # Add column to be output value, either zero or one
    # shanghai = shanghai_cp.copy()
    if 'top' in shanghai.columns:
        shanghai.drop(columns=shanghai.columns[-1], inplace=True)
    shanghai['top'] = shanghai['world_rank'].apply(lambda x: 1 if x <= num else 0)
    out = print('*****************' + '\n' + 'Output array: Top' + str(num) + '\n' + 'Disregarding in Analysis: World rank')
    # call = print(shanghai.head(15))
    return out
```
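The labeling step in `top_sh` can also be written without `apply` and without mutating a global table. The following is a minimal sketch, assuming a hypothetical helper `add_top_label` and a tiny made-up table standing in for the real Shanghai data:

```python
import pandas as pd

def add_top_label(df: pd.DataFrame, num: int, rank_col: str = "world_rank") -> pd.DataFrame:
    """Hypothetical helper: return a copy of df with a binary 'top' column (1 if rank <= num)."""
    # Drop a previous label by name, not by position; errors="ignore" skips it if absent.
    out = df.drop(columns="top", errors="ignore")
    # Vectorized comparison gives a boolean Series; astype(int) turns it into 0/1.
    out["top"] = (out[rank_col] <= num).astype(int)
    return out

# Made-up miniature table; the real data has more columns.
ranks = pd.DataFrame({"university": ["A", "B", "C", "D"], "world_rank": [3, 48, 51, 120]})
labeled = add_top_label(ranks, 50)
print(labeled["top"].tolist())  # [1, 1, 0, 0]
```

Dropping `top` by name avoids removing whatever column happens to be last, which is what `shanghai.columns[-1]` would do if another column were ever appended after `top`.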
Then I defined the train/test split procedure as follows:
```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def train_test(df, size, seed):
    ### Split the data into test and train sets and test
    # Get input output of df
    if df == 'shanghai':
        column1 = shanghai.columns[1:7]
        Y = shanghai.values[:, -1].astype(int)
        y = np.ravel(Y)
        X = shanghai.values[:, 1:7]
    elif df == 'times':
        column1 = times.columns[1:10]
        Y = times.values[:, -1].astype(int)
        y = np.ravel(Y)
        X = times.values[:, 1:10]
    else:
        return print('Available Datasets: "shanghai" , "times"')
    # Split into train and test
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=size, random_state=seed)
    # Get the regression
    model = LogisticRegression(solver='liblinear')
    model.fit(X_Train, X_Test)
    # See how accurate it is with the split
    result = model.score(X_Test, Y_Test)
    print(f'Accuracy {result*100:5.3f}')
    return
```
I run the following code:

```python
top_sh(50)
shanghai.head()
X.shape
Y
Y.shape
train_test('shanghai', 0.3, 7)
```

The shapes are `X.shape = (768, 8)` and `Y.shape = (768,)`.
I get the following error in the `train_test` function, specifically on the `model.fit` line:
> ValueError: bad input shape (150, 6)
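The shape in the message has the layout of `X_Test`: one row per test sample and six feature columns (the slice `1:7`). As a hypothetical illustration with random data (the real table's row count is not known here), a 500-row table with six features split with `test_size=0.3` produces exactly that test shape:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in: 500 rows, 6 feature columns, binary target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 6))
Y = rng.integers(0, 2, size=500)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=7)
print(X_Train.shape, X_Test.shape)  # (350, 6) (150, 6)
print(Y_Train.shape, Y_Test.shape)  # (350,) (150,)
```

The feature matrices are 2-D, while the label arrays are 1-D.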
Answer:
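The problem is most likely caused by the arguments you pass to `fit`. It expects the X values (the predictors) as the first argument and the Y values (the target) as the second. `X_Test` is the 2-D test feature matrix, which is the `(150, 6)` array named in the error, so this line of yours is incorrect:

```python
model.fit(X_Train,X_Test)
```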
You should pass `Y_Train` instead:

```python
model.fit(X_Train, Y_Train)
```
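A minimal end-to-end sketch of the fix, assuming synthetic data in place of the Shanghai table (six features and a made-up binary label, not the user's data). It shows that the original call raises `ValueError` (older scikit-learn versions say "bad input shape"; newer ones word it differently) and that the corrected call trains and scores:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in data: 6 features, label depends on the first two features.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
Y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(X, Y, test_size=0.3, random_state=7)
model = LogisticRegression(solver="liblinear")

# Original call: the second argument is a 2-D feature matrix, not a label vector.
try:
    model.fit(X_Train, X_Test)
    raised = False
except ValueError as err:
    raised = True
    print("ValueError:", err)

# Corrected call: training features first, then the matching training labels.
model.fit(X_Train, Y_Train)
result = model.score(X_Test, Y_Test)  # mean accuracy on the held-out 30%
print(f"Accuracy {result * 100:5.3f}")
```

`score` is still called with `X_Test` and `Y_Test`, as in the question; only the `fit` arguments change.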