Fitting train and test sets with the train_test_split method

Date: 2019-11-13 12:44:21

Tags: python machine-learning scikit-learn valueerror train-test-split

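I am trying to evaluate my model using train_test_split. I defined the following function, which creates the output array on the table (a `top` column) based on the function's input: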

def top_sh(num):
    ###Get the top(num) in Shanghai data and arrange
    ####input and output variables accordingly
    #Add column to be output value, either zero or one

    #shanghai = shanghai_cp.copy()
    if 'top' in shanghai.columns:
        shanghai.drop(columns = shanghai.columns[-1],inplace = True) 

    shanghai['top'] = shanghai['world_rank'].apply(lambda x: 1 if x<= num else 0)
    out = print('*****************'+ '\n' + 'Output array: Top'+ str(num)+ '\n' + 'Disregarding in Analysis: World rank')
    #call = print(shanghai.head(15))

    return out

Then I defined the train/test split procedure as follows:

def train_test(df,size, seed):
    ###Split the data into test and train sets and test

    #Get input output of df
    if df == 'shanghai':
        column1 = shanghai.columns[1:7]
        Y = shanghai.values[: , -1].astype(int)
        y = np.ravel(Y)
        X = shanghai.values[: , 1:7]
    elif df == 'times':
        column1 = times.columns[1:10]
        Y = times.values[: , -1].astype(int)
        y = np.ravel(Y)
        X = times.values[: , 1:10]
    else:
        return print('Available Datasets: "shanghai" , "times"')

    #Split into train and test
    X_Train, X_Test, Y_Train, Y_Test = train_test_split(X,Y, test_size=size, random_state=seed)

    #Get the regression
    model= LogisticRegression(solver='liblinear')
    model.fit(X_Train,X_Test)

    #See how accurately it is with the split
    result=model.score(X_Test,Y_Test)

    print(f'Accuracy {result*100:5.3f}')

    return

I run the following code:

top_sh(50)
shanghai.head()
X.shape
Y
Y.shape
train_test('shanghai',0.3,7)

X.shape = (768, 8)
Y.shape = (768, )

I get the following error in the train_test function, specifically on the model.fit line:

> ValueError: bad input shape (150, 6)

1 answer:

Answer 0 (score: 0)

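The problem is most likely what you are passing to fit. It expects the X values as the predictors and the Y values as the target, and the target must be one-dimensional. You are passing X_Test, a 2-D array with 6 feature columns, as the target, which matches the shape (150, 6) reported in the error. So this line is incorrect: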

model.fit(X_Train,X_Test)

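You should pass Y_Train instead (note the capitalization, which must match the names returned by train_test_split):

model.fit(X_Train, Y_Train)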
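To see the difference in isolation, here is a minimal sketch. It assumes synthetic data standing in for the question's table: the 500 rows, the 6 random features and the way the binary target is built are all made up for illustration and are not the asker's dataset. It first reproduces the failure by passing X_Test as the target, then fits with Y_Train.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 500 rows, 6 feature columns.
rng = np.random.default_rng(7)
X = rng.normal(size=(500, 6))
# Binary target loosely tied to the first feature so there is a signal to learn.
Y = (X[:, 0] + 0.5 * rng.normal(size=500) > 0).astype(int)

X_Train, X_Test, Y_Train, Y_Test = train_test_split(
    X, Y, test_size=0.3, random_state=7
)

model = LogisticRegression(solver="liblinear")

# Wrong: the second argument must be the 1-D training target, not X_Test.
raised = False
try:
    model.fit(X_Train, X_Test)
except ValueError as exc:
    raised = True
    print("ValueError:", exc)

# Right: fit on the training features and the training target.
model.fit(X_Train, Y_Train)
accuracy = model.score(X_Test, Y_Test)
print(f"Accuracy {accuracy * 100:5.3f}")
```

The exact wording of the ValueError depends on the scikit-learn version, but in each case it complains about the shape of the array passed as the target.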