Question

我有Dataframe df我选择了它的一些coulmns，我想把它们分成xtrain和xtest，以适应一个名为Sevrice的coulmn。所以原始的1和o进入xtrain并纳入xtest。

Service
1
0
0
1
Nan
Nan

xtarin = df.loc[df['Service'].notnull(), ['Age','Fare', 'GSize','Deck','Class', 'Profession_title' ]]

EDITED

    ytrain = df['Service'].dropna()
    Xtest=df.loc[df['Service'].isnull(),['Age','Fare','GSize','Deck','Class','Profession_title']]
    import pandas as pd
    from sklearn.linear_model import LogisticRegression
    logistic = LogisticRegression()
    logistic.fit(xtrain, ytrain)
    logistic.predict(xtest)

我为logistic.predict(xtest)

收到此错误

X has 220 features per sample; expecting 307

Answer 1

我认为你需要isnull：

Xtest=df.loc[df['Service'].isnull(),['Age','Fare','GSize','Deck','Class','Profession_title']]

另一种解决方案是boolean mask反转~：

mask = df['Service'].notnull()
xtarin = df.loc[mask, ['Age','Fare', 'GSize','Deck','Class', 'Profession_title' ]]
Xtest = df.loc[~mask, ['Age','Fare', 'GSize','Deck','Class', 'Profession_title' ]]

编辑：

df = pd.DataFrame({'Service':[1,0,np.nan,np.nan],
                   'Age':[4,5,6,5],
                   'Fare':[7,8,9,5],
                   'GSize':[1,3,5,7],
                   'Deck':[5,3,6,2],
                   'Class':[7,4,3,0],
                    'Profession_title':[6,7,4,6]})

print (df)
   Age  Class  Deck  Fare  GSize  Profession_title  Service
0    4      7     5     7      1                 6      1.0
1    5      4     3     8      3                 7      0.0
2    6      3     6     9      5                 4      NaN
3    5      0     2     5      7                 6      NaN

ytrain = df['Service'].dropna()
xtrain = df.loc[df['Service'].notnull(), ['Age','Fare', 'GSize','Deck','Class', 'Profession_title' ]]
xtest=df.loc[df['Service'].isnull(),['Age','Fare','GSize','Deck','Class','Profession_title']]
import pandas as pd
from sklearn.linear_model import LogisticRegression
logistic = LogisticRegression()
logistic.fit(xtrain, ytrain)
print (logistic.predict(xtest))
[ 0.  0.]

根据列将数据帧划分为两组

1 个答案: