Question

Suppose I have the following DataFrame:

    0             1               2
1  10/1/2016    'stringvalue'     456
2  NaN          'anothersting'    NaN
3  NaN          'and another '    NaN
4  11/1/2016    'more strings'    943
5  NaN          'stringstring'    NaN

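I want to create a new column, 'Full Entry', based on a condition. If the value in df[2] is NaN, then df['Full Entry'] should also be NaN.

If df[2] is not NaN, df['Full Entry'] should take the value of df[1]. I want to do this for every row.

I came up with the following code: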

df['Full_Entry'] = [df[1] if pd.isnull(x) == False else np.NaN for x in df[2]]

But this gives me the following result:

    0             1               2     Full_Entry:
1  10/1/2016    'stringvalue'     456     0 stringv... 
2  NaN          'anothersting'    NaN     NaN
3  NaN          'and another '    NaN     NaN
4  11/1/2016    'more strings'    943     0 stringv...
5  NaN          'stringstring'    NaN     NaN

What I want, however, is:

    0             1               2     Full_Entry:
1  10/1/2016    'stringvalue'     456     stringvalue 
2  NaN          'anothersting'    NaN     NaN
3  NaN          'and another '    NaN     NaN
4  11/1/2016    'more strings'    943     more strings
5  NaN          'stringstring'    NaN     NaN

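The 'if' condition in my code seems to trigger at the right moments, but it only uses the value from the first row. For some reason it also includes the '0'.

Does anyone know what is wrong with my code?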

Answer 1

Option 1
pd.Series.mask

df['Full Entry'] = df.iloc[:, 1].mask(df.iloc[:, 2].isnull())

Alternatively,

df['Full Entry'] = df.iloc[:, 2].mask(pd.notnull, df.iloc[:, 1])

df

           0             1      2    Full Entry
1  10/1/2016   stringvalue  456.0   stringvalue
2        NaN  anothersting    NaN           NaN
3        NaN   and another    NaN           NaN
4  11/1/2016  more strings  943.0  more strings
5        NaN  stringstring    NaN           NaN
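
To try the mask approach end to end, here is a minimal sketch that rebuilds the question's frame first. The construction of `df` is an assumption based on the table in the question (integer column labels 0, 1, 2 and row labels 1 to 5); only the `mask` line comes from the answer, written with label indexing instead of `iloc`.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the question's frame: columns 0, 1, 2 and rows 1..5.
df = pd.DataFrame(
    {
        0: ["10/1/2016", np.nan, np.nan, "11/1/2016", np.nan],
        1: ["stringvalue", "anothersting", "and another ", "more strings", "stringstring"],
        2: [456, np.nan, np.nan, 943, np.nan],
    },
    index=range(1, 6),
)

# Keep column 1 where column 2 has a value; replace it with NaN elsewhere.
df["Full Entry"] = df[1].mask(df[2].isnull())
print(df)
```

Because `mask` works on whole columns, no explicit loop over the rows is needed.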

Option 2
pd.Series.where

df['Full Entry'] = df.iloc[:, 2].where(pd.isnull, df.iloc[:, 1])
df

           0             1      2    Full Entry
1  10/1/2016   stringvalue  456.0   stringvalue
2        NaN  anothersting    NaN           NaN
3        NaN   and another    NaN           NaN
4  11/1/2016  more strings  943.0  more strings
5        NaN  stringstring    NaN           NaN

Answer 2

You can also use apply:

df['Full Entry'] = df.apply(lambda x: np.nan if pd.isnull(x[2]) else x[1], axis=1)
print(df)

Output:

           0             1      2    Full Entry
1  10/1/2016   stringvalue  456.0   stringvalue
2        NaN  anothersting    NaN           NaN
3        NaN   and another    NaN           NaN
4  11/1/2016  more strings  943.0  more strings
5        NaN  stringstring    NaN           NaN
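
As for why the original list comprehension misbehaves: it iterates over the values of df[2], but inside the loop `df[1]` is still the whole column, so every non-null row stores the entire Series. The "0 stringv..." in the result looks like the truncated repr of that Series (an index label followed by its first value). A row-wise fix in the same spirit as the apply above is to pair the two columns with `zip`. This is a sketch under the assumption that the frame looks like the table in the question; the construction of `df` is not from the original post.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the two relevant columns from the question.
df = pd.DataFrame(
    {
        1: ["stringvalue", "anothersting", "and another ", "more strings", "stringstring"],
        2: [456, np.nan, np.nan, 943, np.nan],
    },
    index=range(1, 6),
)

# zip yields one (string, number) pair per row, so each cell gets a scalar
# rather than the whole df[1] column.
df["Full_Entry"] = [s if pd.notnull(v) else np.nan for s, v in zip(df[1], df[2])]
print(df)
```

The vectorized options are usually preferable on large frames, but this keeps the shape of the original attempt.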

Answer 3

Using numpy where:

df['Full_Entry'] = np.where(pd.isnull(df[2]), np.nan, df[1])
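
A self-contained sketch of the numpy approach follows. The columns are selected with brackets because integer labels such as 1 and 2 cannot be reached through attribute access. The construction of `df` is an assumption based on the table in the question.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the two relevant columns from the question.
df = pd.DataFrame(
    {
        1: ["stringvalue", "anothersting", "and another ", "more strings", "stringstring"],
        2: [456, np.nan, np.nan, 943, np.nan],
    },
    index=range(1, 6),
)

# Where column 2 is null take NaN, otherwise take the string from column 1.
df["Full_Entry"] = np.where(df[2].isnull(), np.nan, df[1])
print(df)
```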
