Question

I have the following problem:

Given a DataFrame such as

import pandas as pd
df = pd.DataFrame({'col1':[1,0,0,1],'col2':['B','B','A','A'],'col3':[1,2,3,4]})

In other tools, I can easily create a new column based on a condition, for example

create new column 'col3' with 'col2' if df['col1'] == '0' & ~df['col2'].isnull() else 'col1'

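That other tool does this very quickly. So far I have not found a corresponding expression in Python.

1.) I tried np.where, which iterates over the rows, but it does not seem to allow dynamic values in the result that correspond to the exact row.

2.) I tried .apply(lambda ...), which seems to be quite slow.

I would be very happy if you could find an elegant way to solve this. Thanks.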

Answer 1

I think you need numpy.where with notnull instead of an inverted isnull (thanks @jpp):

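import numpy as np
import pandas as pd

df = pd.DataFrame({'col1':[1,0,0,1],'col2':['B','B','A','A'],'col3':[1,2,3,4]})

# take col2 where col1 is 0 and col2 is not null, otherwise keep col1
df['col3'] = np.where((df['col1'] == 0) & (df['col2'].notnull()), df['col2'], df['col1'])
print(df)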
   col1 col2 col3
0     1    B    1
1     0    B    B
2     0    A    A
3     1    A    1
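To address the concern in the question that np.where cannot return values tied to the exact row, here is a minimal sketch of the same call on hypothetical data in which one col2 value is missing (the missing value and the variable name use_col2 are my assumptions, not part of the original post). It suggests that np.where selects element by element, so each output row comes from the same row of col2 or col1.

```python
import numpy as np
import pandas as pd

# Hypothetical data: same layout as the question, but with one missing col2 value
df = pd.DataFrame({'col1': [1, 0, 0, 1],
                   'col2': ['B', None, 'A', 'A']})

# Each condition sits in its own parentheses because & binds more tightly than ==
use_col2 = (df['col1'] == 0) & (df['col2'].notnull())

# np.where picks row by row: col2 where the mask is True, col1 otherwise
df['col3'] = np.where(use_col2, df['col2'], df['col1'])

print(use_col2.tolist())
print(df['col3'].tolist())
```

In this sketch the second row has col1 equal to 0 but no col2 value, so the notnull check makes it fall back to col1.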

Answer 2

Try this:

import numpy as np
df['new_col'] = np.where((df['col1'] == 0) & (~df['col2'].isnull()), df['col2'], df['col1'])

np.where is faster than pd.apply: see "Why is np.where faster than pd.apply"
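As a rough illustration of the difference between the two approaches, the sketch below computes the same column once with a row-wise apply and once with np.where, using the question's data. It only checks that both give the same values; it does not measure speed, and the variable names via_apply and via_where are made up for this example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [1, 0, 0, 1],
                   'col2': ['B', 'B', 'A', 'A'],
                   'col3': [1, 2, 3, 4]})

# Row-wise version: the lambda is called once per row in Python
via_apply = df.apply(
    lambda row: row['col2'] if row['col1'] == 0 and pd.notnull(row['col2']) else row['col1'],
    axis=1,
)

# Vectorized version: the condition and the selection run over whole columns
via_where = np.where((df['col1'] == 0) & df['col2'].notnull(), df['col2'], df['col1'])

print(via_apply.tolist())
print(list(via_where))
```

The apply version runs a Python function for every row, while np.where works on the whole columns at once, which is the usual explanation for the speed gap the answer mentions.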

Answer 3

You can use df.loc:

df['col3'] = df['col1']
df.loc[(df['col1'] == 0) & (~df['col2'].isnull()), 'col3'] = df['col2']
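A self-contained sketch of the df.loc approach, on hypothetical data with one missing col2 value, might look like the following. The column name result, the variable mask, and the cast to object are my additions: recent pandas versions complain when strings are written into an integer column, so the sketch converts the copy of col1 to object first.

```python
import pandas as pd

# Hypothetical data: same layout as the question, but with one missing col2 value
df = pd.DataFrame({'col1': [1, 0, 0, 1],
                   'col2': ['B', None, 'A', 'A']})

# Rows where col1 is 0 and col2 is present
mask = (df['col1'] == 0) & df['col2'].notnull()

# Start from col1 as object dtype so string values can be written into it
df['result'] = df['col1'].astype(object)

# Only the masked rows are overwritten; values are matched by index label
df.loc[mask, 'result'] = df.loc[mask, 'col2']

print(df['result'].tolist())
```

Compared with np.where, this form modifies an existing column in place, which can be convenient when only a few rows need to change.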

python pandas: replace row-wise by condition in an elegant way