I have a data frame:
id|concept |description
12| |rewards member
12|tier one |
12|not avail |rewards member
Goal: create a new column, final_desc, whose value is taken from either the concept or the description column.
There are 4 possible cases:
1. concept has a value and description does not: final_desc is concept.
2. description has a value and concept does not: final_desc is description.
3. The value in concept is "not avail": final_desc is description.
4. Both concept and description are empty: final_desc is empty.
I tried np.where, but it does not handle case 3:
df['final_desc'] = np.where(df['concept'].isnull(), df['description'], df['concept'])
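For reference, a minimal reproduction of the problem. It assumes the blank cells in the table above are NaN; the raw table does not say whether they are NaN or empty strings.

```python
import numpy as np
import pandas as pd

# Sample data from the table above; blanks are ASSUMED to be NaN.
df = pd.DataFrame({
    "id": [12, 12, 12],
    "concept": [np.nan, "tier one", "not avail"],
    "description": ["rewards member", np.nan, "rewards member"],
})

# isnull() is False for the string "not avail", so row 2 keeps concept.
df["final_desc"] = np.where(df["concept"].isnull(), df["description"], df["concept"])

print(df["final_desc"].tolist())
# ['rewards member', 'tier one', 'not avail']
```

The third row should be "rewards member", but the placeholder string passes through unchanged.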
I think I need a custom function, but I'm not sure how to write one that works across columns.
Answer 0 (score: 0)
You can combine replace with ffill/bfill:
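# Turn the 'not avail' placeholder into NaN, then back-fill across columns
# so that concept picks up description wherever concept is missing.
df['final_desc'] = (df[['concept', 'description']].replace('not avail', np.nan)
                      .bfill(axis=1)['concept']
                   )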
Output:
id concept description final_desc
0 12 NaN rewards member rewards member
1 12 tier one NaN tier one
2 12 not avail rewards member rewards member
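A self-contained sketch of the same approach, for checking all four cases from the question. It assumes blanks are NaN, and the fourth row (both columns empty) is an addition for illustration that is not in the question's sample data.

```python
import numpy as np
import pandas as pd

# Rows 0-2 mirror the question; row 3 is an ADDED row covering case 4.
df = pd.DataFrame({
    "id": [12, 12, 12, 12],
    "concept": [np.nan, "tier one", "not avail", np.nan],
    "description": ["rewards member", np.nan, "rewards member", np.nan],
})

# Treat the placeholder as missing in both columns.
cleaned = df[["concept", "description"]].replace("not avail", np.nan)

# axis=1 back-fills across columns: a missing concept takes the value
# from description in the same row; if both are missing it stays NaN.
df["final_desc"] = cleaned.bfill(axis=1)["concept"]

print(df["final_desc"].tolist())
# ['rewards member', 'tier one', 'rewards member', nan]
```

Because only the two selected columns are back-filled, other columns such as id are left untouched.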
Answer 1 (score: 0)
This might do the trick:
# Treat 'not avail' as missing in concept, then fall back to description.
df['final_desc'] = df['concept'].replace('not avail', np.nan).fillna(df['description'])
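Both answers assume the blanks are NaN. If they are really empty or whitespace-only strings (the pipe-delimited table in the question leaves this open), fillna will not treat them as missing. Here is a rough sketch that normalises the columns first. The helper blank_to_nan is a hypothetical name introduced for illustration, and it assumes both columns hold strings.

```python
import numpy as np
import pandas as pd

# ASSUMED variant of the sample data: blanks stored as "" or " ",
# and values carrying trailing spaces as in the raw table.
df = pd.DataFrame({
    "id": [12, 12, 12],
    "concept": [" ", "tier one ", "not avail "],
    "description": ["rewards member", "", "rewards member"],
})

def blank_to_nan(col, extra=()):
    """Strip whitespace, then mask empty strings and any `extra`
    placeholder values with NaN (hypothetical helper)."""
    stripped = col.str.strip()
    return stripped.mask(stripped.isin(["", *extra]))

# Only concept treats 'not avail' as missing, as the question describes.
concept = blank_to_nan(df["concept"], extra=["not avail"])
description = blank_to_nan(df["description"])

df["final_desc"] = concept.fillna(description)

print(df["final_desc"].tolist())
# ['rewards member', 'tier one', 'rewards member']
```

Note that the result holds the stripped values, so "tier one " becomes "tier one".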