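What is a better way to chain multiple combine_first() calls?
I have parsed some data and ended up with 3 different columns for cc-email. The following works, but is there a more concise way?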
df['cc-email2'] = df['cc-email'].combine_first(
df['cc-email_cc-email'].combine_first(
df['cc-emails_cc-email']))
For example:
import numpy as np
import pandas as pd
df = pd.DataFrame([])
df['cc-email'] = ('bob@hotmail.com', np.nan, np.nan, np.nan)
df['cc-email_cc-email'] = (np.nan, 'michael@outlook.com', np.nan, np.nan)
df['cc-emails_cc-email'] = ('bob@yahoo.com', np.nan, np.nan, 'trey@gmail.com')
Resulting df:
          cc-email    cc-email_cc-email cc-emails_cc-email            cc-email2
0  bob@hotmail.com                  NaN      bob@yahoo.com      bob@hotmail.com
1              NaN  michael@outlook.com                NaN  michael@outlook.com
2              NaN                  NaN                NaN                  NaN
3              NaN                  NaN     trey@gmail.com       trey@gmail.com
Answer 0 (score: 1)
I think you can use reduce:
from functools import reduce
dfs = [df['cc-email'], df['cc-email_cc-email'], df['cc-emails_cc-email']]
df['cc-email2'] = reduce(lambda l,r: l.combine_first(r), dfs)
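The reduce approach extends to any number of columns. Below is a minimal self-contained sketch, assuming the sample data from the question; the helper name coalesce_columns is hypothetical and not part of pandas.

```python
from functools import reduce

import numpy as np
import pandas as pd


def coalesce_columns(frame, columns):
    """Return, per row, the first non-NaN value across `columns` (left to right)."""
    series_list = [frame[c] for c in columns]
    # combine_first keeps the left value and fills its NaNs from the right one
    return reduce(lambda left, right: left.combine_first(right), series_list)


df = pd.DataFrame({
    'cc-email': ['bob@hotmail.com', np.nan, np.nan, np.nan],
    'cc-email_cc-email': [np.nan, 'michael@outlook.com', np.nan, np.nan],
    'cc-emails_cc-email': ['bob@yahoo.com', np.nan, np.nan, 'trey@gmail.com'],
})

# The column list is built before 'cc-email2' is added, so it holds the 3 source columns
df['cc-email2'] = coalesce_columns(df, list(df.columns))
print(df['cc-email2'].tolist())
```

The order of the column list sets the priority: earlier columns win when a row has more than one address.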
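Alternatively, back-fill along the rows with bfill and select the first column. Note that ffill with the last column keeps the last non-NaN value in each row instead, so row 0 would become bob@yahoo.com, which does not match combine_first.
cols = ['cc-email', 'cc-email_cc-email', 'cc-emails_cc-email']
df['cc-email2'] = df[cols].bfill(axis=1).iloc[:, 0]
print(df)
          cc-email    cc-email_cc-email cc-emails_cc-email  \
0  bob@hotmail.com                  NaN      bob@yahoo.com
1              NaN  michael@outlook.com                NaN
2              NaN                  NaN                NaN
3              NaN                  NaN     trey@gmail.com

             cc-email2
0      bob@hotmail.com
1  michael@outlook.com
2                  NaN
3       trey@gmail.com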
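To check a row-wise fill against the chained combine_first, here is a minimal sketch, assuming the three-column sample frame from the question (the variable names are illustrative). bfill(axis=1) with the first column keeps the first non-NaN value per row, while ffill(axis=1) with the last column keeps the last one, so the two differ whenever a row holds more than one address, as row 0 does here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'cc-email': ['bob@hotmail.com', np.nan, np.nan, np.nan],
    'cc-email_cc-email': [np.nan, 'michael@outlook.com', np.nan, np.nan],
    'cc-emails_cc-email': ['bob@yahoo.com', np.nan, np.nan, 'trey@gmail.com'],
})
cols = list(df.columns)

# Reference result: the chained combine_first from the question
chained = df['cc-email'].combine_first(
    df['cc-email_cc-email'].combine_first(df['cc-emails_cc-email']))

# First non-NaN per row: back-fill across columns, then take the first column
first_non_null = df[cols].bfill(axis=1).iloc[:, 0]

# Last non-NaN per row: forward-fill across columns, then take the last column
last_non_null = df[cols].ffill(axis=1).iloc[:, -1]

print(pd.DataFrame({
    'chained': chained,
    'bfill_first': first_non_null,
    'ffill_last': last_non_null,
}))
```

Restricting the fill to df[cols] keeps any other columns in the frame from leaking into the result.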