Chaining multiple combine_first() calls

Time: 2018-01-30 08:57:15

Tags: python pandas

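What is a better way to chain multiple combine_first() calls?

I have parsed some data and ended up with three different columns for cc-email. The following works, but is there a more concise way?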

df['cc-email2'] = df['cc-email'].combine_first(
    df['cc-email_cc-email'].combine_first(
        df['cc-emails_cc-email']))

For example:

import numpy as np
import pandas as pd

df = pd.DataFrame([])
df['cc-email'] = ('bob@hotmail.com', np.nan, np.nan, np.nan)
df['cc-email_cc-email'] = (np.nan, 'michael@outlook.com', np.nan, np.nan)
df['cc-emails_cc-email'] = ('bob@yahoo.com', np.nan, np.nan, 'trey@gmail.com')

Resulting df:

     cc-email           cc-email_cc-email   cc-emails_cc-email    cc-email2
0    bob@hotmail.com    NaN                 bob@yahoo.com         bob@hotmail.com
1    NaN                michael@outlook.com NaN                   michael@outlook.com
2    NaN                NaN                 NaN                   NaN
3    NaN                NaN                 trey@gmail.com        trey@gmail.com

1 Answer:

Answer 0 (score: 1):

I think you can use reduce:

from functools import reduce

dfs = [df['cc-email'], df['cc-email_cc-email'], df['cc-emails_cc-email']]
df['cc-email2'] = reduce(lambda l,r: l.combine_first(r), dfs)

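Forward-filling along each row and selecting the last column is another option. Note, however, that it returns the last non-null value in each row rather than the first, so it differs from the combine_first() chain whenever a row has more than one value (row 0 below):

df['cc-email2'] = df.ffill(axis=1).iloc[:, -1]
print(df)
          cc-email    cc-email_cc-email cc-emails_cc-email  \
0  bob@hotmail.com                  NaN      bob@yahoo.com   
1              NaN  michael@outlook.com                NaN   
2              NaN                  NaN                NaN   
3              NaN                  NaN     trey@gmail.com   

             cc-email2  
0        bob@yahoo.com  
1  michael@outlook.com  
2                  NaN  
3       trey@gmail.com

To get the first non-null value, as combine_first() does, back-fill instead and take the first column: df.bfill(axis=1).iloc[:, 0].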
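
To make the difference between the row-wise fill variants concrete, here is a minimal sketch that rebuilds the sample frame from the question and compares the original combine_first() chain, the reduce version, and the two fill directions. The names cols, chained, reduced, first_valid, and last_valid are made up for this illustration and are not part of the original posts.

```python
from functools import reduce

import numpy as np
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'cc-email': ['bob@hotmail.com', np.nan, np.nan, np.nan],
    'cc-email_cc-email': [np.nan, 'michael@outlook.com', np.nan, np.nan],
    'cc-emails_cc-email': ['bob@yahoo.com', np.nan, np.nan, 'trey@gmail.com'],
})

# Columns in priority order (hypothetical helper list)
cols = ['cc-email', 'cc-email_cc-email', 'cc-emails_cc-email']

# Original nested chain from the question
chained = df['cc-email'].combine_first(
    df['cc-email_cc-email'].combine_first(df['cc-emails_cc-email']))

# reduce version from the answer
reduced = reduce(lambda l, r: l.combine_first(r), [df[c] for c in cols])

# Back-fill across columns, then take the first column:
# the first non-null value per row
first_valid = df[cols].bfill(axis=1).iloc[:, 0]

# Forward-fill across columns, then take the last column:
# the last non-null value per row
last_valid = df[cols].ffill(axis=1).iloc[:, -1]

print(pd.DataFrame({
    'chained': chained,
    'reduced': reduced,
    'first_valid': first_valid,
    'last_valid': last_valid,
}))
```

Selecting the source columns explicitly with df[cols] keeps the fill from picking up cc-email2 or any other unrelated column that already exists in the frame.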