I have two tables of different sizes that I want to merge in Python using Pandas in the following way:
UID Property Date
1 A 10/02/2016
2 B NaN
3 A 10/02/2016
4 C NaN
5 C NaN
6 A 10/02/2016
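Table 1 contains information about property transactions, along with a date associated with each property. Since some of the dates are NaN, I would like to proxy them from another table (Table 2), which contains only information about the properties, without replacing any of the dates already present in Table 1: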
Property DateProxy
A 01/01/2016
B 03/04/2016
C 16/05/2016
In the end, I would like to obtain the following:
UID Property Date
1 A 10/02/2016 (kept from T1)
2 B 03/04/2016 (imported from T2)
3 A 10/02/2016 (kept from T1)
4 C 16/05/2016 (imported from T2)
5 C 16/05/2016 (imported from T2)
6 A 10/02/2016 (kept from T1)
Answer 0 (score: 1)
First, let's merge the two datasets. This adds 'DateProxy' as a new column, so the original dates are not overwritten:
import pandas
df_merge = pandas.merge(T1, T2, on='Property')
Then we fill in the missing values by copying them from the 'DateProxy' field:
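# Keep the existing date when present, otherwise take the proxy date.
# The suffixes only label where each value came from, as in the question.
df_merge.Date = df_merge.apply(
    lambda x: x['Date'] + ' (kept from T1)' if x['Date'] == x['Date']
    else x['DateProxy'] + ' (imported from T2)',
    axis=1
)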
(x['Date'] == x['Date'] checks that the value is not NaN, since NaN is not equal to itself.) Finally, we can drop the proxy column:
df_final = df_merge.drop('DateProxy', axis=1)
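
As an alternative to the merge/apply/drop steps above, the same fill can be sketched without a row-wise apply by looking up the proxy per Property and passing it to fillna. This is only a sketch under assumptions: T1 and T2 are rebuilt here from the tables in the question, dates are kept as plain strings, each Property is assumed to appear only once in T2, and the names proxy and result are illustrative.

```python
import numpy as np
import pandas as pd

# Tables reconstructed from the question (dates kept as plain strings).
T1 = pd.DataFrame({
    "UID": [1, 2, 3, 4, 5, 6],
    "Property": ["A", "B", "A", "C", "C", "A"],
    "Date": ["10/02/2016", np.nan, "10/02/2016", np.nan, np.nan, "10/02/2016"],
})
T2 = pd.DataFrame({
    "Property": ["A", "B", "C"],
    "DateProxy": ["01/01/2016", "03/04/2016", "16/05/2016"],
})

# Look up each row's proxy date by Property (assumes Property is unique in T2).
proxy = T1["Property"].map(T2.set_index("Property")["DateProxy"])

# Keep existing dates; only the NaN entries are filled from the proxy.
result = T1.assign(Date=T1["Date"].fillna(proxy))
print(result)
```

Because fillna only touches missing entries, the dates already present in T1 are left as they are, and T1 itself is not modified.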