我有以下两个数据框,我已将日期设置为DateTime Index df.set_index(pd.to_datetime(df['date']), inplace=True)
,并希望在日期合并或加入:
df.head(5)
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
1985-12-31 J7120 24E H8OH18088 36
1997-12-31 z9600 24K S6ND00058 2000
d.head(5)
catcode_disp disposition feccandid_disp bills
date
2007-12-31 A0000 support S4HI00011 1
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1
2007-12-31 A1000 support S8MT00010 1
2007-12-31 A1500 support S6WI00061 2
2007-12-31 A1600 support S4IA00020', 'P20000741 3
我尝试了以下两种方法但都返回了MemoryError:
df.join(d, how='right')
我在没有将日期设置为索引的数据帧上使用下面的代码。
merge=pd.merge(df,d, how='inner', on='date')
答案 0 :(得分:11)
如果需要按函数merge
中的索引进行合并,则可以添加参数left_index=True
和right_index=True
:
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
示例(d
中索引的第一个值已更改以进行匹配):
print df
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
1985-12-31 J7120 24E H8OH18088 36
1997-12-31 z9600 24K S6ND00058 2000
print d
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print merge
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support
feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
或者您可以使用concat
:
print pd.concat([df,d], join='inner', axis=1)
date
1997-12-31 z9600 24K S6ND00058 2000 A0000 support
feccandid_disp bills
date
1997-12-31 S4HI00011 1.0
编辑:EdChum是对的:
我向DataFrame df
添加重复项(索引中的最后2个值):
print df
catcode_amt type feccandid_amt amount
date
1915-12-31 A5000 24K H6TX08100 1000
1916-12-31 T6100 24K H8CA52052 500
1954-12-31 H3100 24K S8AK00090 1000
2007-12-31 J7120 24E H8OH18088 36
2007-12-31 z9600 24K S6ND00058 2000
print d
catcode_disp disposition feccandid_disp bills
date
1997-12-31 A0000 support S4HI00011 1.0
2007-12-31 A1000 oppose S4IA00020', 'P20000741 1 NaN
2007-12-31 A1000 support S8MT00010 1.0
2007-12-31 A1500 support S6WI00061 2.0
2007-12-31 A1600 support S4IA00020', 'P20000741 3 NaN
merge=pd.merge(df,d, how='inner', left_index=True, right_index=True)
print merge
catcode_amt type feccandid_amt amount catcode_disp disposition \
date
2007-12-31 J7120 24E H8OH18088 36 A1000 oppose
2007-12-31 J7120 24E H8OH18088 36 A1000 support
2007-12-31 J7120 24E H8OH18088 36 A1500 support
2007-12-31 J7120 24E H8OH18088 36 A1600 support
2007-12-31 z9600 24K S6ND00058 2000 A1000 oppose
2007-12-31 z9600 24K S6ND00058 2000 A1000 support
2007-12-31 z9600 24K S6ND00058 2000 A1500 support
2007-12-31 z9600 24K S6ND00058 2000 A1600 support
feccandid_disp bills
date
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
2007-12-31 S4IA00020', 'P20000741 1 NaN
2007-12-31 S8MT00010 1.0
2007-12-31 S6WI00061 2.0
2007-12-31 S4IA00020', 'P20000741 3 NaN
答案 1 :(得分:3)
看起来您的日期是您的索引,在这种情况下,您希望合并索引,而不是列。如果您有两个数据框,df_1
和df_2
:
df_1.merge(df_2, left_index=True, right_index=True, how='inner')
答案 2 :(得分:1)
我遇到了类似的问题。您很可能有很多NaT。
我删除了所有的NaT,然后执行了加入并能够加入。
df = df[df['date'].notnull() == True].set_index('date')
d = d[d['date'].notnull() == True].set_index('date')
df.join(d, how='right')