Merging two DataFrames: I have two DataFrames that I need to merge under certain conditions, but I haven't figured out how to do this.
df1 :
id positive_action date volume
id_1 user 1 2016-12-12 19720.735
user 2 2016-12-12 14740.800
df2 :
id negative_action date volume
id_1 user 1 2016-12-12 10.000
user 3 2016-12-12 10.000
I want :
id action date volume
id_1 user 1 2016-12-12 19730.735
user 2 2016-12-12 14740.800
user 3 2016-12-12 10.000
How can I achieve this?
Answer 0: (score: 3)
You can also concatenate your DataFrames after renaming the positive_action and negative_action columns to action, and then perform a groupby.
df1.rename(columns={'positive_action':'action'}, inplace=True)
df2.rename(columns={'negative_action':'action'}, inplace=True)
pd.concat([df1, df2]).groupby(['id', 'action', 'date']).sum().reset_index()
id action date volume
0 id_1 user 1 2016-12-12 19730.735
1 id_1 user 2 2016-12-12 14740.800
2 id_1 user 3 2016-12-12 10.000
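For reference, here is a self-contained sketch of the same concat-then-groupby idea that avoids inplace=True, so the original frames keep their column names. The frames are rebuilt from the question's sample data, assuming id is an ordinary column with a value on every row; the helper name combine_actions is hypothetical.

```python
import pandas as pd

# Sample data from the question (assumption: `id` is a regular column).
df1 = pd.DataFrame({
    'id': ['id_1', 'id_1'],
    'positive_action': ['user 1', 'user 2'],
    'date': ['2016-12-12', '2016-12-12'],
    'volume': [19720.735, 14740.800],
})
df2 = pd.DataFrame({
    'id': ['id_1', 'id_1'],
    'negative_action': ['user 1', 'user 3'],
    'date': ['2016-12-12', '2016-12-12'],
    'volume': [10.0, 10.0],
})


def combine_actions(pos, neg):
    """Stack both frames under a shared `action` column and sum the volumes."""
    # rename returns new frames, so the caller's frames are left untouched
    pos = pos.rename(columns={'positive_action': 'action'})
    neg = neg.rename(columns={'negative_action': 'action'})
    stacked = pd.concat([pos, neg], ignore_index=True)
    # as_index=False keeps the group keys as ordinary columns
    return stacked.groupby(['id', 'action', 'date'], as_index=False)['volume'].sum()


result = combine_actions(df1, df2)
print(result)
```

Because the helper works on renamed copies, df1 and df2 can still be used with their original column names afterwards.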
Answer 1: (score: 2)
This should work:
# not sure what indexing you are using, so let's remove it
# to get on the same page, so to speak ;).
df1 = df1.reset_index()
df2 = df2.reset_index()
# do an outer merge to allow mismatches on the actions.
df = df1.merge(
    df2,
    left_on=['id', 'positive_action', 'date'],
    right_on=['id', 'negative_action', 'date'],
    how='outer',
)
# fill the missing actions from one with the other.
# (Will only happen when one is missing due to the way we merged.)
df['action'] = df['positive_action'].fillna(df['negative_action'])
# drop the old actions
df = df.drop(columns=['positive_action', 'negative_action'])
# aggregate the volumes (I'm assuming you mean a simple sum)
df['volume'] = df['volume_x'].fillna(0) + df['volume_y'].fillna(0)
# drop the old volumes
df = df.drop(columns=['volume_x', 'volume_y'])
print(df)
The output is:
id date volume action
0 id_1 2016-12-12 19730.735 user_1
1 id_1 2016-12-12 14740.800 user_2
2 id_1 2016-12-12 10.000 user_3
You can then restore the index that I may have removed.
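A shorter variant of the same outer-merge idea, offered only as a sketch: renaming both action columns first lets the merge use a single on= list, and suffixes= gives the two volume columns explicit names. It assumes the question's sample data with id as an ordinary column; the suffix names _pos and _neg are arbitrary choices.

```python
import pandas as pd

# Sample data from the question (assumption: `id` is a regular column).
df1 = pd.DataFrame({
    'id': ['id_1', 'id_1'],
    'positive_action': ['user 1', 'user 2'],
    'date': ['2016-12-12', '2016-12-12'],
    'volume': [19720.735, 14740.800],
})
df2 = pd.DataFrame({
    'id': ['id_1', 'id_1'],
    'negative_action': ['user 1', 'user 3'],
    'date': ['2016-12-12', '2016-12-12'],
    'volume': [10.0, 10.0],
})

# Outer merge on the shared keys; rows present on only one side get NaN
# in the other side's volume column.
merged = df1.rename(columns={'positive_action': 'action'}).merge(
    df2.rename(columns={'negative_action': 'action'}),
    on=['id', 'action', 'date'],
    how='outer',
    suffixes=('_pos', '_neg'),
)

# Treat a missing volume as 0, then sum the two sides.
merged['volume'] = merged['volume_pos'].fillna(0) + merged['volume_neg'].fillna(0)
merged = merged.drop(columns=['volume_pos', 'volume_neg'])
print(merged)
```

This works when each (id, action, date) combination appears at most once per frame; with duplicate keys an outer merge multiplies rows, and a groupby-based sum is the safer choice.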
Answer 2: (score: 2)
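Use set_index to move the action and date columns into the index, and take the volume column.
Use rename_axis to clear the index level names, because when we add, pandas will cry if our index levels are inconsistent.
Use pd.Series.add with the parameter fill_value=0.
Use rename_axis again to name the index levels action and date.
Use reset_index, and you are in business.
v1 = df1.set_index(['positive_action', 'date']).volume.rename_axis([None, None])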
v2 = df2.set_index(['negative_action', 'date']).volume.rename_axis([None, None])
v1.add(v2, fill_value=0).rename_axis(['action', 'date']).reset_index()
action date volume
0 user 1 2016-12-12 19730.735
1 user 2 2016-12-12 14740.800
2 user 3 2016-12-12 10.000
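To see why fill_value=0 matters here, a minimal sketch comparing plain + with pd.Series.add on two small Series. The Series below are simplified stand-ins for v1 and v2, indexed by action only.

```python
import pandas as pd

# Simplified stand-ins for v1 and v2 (single-level index for brevity).
v1 = pd.Series([19720.735, 14740.800], index=['user 1', 'user 2'])
v2 = pd.Series([10.0, 10.0], index=['user 1', 'user 3'])

# Plain addition aligns on the index; a label missing from either side gives NaN.
plain = v1 + v2

# With fill_value=0, a label missing from one side is treated as 0 before adding.
filled = v1.add(v2, fill_value=0)

print(plain)
print(filled)
```

With plain addition only 'user 1' gets a number, since it is the only label present in both Series; with fill_value=0 the volumes for 'user 2' and 'user 3' carry through unchanged.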
Setup
import pandas as pd

df1 = pd.DataFrame(dict(
    positive_action=['user 1', 'user 2'],
    date=pd.to_datetime(['2016-12-12', '2016-12-12']),
    volume=[19720.735, 14740.800]
))
df2 = pd.DataFrame(dict(
    negative_action=['user 1', 'user 3'],
    date=pd.to_datetime(['2016-12-12', '2016-12-12']),
    volume=[10, 10]
))