I have a DataFrame like this:
import pandas as pd

# One row per (id, date, group_name) with the number of calls
df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3, 4, 1, 2, 2],
    'date': ['2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2019-01-01', '2019-02-01', '2019-02-01', '2019-02-01'],
    'group_name': [1, 2, 1, 2, 1, 2, 2, 2, 1, 2],
    'calls': [100, 50, 30, 10, 60, 10, 40, 50, 120, 20]
})
   id        date  group_name  calls
0   1  2019-01-01           1    100
1   1  2019-01-01           2     50
2   2  2019-01-01           1     30
3   2  2019-01-01           2     10
4   3  2019-01-01           1     60
5   3  2019-01-01           2     10
6   4  2019-01-01           2     40
7   1  2019-02-01           2     50
8   2  2019-02-01           1    120
9   2  2019-02-01           2     20
What I want to do is transform(?) or group the data so that, instead of a group_name column, I have the columns group1_calls and group2_calls. The desired result is:
   id        date  group1_calls  group2_calls
0   1  2019-01-01           100            50
1   2  2019-01-01            30            10
2   3  2019-01-01            60            10
3   4  2019-01-01           NaN            40
4   1  2019-02-01           NaN            50
5   2  2019-02-01           120            20
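The problem is that not every id appears under every date, every group name, or every combination of the two. If I filter the original DataFrame by group and then join the pieces with a right join, it works for this data, but it will not necessarily work for future data.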
# Boolean masks selecting each group
filt1 = df['group_name'] == 1
filt2 = df['group_name'] == 2
group1 = df[filt1]
group2 = df[filt2]
print(group1)
print()
print(group2)
# Right join keeps every (id, date) present in group 2; group 1 values become NaN where missing
grouped = pd.merge(group1, group2, how='right', on=['id', 'date'], indicator=True)
grouped.sort_values(by=['date', 'id'])
   id        date  group_name  calls
0   1  2019-01-01           1    100
2   2  2019-01-01           1     30
4   3  2019-01-01           1     60
8   2  2019-02-01           1    120

   id        date  group_name  calls
1   1  2019-01-01           2     50
3   2  2019-01-01           2     10
5   3  2019-01-01           2     10
6   4  2019-01-01           2     40
7   1  2019-02-01           2     50
9   2  2019-02-01           2     20

   id        date  group_name_x  calls_x  group_name_y  calls_y      _merge
0   1  2019-01-01           1.0    100.0             2       50        both
1   2  2019-01-01           1.0     30.0             2       10        both
2   3  2019-01-01           1.0     60.0             2       10        both
4   4  2019-01-01           NaN      NaN             2       40  right_only
5   1  2019-02-01           NaN      NaN             2       50  right_only
3   2  2019-02-01           1.0    120.0             2       20        both
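
For comparison, here is a minimal sketch of one possible way to get the desired layout without splitting the frame per group: reshape from long to wide with set_index and unstack. It assumes that each (date, id, group_name) combination occurs at most once, as in the sample data. The names wide and result are made up for illustration. A right join only keeps the keys that exist in group 2, whereas this reshape keeps every (date, id) pair that appears in any group and fills the gaps with NaN.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 1, 2, 2, 3, 3, 4, 1, 2, 2],
    'date': ['2019-01-01'] * 7 + ['2019-02-01'] * 3,
    'group_name': [1, 2, 1, 2, 1, 2, 2, 2, 1, 2],
    'calls': [100, 50, 30, 10, 60, 10, 40, 50, 120, 20],
})

# Long -> wide: one row per (date, id), one column per group_name value.
# Assumption: no duplicate (date, id, group_name) rows; unstack raises otherwise.
wide = df.set_index(['date', 'id', 'group_name'])['calls'].unstack('group_name')

# Rename the columns 1, 2, ... to group1_calls, group2_calls, ...
wide.columns = [f'group{g}_calls' for g in wide.columns]

# Turn date and id back into ordinary columns and put id first, as in the question.
result = wide.reset_index()
other_cols = [c for c in result.columns if c not in ('id', 'date')]
result = result[['id', 'date'] + other_cols]

print(result)
```

Because missing combinations are filled with NaN, the call columns come back as floats. If duplicate keys can occur in future data, pivot_table with an explicit aggfunc (for example 'sum') may be a better fit, since it aggregates duplicates instead of raising.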