I have two pandas DataFrames, df1 and df2. I need to populate df1['seq'] by grouping df2 and summing the column df2['sum_column']. Below are sample data and my current solution.
df1
id code amount seq
234 3 9.8 ?
213 3 18
241 3 6.4
543 3 2
524 2 1.8
142 2 14
987 2 11
658 3 17
df2
c_id name role sum_column
1 Aus leader 6
1 Aus client 1
1 Aus chair 7
2 Ned chair 8
2 Ned leader 3
3 Mar client 5
3 Mar chair 2
3 Mar leader 4
grouped = df2.groupby('c_id')['sum_column'].sum()
df3 = grouped.reset_index()
df3
c_id sum_column
1 14
2 11
3 11
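To check the groupby step on its own, here is a minimal runnable sketch that rebuilds df2 from the sample data above and confirms the sums in df3. The values are copied from the question; the pd.DataFrame constructor call is just one convenient way to enter them.

```python
import pandas as pd

# df2 rebuilt from the sample data in the question
df2 = pd.DataFrame({
    "c_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "name": ["Aus", "Aus", "Aus", "Ned", "Ned", "Mar", "Mar", "Mar"],
    "role": ["leader", "client", "chair", "chair", "leader", "client", "chair", "leader"],
    "sum_column": [6, 1, 7, 8, 3, 5, 2, 4],
})

# Sum sum_column within each c_id group; the result is a Series indexed by c_id
grouped = df2.groupby("c_id")["sum_column"].sum()

# reset_index() turns that Series back into a two-column DataFrame
df3 = grouped.reset_index()
print(df3)
```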
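The next step, where I run into trouble, is mapping df3 onto df1 and doing a conditional check to see whether df1['amount'] is greater than df3['sum_column']:
df1['seq'] = np.where(df1['amount'] > df1['code'].map(df3.set_index('c_id')[sum_column]), 1, 0)
When I print df1['code'].map(df3.set_index('c_id')['sum_column']), I only get NaN values.
Does anyone know what is going on here?
Expected result: df1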
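The question does not show the column dtypes, so the cause of the NaN values can only be guessed. One common cause, assumed here purely for illustration, is a key mismatch: Series.map looks each value up in the index of the Series it is given and returns NaN for any value it does not find, so codes stored as strings (for example '3') never match integer c_id labels (3). The Series names sums, codes_as_str and codes_as_int below are hypothetical.

```python
import pandas as pd

# Lookup table: group sums indexed by integer c_id (values from the question's df3)
sums = pd.Series([14, 11, 11], index=pd.Index([1, 2, 3], name="c_id"))

# Hypothetical case: the same codes stored as strings versus as integers
codes_as_str = pd.Series(["3", "3", "2"])
codes_as_int = pd.Series([3, 3, 2])

# map() looks each value up in sums.index; a value that is not found yields NaN
print(codes_as_str.map(sums))  # all NaN: "3" does not match the integer label 3
print(codes_as_int.map(sums))  # 11, 11, 11
```

If this turns out to be the cause, converting one side first (for example df1['code'].astype(int)) makes the keys comparable.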
Answer 0 (score: 3)
The solution can be simplified: drop the .reset_index() that produces df3 and pass the Series to map directly:
import numpy as np
s = df2.groupby('c_id')['sum_column'].sum()
df1['seq'] = np.where(df1['amount'] > df1['code'].map(s), 1, 0)
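A self-contained version of this approach, rebuilding only the columns it needs from the sample data (name and role are left out because they play no part in the calculation):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "id": [234, 213, 241, 543, 524, 142, 987, 658],
    "code": [3, 3, 3, 3, 2, 2, 2, 3],
    "amount": [9.8, 18, 6.4, 2, 1.8, 14, 11, 17],
})
df2 = pd.DataFrame({
    "c_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "sum_column": [6, 1, 7, 8, 3, 5, 2, 4],
})

# Per-c_id totals as a Series indexed by c_id (no reset_index needed)
s = df2.groupby("c_id")["sum_column"].sum()

# Look up each row's total via its code, then flag rows whose amount exceeds it
df1["seq"] = np.where(df1["amount"] > df1["code"].map(s), 1, 0)
print(df1)
```

With this data both code 2 and code 3 map to a total of 11, so only the rows with amount 18, 14 and 17 get seq = 1; the row with amount 11 gets 0 because the comparison is strict.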
An alternative that converts the boolean mask from True, False to the integers 1, 0:
df1['seq'] = (df1['amount'] > df1['code'].map(s)).astype(int)
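Both variants yield the same 0/1 values; the difference is that np.where returns a plain NumPy array, while .astype(int) keeps a pandas Series aligned to df1's index. A small sketch with the sample data, where s is written out by hand from the question's df3:

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "code": [3, 3, 3, 3, 2, 2, 2, 3],
    "amount": [9.8, 18, 6.4, 2, 1.8, 14, 11, 17],
})
s = pd.Series({1: 14, 2: 11, 3: 11})  # group sums copied from df3

mask = df1["amount"] > df1["code"].map(s)  # boolean Series of True/False
seq_where = np.where(mask, 1, 0)           # NumPy array of 1/0
seq_astype = mask.astype(int)              # pandas Series of 1/0, same index as df1
print(seq_astype.tolist())
```

Because any comparison with NaN evaluates to False, rows whose code has no match in s silently get 0 under either variant, which can hide a lookup problem like the one in the question.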
Answer 1 (score: 1)
You forgot to put quotes around sum_column: it should be df3.set_index('c_id')['sum_column'], not df3.set_index('c_id')[sum_column].
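Without the quotes, Python reads sum_column as a variable name, which raises a NameError unless a variable with that name happens to exist. A sketch of the question's original route with the label quoted, using the sample data (only the needed columns are rebuilt):

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    "code": [3, 3, 3, 3, 2, 2, 2, 3],
    "amount": [9.8, 18, 6.4, 2, 1.8, 14, 11, 17],
})
df2 = pd.DataFrame({
    "c_id": [1, 1, 1, 2, 2, 3, 3, 3],
    "sum_column": [6, 1, 7, 8, 3, 5, 2, 4],
})

# The question's df3: group sums as a two-column DataFrame
df3 = df2.groupby("c_id")["sum_column"].sum().reset_index()

# 'sum_column' is quoted, so it is treated as a column label
lookup = df3.set_index("c_id")["sum_column"]
df1["seq"] = np.where(df1["amount"] > df1["code"].map(lookup), 1, 0)
print(df1)
```

This gives the same seq values as passing the groupby Series straight to map; the set_index step simply undoes the earlier reset_index.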