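I have a dataset as below on emails and purchases.

Email        Purchaser    order_id  amount
a@gmail.com  a@gmail.com  1         5
b@gmail.com
c@gmail.com  c@gmail.com  2         10
c@gmail.com  c@gmail.com  3         5

I want to find the total number of people in the dataset, the number of people who purchased, the total number of orders, and the total revenue. I know how to do this in SQL using a left join and aggregate functions, but I don't know how to replicate it in Python / pandas.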
For Python, I attempted this using pandas and numpy:

table1 = table.groupby(['Email', 'Purchaser']).agg({'amount': np.sum, 'order_id': 'count'})
table1.agg({'Email': 'count', 'Purchaser': 'count', 'amount': np.sum, 'order_id': 'count'})

The problem is that it only returns the rows with an order (1st and 3rd rows) and not the other one (2nd row):

Email        Purchaser    order_id  amount
a@gmail.com  a@gmail.com  1         5
c@gmail.com  c@gmail.com  2         15
The SQL query should look like this:

SELECT count(Email) as num_ind, count(Purchaser) as num_purchasers, sum(order) as orders, sum(amount) as revenue
FROM
(SELECT Email, Purchaser, count(order_id) as order, sum(amount) as amount
FROM table 1
GROUP BY Email, Purchaser) x

How can I replicate it in Python / pandas?
Answer 0: (score: 4)
Grouping on keys that contain NaN is not implemented in pandas yet - see.

So one awful solution is to replace NaN with some string, and after agg replace it back to NaN:

table['Purchaser'] = table['Purchaser'].replace(np.nan, 'dummy')
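
The workaround this answer describes (swap NaN for a placeholder, aggregate, then swap the placeholder back) could be completed roughly as follows. This is only a sketch under assumptions: the DataFrame is rebuilt from the sample rows in the question, the names `table1` and `summary` are illustrative, and the string `'sum'` is used in place of `np.sum`.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question; b@gmail.com has no purchase.
table = pd.DataFrame({
    'Email': ['a@gmail.com', 'b@gmail.com', 'c@gmail.com', 'c@gmail.com'],
    'Purchaser': ['a@gmail.com', np.nan, 'c@gmail.com', 'c@gmail.com'],
    'order_id': [1, np.nan, 2, 3],
    'amount': [5, np.nan, 10, 5],
})

# Replace NaN with a placeholder so groupby does not drop the row.
table['Purchaser'] = table['Purchaser'].replace(np.nan, 'dummy')

# Inner query: order count and amount per (Email, Purchaser) pair.
table1 = (
    table.groupby(['Email', 'Purchaser'])
    .agg({'amount': 'sum', 'order_id': 'count'})
    .reset_index()
)

# Restore NaN so that count() skips people who never purchased.
table1['Purchaser'] = table1['Purchaser'].replace('dummy', np.nan)

# Outer query: people, purchasers, orders, revenue.
summary = table1.agg({
    'Email': 'count',
    'Purchaser': 'count',
    'order_id': 'sum',
    'amount': 'sum',
})
print(summary)
```

On pandas 1.1 or later, passing `dropna=False` to `groupby` should keep the NaN group directly, which would make the placeholder step unnecessary.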