This is what my customer order data looks like for a single customer:
order_no customer_id product amount order_total
23 1 A 100 100
24 1 A 100 300
24 1 B 100 300
24 1 C 100 300
25 1 B 100 100
26 1 A 100 200
26 1 B 100 200
I want to compute the average order size per customer in a new column, so for this customer it would be 175 = (100 + 300 + 100 + 200) / 4:
order_no customer_id amount order_total avg_order_size
23 1 100 100 175
24 1 100 300 175
24 1 100 300 175
24 1 100 300 175
25 1 100 100 175
26 1 100 200 175
26 1 100 200 175
I tried this version, but had no luck:
customer_stats <- data.table(customer_stats)[, avg_order_size := mean(order_total), by=list(order_no, customer_id)]
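What I really need to do is select one row from each order_no, or perhaps use the mean of all order_no[1] in by=(customer_id)? If there is a way to do this in one step and skip creating order_total, that would be even better.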
Answer 0 (score: 2)
How about this? It seems to translate your approach without needing to compute order_total first.
dat[, sum(amount), by = list(customer_id, order_no)][ ,avg_order := mean(V1), by = customer_id]
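To check that line against the question's sample data, here is a minimal sketch. The names dat, amount, and avg_order follow the answer; the sample values are copied from the question, and the name per_order is only an illustrative assumption.

```r
library(data.table)

# Sample data copied from the question (one customer, four orders)
dat <- data.table(
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  customer_id = 1,
  product     = c("A", "A", "B", "C", "B", "A", "B"),
  amount      = 100
)

# Step 1: collapse to one row per order; the unnamed sum lands in column V1
# Step 2: add the per-customer mean of those order totals by reference
per_order <- dat[, sum(amount), by = list(customer_id, order_no)][
  , avg_order := mean(V1), by = customer_id]

# The trailing [] makes sure the table prints after a := assignment
print(per_order[])
```

Note that the result has one row per order rather than one row per order line, so it would still have to be joined back to dat if the average is wanted on every original row.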
Answer 1 (score: 0)
You can avoid creating order_total by doing the following:
customer_stats[ , avg_order_size := sum(amount, na.rm=TRUE) / length(unique(order_no)), by=customer_id]
However, I have reservations about how fast this would be.
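For anyone who wants to try the one-step version on the question's sample data, here is a minimal sketch. It swaps length(unique(order_no)) for data.table's uniqueN(order_no), which counts distinct values; that substitution is my own assumption of an equivalent form, not part of the answer above, and I have not benchmarked either variant.

```r
library(data.table)

# Sample data copied from the question (one customer, four orders)
customer_stats <- data.table(
  order_no    = c(23, 24, 24, 24, 25, 26, 26),
  customer_id = 1,
  product     = c("A", "A", "B", "C", "B", "A", "B"),
  amount      = 100
)

# Total spend per customer divided by that customer's number of distinct orders
customer_stats[, avg_order_size := sum(amount, na.rm = TRUE) / uniqueN(order_no),
               by = customer_id]

# The trailing [] makes sure the table prints after a := assignment
print(customer_stats[])
```

Because the column is added by reference, every original row keeps its product and amount and simply gains avg_order_size, so no join back is needed.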
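Answer 2 (score: 0)
I think the trick is to key the original table by customer and order, sum the order total by customer and order, take the average order total by customer, and then join that back to the original table.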
library(data.table)
# Your data (next time, consider putting R-formatted data in the question...):
dt <- data.table(customer_id=1,
order_no=c(23,24,24,24,25,26,26),
product=c("A","A","B","C","B","A","B"),
product_amount=100,
key=c("customer_id","order_no")) # 1: key by customer and order
dt
# customer_id order_no product product_amount
#1: 1 23 A 100
#2: 1 24 A 100
#3: 1 24 B 100
#4: 1 24 C 100
#5: 1 25 B 100
#6: 1 26 A 100
#7: 1 26 B 100
dt[ # 4: join summary back to original
dt[,list(order_total=sum(product_amount)),by=list(customer_id,order_no)] [ # 2: order total by customer and order
,avg_order_size:=mean(order_total),by=list(customer_id)] # 3: add the average of order total by customer
]
# customer_id order_no product product_amount order_total avg_order_size
#1: 1 23 A 100 100 175
#2: 1 24 A 100 300 175
#3: 1 24 B 100 300 175
#4: 1 24 C 100 300 175
#5: 1 25 B 100 100 175
#6: 1 26 A 100 200 175
#7: 1 26 B 100 200 175