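I have data on households that buy goods over a period of time. Every receipt has its own ID, and the week is coded as a positive integer. I need to count the number of receipts per household in each 4-week period (the data span 3 years: the first year has 52 weeks, the second 53, the third 48). Ultimately, I want the average number of purchases per household per 4-week period. A solution that converts the weeks to months and counts per month would also be fine. The dataset has more than 100,000 rows. I am new to R and would really appreciate any suggestions!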
Household <- c(1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 3, 3)
Week <- c(201501, 201501, 201501, 201502, 201502, 201502, 201502, 201503, 201503, 201504, 201504, 201504)
Receipt <- c(111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 121)
df <- data.frame(Household, Week, Receipt)
Answer 0 (score: 0)
This counts the number of receipts (rows) for each household within each 4-week period:
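library(data.table)
setDT(df)  # convert df to a data.table by reference
# .N is the number of rows in each (Household, period) group
n_reciepts <- df[, .N, by = .(Household, period = floor(Week/4))]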
#    Household period N
# 1:         1  50375 3
# 2:         2  50375 4
# 3:         3  50375 2
# 4:         1  50376 1
# 5:         3  50376 2
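To see which weeks end up together, here is a small base R check. It assumes the week codes are YYYYWW integers, as in the sample data. With floor(Week/4), the bins are cut wherever the code is a multiple of 4, so weeks 1 to 3 share a period and weeks 4 to 7 share the next one. The second grouping (the names year, week_of_year, block and period_year are my own, not from the question) is one possible alternative that numbers 4-week blocks within each year. Treat it as a sketch rather than the required definition of a period.

```r
# Week codes are assumed to be YYYYWW integers, as in the sample data.
Week <- c(201501:201508, 201552, 201601)

# Grouping used in the answer: a new bin starts at every multiple of 4.
period_floor <- floor(Week / 4)

# Hypothetical alternative: split the code into year and week,
# then number 4-week blocks within each year (weeks 1-4 -> block 1, 5-8 -> block 2, ...).
# In a 53-week year, week 53 falls into a short 14th block.
year <- Week %/% 100
week_of_year <- Week %% 100
block <- (week_of_year - 1) %/% 4 + 1
period_year <- sprintf("%d-P%02d", year, block)

print(data.frame(Week, period_floor, period_year))
```

To use the alternative in the counting step, replace floor(Week/4) with the block expression and group by year as well.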
Then you just need to average over all periods, by household:
avg_n_reciepts <- n_reciepts[, .(avg_reciepts = mean(N)), by = Household]
#    Household avg_reciepts
# 1:         1            2
# 2:         2            4
# 3:         3            2
You can also do it all in one step:
df[, .N, by = .(Household, period = floor(Week/4))
][, .(avg_reciepts = mean(N)), by = Household]
#    Household avg_reciepts
# 1:         1            2
# 2:         2            4
# 3:         3            2
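One thing to watch: mean(N) only averages over the periods in which a household has at least one row. In the sample data, household 2 has no rows in period 50376, so its average is 4 rather than 2. If periods without purchases should count as zero (that is my assumption about what is wanted, not something stated in the question), a base R sketch is to build the full household-by-period table first. The names tab and avg_with_zeros are just illustrative.

```r
# Sample data from the question.
df <- data.frame(
  Household = c(1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 3, 3),
  Week = c(201501, 201501, 201501, 201502, 201502, 201502,
           201502, 201503, 201503, 201504, 201504, 201504),
  Receipt = c(111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 121)
)

# Same period definition as in the answer.
df$period <- floor(df$Week / 4)

# table() builds every Household x period cell and fills missing combinations with 0.
tab <- table(Household = df$Household, period = df$period)

# Average rows per period, with empty periods counted as zero.
avg_with_zeros <- rowMeans(unclass(tab))
print(avg_with_zeros)
```

Only periods that occur somewhere in the data are included here. A period in which no household bought anything would still be missing.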
dplyr equivalent:
library(dplyr)
df %>%
  group_by(Household, period = floor(Week/4)) %>%
  count() %>%
  group_by(Household) %>%
  summarise(avg_reciepts = mean(n))
# # A tibble: 3 x 2
#   Household avg_reciepts
#       <dbl>        <dbl>
# 1         1            2
# 2         2            4
# 3         3            2
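Note that all of the above count rows, not receipts. In the sample data, receipt 121 appears twice for household 3. If a receipt can span several rows and should be counted once (an assumption on my part, since the question does not say), a base R sketch that counts distinct receipt IDs instead could look like this. The names per_period and avg_distinct are just illustrative.

```r
# Sample data from the question.
df <- data.frame(
  Household = c(1, 2, 3, 1, 1, 2, 2, 2, 3, 1, 3, 3),
  Week = c(201501, 201501, 201501, 201502, 201502, 201502,
           201502, 201503, 201503, 201504, 201504, 201504),
  Receipt = c(111, 112, 113, 114, 115, 116, 117, 118, 119, 120, 121, 121)
)

# Same period definition as in the answer.
df$period <- floor(df$Week / 4)

# Number of distinct receipt IDs per household and period.
per_period <- aggregate(Receipt ~ Household + period, data = df,
                        FUN = function(x) length(unique(x)))

# Average of those counts per household (the Receipt column now holds counts).
avg_distinct <- aggregate(Receipt ~ Household, data = per_period, FUN = mean)
names(avg_distinct)[2] <- "avg_reciepts"
print(avg_distinct)
```

With this change household 3 averages 1.5 instead of 2, because its two rows in period 50376 belong to the same receipt.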