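I have a dataset that I am trying to aggregate with a double for loop. Basically, I want to compute the purchase cycle for each MEM_ID within each TOP_LEVEL_CATEGORY. The data looks like this: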
MEM_ID ORDER_DEL_DATE TOP_LEVEL_CATEGORY
999984 2016-01-07 household
999984 2016-02-03 household
999980 2015-12-16 household
999980 2016-01-03 household
999980 2016-01-05 household
999980 2016-02-14 household
999984 2016-01-07 personal-care
999980 2016-01-03 personal-care
999980 2016-01-30 personal-care
Code:
PC_test <- NA
for(i in unique(test$MEM_ID)){
  for(j in unique(test$TOP_LEVEL_CATEGORY)){
    PC_test[c(i,j)] <- data.frame(c(MEM_ID=i,CATEGORY=j,ifelse(nrow(test[test$MEM_ID==i & test$TOP_LEVEL_CATEGORY==j,])<=2,
      max(test[test$MEM_ID==i & test$TOP_LEVEL_CATEGORY==j,"ORDER_DEL_DATE"])-min(test[test$MEM_ID==i & test$TOP_LEVEL_CATEGORY==j,"ORDER_DEL_DATE"]),
      max(test[test$MEM_ID==i & test$TOP_LEVEL_CATEGORY==j,"ORDER_DEL_DATE"])-maxN(test[test$MEM_ID==i & test$TOP_LEVEL_CATEGORY==j,"ORDER_DEL_DATE"]))))
  }
}
Note: the maxN function returns the second-largest value.
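The definition of maxN is not shown here. A minimal sketch of what such a helper might look like, assuming it simply sorts in decreasing order and picks the N-th element (this definition is hypothetical, not necessarily the one used above):

```r
# Hypothetical maxN: returns the N-th largest element of x (N = 2 by default).
# The real helper is not shown in the question, so this is only an assumption.
maxN <- function(x, N = 2) {
  stopifnot(length(x) >= N)      # needs at least N values
  sort(x, decreasing = TRUE)[N]  # works for numbers and for Date vectors
}

# Order dates of MEM_ID 999980 in the household category
dates <- as.Date(c("2015-12-16", "2016-01-03", "2016-01-05", "2016-02-14"))

second_latest <- maxN(dates)                         # 2016-01-05
gap_days <- as.numeric(max(dates) - second_latest)   # 40
print(second_latest)
print(gap_days)
```

With this definition, max minus maxN is the gap between the two most recent orders, which is where the 40 for 999980/household comes from.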
The loop produces output that is not what I want, as shown below:
NA. X999984 household personal.care X999980
NA 999984 999980 999980 999980
NA personal-care household personal-care personal-care
NA 0 40 27 27
I would like the output in the following format:
MEM_ID TOP_LEVEL_CATEGORY PC_test
999984 household 27
999984 personal-care 0
999980 household 40
999980 personal-care 27
Any help is much appreciated. Thanks in advance!
Answer 0 (score: 1)
I think you want something like this:
require(data.table)
setDT(df1)
# calculate the min and max date for each MEM_ID/TOP_LEVEL_CATEGORY pair, then find the difference for PC_test
df1[, .(max_date=max(ORDER_DEL_DATE), min_date=min(ORDER_DEL_DATE)),
    keyby=.(MEM_ID,TOP_LEVEL_CATEGORY)][, .(MEM_ID, TOP_LEVEL_CATEGORY, PC_test = max_date - min_date)]
   MEM_ID TOP_LEVEL_CATEGORY PC_test
1: 999980          household 60 days
2: 999980      personal-care 27 days
3: 999984          household 27 days
4: 999984      personal-care  0 days
Based on what you described, this is my educated guess at how you calculate PC_test, without fully breaking down your min and max formula.
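The max-minus-min version gives 60 days for 999980/household, while the expected output in the question lists 40, which is the gap between the two most recent orders. A variant that reproduces the expected numbers might look like the sketch below. It assumes ORDER_DEL_DATE is of class Date and that a group with a single order should get 0; the helper name latest_gap is made up for illustration.

```r
library(data.table)

# Sample data copied from the question; ORDER_DEL_DATE is assumed to be a Date
df1 <- data.table(
  MEM_ID = c(999984, 999984, 999980, 999980, 999980, 999980,
             999984, 999980, 999980),
  ORDER_DEL_DATE = as.Date(c("2016-01-07", "2016-02-03", "2015-12-16",
                             "2016-01-03", "2016-01-05", "2016-02-14",
                             "2016-01-07", "2016-01-03", "2016-01-30")),
  TOP_LEVEL_CATEGORY = c(rep("household", 6), rep("personal-care", 3))
)

# Hypothetical helper: days between the two most recent dates in d,
# or 0 when the group contains only one order
latest_gap <- function(d) {
  d <- sort(d, decreasing = TRUE)
  if (length(d) < 2L) 0 else as.numeric(d[1L] - d[2L])
}

# One row per MEM_ID / TOP_LEVEL_CATEGORY pair, sorted by the grouping columns
res <- df1[, .(PC_test = latest_gap(ORDER_DEL_DATE)),
           keyby = .(MEM_ID, TOP_LEVEL_CATEGORY)]
print(res)
```

On the sample data this should give 40 and 27 for MEM_ID 999980 and 27 and 0 for MEM_ID 999984, in days as plain numbers. If two orders fall on the same date, the gap is 0 under this definition, which may or may not be what is wanted.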