group_by() and calculate the difference in weeks since the last purchase of a specific brand

Time: 2018-10-04 19:55:40

Tags: r dplyr

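I am trying to group by brandID and then calculate the difference between two dates for a given PANID. I want to work out how many weeks it has been since someone last bought a particular brand.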

PANID corresponds to a person, WEEK to the week of the purchase, and brandID to a specific brand.

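The question I want to answer is how many weeks it took a given person to buy the same brand again. Zero means they did not buy that brand again, and so on.

I'm just not sure how to go about it.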


Edit:

The expected output would look something like this:

df <- structure(list(PANID = c(1100016L, 1100016L, 1100016L, 1100057L, 
1100057L, 1100057L, 1100057L, 1100057L, 1100057L, 1100057L, 1100057L, 
1100180L, 1100180L, 1100321L, 1100404L, 1100404L, 1100545L, 1100560L, 
1100560L, 1101162L, 1101162L, 1101238L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 
1101279L, 1101279L, 1101279L, 1101279L, 1101279L, 1101758L, 1102012L, 
1102046L, 1102384L, 1102384L, 1103069L, 1103101L, 1103101L, 1103101L, 
1103101L, 1103101L, 1103101L, 1103101L, 1103325L, 1103614L, 1103671L, 
1103671L, 1103895L, 1103895L, 1103895L, 1104182L, 1104182L, 1104182L, 
1104182L, 1104182L, 1104182L, 1104182L, 1104182L, 1104182L, 1104182L, 
1104182L, 1104182L, 1104208L, 1104208L, 1104224L, 1104273L, 1104315L, 
1104315L, 1104315L, 1104315L, 1104364L, 1104372L, 1104372L, 1104406L, 
1104869L, 1104968L, 1105171L, 1105171L, 1105239L, 1105239L, 1105239L, 
1105965L, 1105999L, 1106146L, 1106567L, 1106567L, 1106567L, 1106575L, 
1107235L, 1107235L, 1107235L, 1107235L, 1107235L, 1107235L, 1107235L, 
1107425L, 1107425L, 1107581L, 1107581L, 1107847L, 1107847L, 1107862L, 
1107953L, 1108043L, 1108225L, 1108308L, 1108308L, 1108308L, 1108308L, 
1108308L, 1108308L, 1108308L, 1108456L, 1108456L, 1108456L, 1108456L, 
1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 
1108456L, 1108456L, 1108456L, 1108456L, 1108456L, 1109165L, 1109165L, 
1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 
1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 1109306L, 
1109306L, 1109306L, 1109447L, 1109447L, 1109447L, 1109470L, 1109470L, 
1109470L, 1109470L, 1109470L, 1109470L, 1109470L, 1109488L, 1109595L, 
1109629L, 1109629L, 1109629L, 1109629L, 1109629L, 1109629L, 1109629L, 
1109629L, 1109629L, 1109652L, 1109652L, 1109652L, 1109652L, 1109652L
), WEEK = structure(c(11400, 11533, 11673, 11484, 11491, 11491, 
11519, 11561, 11568, 11582, 11603, 11344, 11351, 11673, 11673, 
11673, 11533, 11344, 11631, 11442, 11652, 11673, 11323, 11323, 
11344, 11344, 11344, 11372, 11372, 11372, 11393, 11400, 11400, 
11414, 11414, 11428, 11428, 11435, 11435, 11449, 11470, 11470, 
11470, 11526, 11526, 11533, 11533, 11547, 11561, 11568, 11568, 
11568, 11589, 11589, 11610, 11610, 11631, 11652, 11421, 11463, 
11449, 11680, 11680, 11659, 11323, 11351, 11379, 11540, 11610, 
11645, 11659, 11645, 11442, 11498, 11540, 11519, 11540, 11596, 
11337, 11407, 11435, 11456, 11470, 11491, 11526, 11561, 11589, 
11596, 11666, 11666, 11519, 11540, 11554, 11414, 11673, 11673, 
11673, 11673, 11673, 11442, 11673, 11673, 11386, 11673, 11540, 
11673, 11484, 11491, 11596, 11680, 11673, 11505, 11323, 11344, 
11554, 11610, 11463, 11463, 11484, 11498, 11498, 11645, 11659, 
11379, 11484, 11533, 11673, 11414, 11428, 11533, 11631, 11680, 
11547, 11344, 11344, 11344, 11344, 11372, 11442, 11561, 11372, 
11379, 11393, 11421, 11442, 11456, 11463, 11498, 11498, 11554, 
11554, 11575, 11582, 11596, 11610, 11673, 11575, 11666, 11428, 
11449, 11463, 11477, 11498, 11512, 11526, 11540, 11554, 11561, 
11575, 11575, 11589, 11624, 11638, 11666, 11463, 11519, 11589, 
11351, 11365, 11400, 11547, 11547, 11617, 11645, 11498, 11547, 
11337, 11344, 11372, 11414, 11463, 11484, 11498, 11505, 11673, 
11337, 11351, 11358, 11365, 11393), class = "Date"), brandID = c(48, 
48, 48, 209, 207, 217, 29, 29, 90, 90, 90, 131, 215, 28, 50, 
208, 136, 216, 207, 48, 48, 28, 29, 28, 29, 28, 140, 29, 28, 
140, 29, 28, 140, 29, 28, 29, 28, 29, 28, 28, 29, 28, 140, 29, 
28, 28, 140, 33, 28, 29, 28, 140, 29, 28, 28, 140, 28, 28, 131, 
140, 29, 209, 214, 29, 29, 29, 29, 153, 153, 153, 153, 29, 53, 
28, 33, 135, 135, 209, 211, 128, 225, 211, 211, 62, 211, 156, 
62, 211, 211, 62, 217, 217, 32, 50, 209, 210, 211, 156, 214, 
48, 48, 48, 50, 173, 138, 140, 207, 217, 214, 127, 69, 29, 100, 
100, 214, 48, 51, 50, 51, 131, 51, 51, 51, 230, 246, 32, 32, 
50, 50, 28, 100, 131, 135, 82, 1, 6, 69, 82, 7, 1, 48, 48, 79, 
48, 48, 29, 29, 29, 48, 79, 48, 29, 173, 29, 29, 29, 173, 99, 
140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 140, 
140, 140, 140, 250, 28, 131, 34, 158, 34, 34, 157, 157, 157, 
48, 50, 29, 29, 29, 29, 29, 29, 29, 29, 29, 99, 99, 99, 99, 99
)), row.names = c(NA, 200L), class = "data.frame")

2 answers:

Answer 0: (score: 3)

We can group by 'PANID' and 'brandID', then take the difference between consecutive dates in the 'WEEK' column, expressed in weeks.
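As a minimal sketch of this group-then-difference idea (not necessarily the answerer's exact code), the following assumes dplyr's group_by() and lag() together with base R's difftime(). The data frame toy is a small hand-picked subset of the question's data, and the column name weeks_since_last is made up for illustration.

```r
library(dplyr)

# Hypothetical subset of the question's data: two panelists, three brands.
toy <- data.frame(
  PANID   = c(1100016L, 1100016L, 1100016L, 1100057L, 1100057L, 1100057L),
  WEEK    = as.Date(c(11400, 11533, 11673, 11519, 11561, 11484),
                    origin = "1970-01-01"),
  brandID = c(48, 48, 48, 29, 29, 209)
)

gaps <- toy %>%
  arrange(PANID, brandID, WEEK) %>%   # purchases in date order within each group
  group_by(PANID, brandID) %>%
  mutate(
    # Weeks since the previous purchase of the same brand by the same person.
    # The first purchase in each group is compared with itself, giving 0.
    weeks_since_last = as.numeric(
      difftime(WEEK, lag(WEEK, default = first(WEEK)), units = "weeks")
    )
  ) %>%
  ungroup()

print(gaps)
```

With this layout a brand bought only once gets a single row with 0, which matches the question's convention that zero means no repeat purchase.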


Answer 1: (score: 2)

A solution using data.table:

library(data.table)

setDT(df)

df <- df[order(PANID, WEEK)] # to guarantee that the dates are ordered for every client

mWeeks <- df[, .(mean.weeks = mean(c(0, diff(week(WEEK))), na.rm = TRUE)), by = .(PANID, brandID)]

mWeeks[mean.weeks == 0, mean.weeks := NA]

> mWeeks
     PANID brandID mean.weeks
1: 1100016      48  13.000000
2: 1100057     209         NA
3: 1100057     207         NA
4: 1100057     217         NA
5: 1100057      29   3.000000
6: 1100057      90   1.666667
7: 1100180     131         NA
8: 1100180     215         NA
...
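Two details of the code above shape these numbers. The leading 0 inside mean(c(0, diff(...))) is averaged in, so 13.0 for PANID 1100016 is the mean of 0, 19 and 20. Also, week() returns a week-of-year number, so a gap that crosses a year boundary would come out negative (the sample dates all fall in 2001, so that does not happen here). A hedged variant, assuming the goal is the average gap between repeat purchases only, could difference the dates directly. The table toy and the result name repurchase are made up for illustration.

```r
library(data.table)

# Hypothetical subset of the question's data: two panelists, three brands.
toy <- data.table(
  PANID   = c(1100016L, 1100016L, 1100016L, 1100057L, 1100057L, 1100057L),
  WEEK    = as.Date(c(11400, 11533, 11673, 11519, 11561, 11484),
                    origin = "1970-01-01"),
  brandID = c(48, 48, 48, 29, 29, 209)
)

# Order purchases by date within each person/brand group.
setorder(toy, PANID, brandID, WEEK)

# Mean gap in weeks between consecutive purchases of the same brand.
# Days since the epoch are differenced and divided by 7, so year boundaries
# are harmless; groups with a single purchase get NA.
repurchase <- toy[
  ,
  .(mean.weeks = if (.N > 1) mean(diff(as.numeric(WEEK)) / 7) else NA_real_),
  by = .(PANID, brandID)
]

print(repurchase)
```

Under this variant PANID 1100016 / brand 48 averages 19.5 weeks rather than 13, because the artificial 0 is no longer part of the mean.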