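I have a df / zoo / xts / whatever that is split by day of the week. I would like to split each entry further by week.
Take Friday as an example: it has a list of ids, and each id has an associated time. These times can fall on any Friday of the year. I would like to build a new df that contains each id together with its count for every week (in order).
It would look like the following, where each w column holds the counts for a different Friday: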
id w1 w2 w3 w4
1 id_1 1 2 2 8
2 id_2 3 1 5 2
3 id_3 7 4 10 7
dput:
structure(list(id = c("id_1", "id_2", "id_3"), w1 = c(1, 3, 7
), w2 = c(2, 1, 4), w3 = c(2L, 5L, 10L), w4 = c(8L, 2L, 7L)), .Names = c("id",
"w1", "w2", "w3", "w4"), row.names = c(NA, 3L), class = "data.frame")
This seems ripe for aggregate, but I can't quite get the syntax right. Other things I have tried:
# Applies sum to everything, which doesn't make sense in this context
apply.weekly(friday, sum)
# I considered doing something like getting the unique weeks with:
as.numeric(unique(format(friday[,2], "%U")))
# and then generating each week, getting the counts for each user, and then making a new df from this process. But this seems very inefficient.
Edit: output of str(data[1:20,]):
'data.frame': 20 obs. of 2 variables:
$ id : num 1 2 3 4 5 1 2 3 3 2 ...
$ time: POSIXct, format: "2011-04-25 14:00:00" "2011-04-28 20:00:00" "2011-05-03 06:00:00" "2011-05-06 14:00:00" ...
Output of dput(data[1:20,]):
structure(list(id = c(1, 2, 3, 4, 5, 1, 2, 3, 3, 2, 1, 4, 3,
2, 1, 4, 3, 2, 1, 7), time = structure(c(1303754400, 1304035200,
1304416800, 1304704800, 1304920800, 1305252000, 1305428400, 1305522000,
1305774000, 1306404000, 1306422000, 1308261600, 1308290400, 1308340800,
1308542400, 1308715200, 1308722400, 1308844800, 1309575600, 1309730400
), class = c("POSIXct", "POSIXt"))), .Names = c("id", "time"), row.names = c(1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L, 19L, 20L), class = "data.frame")
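Answer 0 (score: 4)
If I understand what you want, you need to create two additional columns: one for the day of the week (so you can pick out a given day) and one for the week of the year (so you can end up with a separate column for each week). Using the dput() you provided for data: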
data$day.of.week <- format(data$time, "%A")
data$week.of.year <- format(data$time, "%U")
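To make the two format codes concrete, here is a minimal sketch on two made-up timestamps (the dates and the tz = "UTC" setting are assumptions chosen for reproducibility, not taken from the question). It also shows as.POSIXlt()$wday as a locale-independent way to identify the weekday, since the name returned by "%A" depends on the session locale.

```r
# Hypothetical timestamps; both fall on a Friday in 2011.
times <- as.POSIXct(c("2011-05-06 14:00:00", "2011-06-17 06:00:00"), tz = "UTC")

demo <- data.frame(
  time         = times,
  # "%A": full weekday name, spelled according to the current locale
  day.of.week  = format(times, "%A"),
  # wday: weekday as a number, 0 = Sunday ... 6 = Saturday (locale-independent)
  wday         = as.POSIXlt(times)$wday,
  # "%U": week of the year as a two-digit string, weeks starting on Sunday
  week.of.year = format(times, "%U")
)
demo
```

Filtering on wday == 5 selects Fridays regardless of the language R uses for day names.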
What you effectively want now is to reshape the data, so use the reshape2 package (not the only way, but the one I am most familiar with):
library("reshape2")
# Current reshape2 names this argument value.var (older releases used value_var)
dcast(data[data$day.of.week=="Friday",], id~week.of.year,
      value.var="time", fun.aggregate=length)
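Since reshaping with a package is not the only way, the same id-by-week count can also be sketched in base R with table(). The small data frame below is made up for illustration (it is not the question's data), and the Friday filter uses the numeric weekday rather than the day name, which is an assumption made to avoid locale issues.

```r
# Hypothetical miniature of the question's data: one row per (id, time) event.
data <- data.frame(
  id   = c(1, 4, 4, 2, 4, 1),
  time = as.POSIXct(c("2011-05-06 14:00:00", "2011-05-06 09:00:00",
                      "2011-05-06 18:00:00", "2011-06-17 06:00:00",
                      "2011-05-13 10:00:00", "2011-05-10 12:00:00"),
                    tz = "UTC")
)
data$week.of.year <- format(data$time, "%U")

# Keep only Fridays (wday: 0 = Sunday ... 5 = Friday).
fridays <- data[as.POSIXlt(data$time)$wday == 5, ]

# Cross-tabulate: one row per id, one column per week, cells are event counts.
counts <- table(id = fridays$id, week = fridays$week.of.year)

# Convert the contingency table to a plain data frame (ids become row names).
counts.df <- as.data.frame.matrix(counts)
counts.df
```

The result has ids as row names rather than as an id column; cbind(id = rownames(counts.df), counts.df) would restore the layout shown in the question.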
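In that example I subset the data to keep only Fridays. If you want all days, with each day handled separately, the plyr package can help with the iteration: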
library("plyr")
# Extra arguments after dcast are passed on to it for each day's subset
dlply(data, .(day.of.week), dcast, id~week.of.year,
      value.var="time", fun.aggregate=length)
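For comparison, the per-day iteration can be approximated in base R with split() and lapply(). This is only a sketch on made-up data: the wday column and the by.day name are introduced here for illustration, and each list element is a contingency table rather than the data frame that dcast returns.

```r
# Hypothetical events: two on Fridays, two on Tuesdays.
data <- data.frame(
  id   = c(1, 4, 3, 3),
  time = as.POSIXct(c("2011-05-06 14:00:00", "2011-05-13 10:00:00",
                      "2011-05-10 12:00:00", "2011-06-21 08:00:00"),
                    tz = "UTC")
)
data$week.of.year <- format(data$time, "%U")
# Numeric weekday (0 = Sunday ... 6 = Saturday) instead of a locale-dependent name.
data$wday <- as.POSIXlt(data$time)$wday

# Split the rows by weekday, then count events per id and week within each piece.
by.day <- lapply(split(data, data$wday), function(d) {
  table(id = d$id, week = d$week.of.year)
})
by.day
```

The list is named by weekday number, so by.day[["5"]] holds the Friday counts.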
The results of the two calls are:
> dcast(data[data$day.of.week=="Friday",], id~week.of.year, value.var="time", fun.aggregate=length)
id 18 24 26
1 1 0 0 1
2 2 0 1 0
3 4 1 0 0
> dlply(data, .(day.of.week), dcast, id~week.of.year, value.var="time", fun.aggregate=length)
$Friday
id 18 24 26
1 1 0 0 1
2 2 0 1 0
3 4 1 0 0
$Monday
id 17
1 1 1
$Saturday
id 19
1 2 1
$Sunday
id 19 20 25 27
1 1 0 0 1 0
2 3 0 1 0 0
3 5 1 0 0 0
4 7 0 0 0 1
$Thursday
id 17 19 21 24 25
1 1 0 1 1 0 0
2 2 1 0 1 0 1
3 3 0 0 0 1 0
4 4 0 0 0 1 0
$Tuesday
id 18 25
1 3 1 1
2 4 0 1
$Wednesday
id 20
1 3 1
attr(,"split_type")
[1] "data.frame"
attr(,"split_labels")
day.of.week
1 Friday
2 Monday
3 Saturday
4 Sunday
5 Thursday
6 Tuesday
7 Wednesday
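Note that the tables above only contain columns for weeks that actually occur in the data (for Friday: 18, 24 and 26). If a column is wanted for every week in order, including weeks with no events, one possible approach is to turn the week label into a factor with explicit levels before counting. The sketch below uses base R's table() on made-up rows; filling the range between the first and last observed week is an assumption about what "every week" should mean.

```r
# Hypothetical Friday rows, with week labels as produced by format(time, "%U").
fridays <- data.frame(
  id           = c(1, 2, 4),
  week.of.year = c("26", "24", "18"),
  stringsAsFactors = FALSE
)

# Assumption: include every week from the first to the last observed week.
week.range <- range(as.integer(fridays$week.of.year))
all.weeks  <- sprintf("%02d", seq(week.range[1], week.range[2]))

# Explicit factor levels make table() keep empty weeks as zero-count columns,
# in level order.
fridays$week.of.year <- factor(fridays$week.of.year, levels = all.weeks)
counts <- table(id = fridays$id, week = fridays$week.of.year)
counts
```

Here counts has one column for each of the weeks 18 through 26, with zeros where an id had no Friday event.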