I have a data frame like this:
time site val
2014-09-01 00:00:00 2001 1
2014-09-01 00:15:00 2001 0
2014-09-01 00:30:00 2001 2
2014-09-01 00:45:00 2001 0
2014-09-01 00:00:00 2002 1
2014-09-01 00:15:00 2002 0
2014-09-01 00:30:00 2002 2
2014-09-02 00:45:00 2001 0
2014-09-02 00:00:00 2001 1
2014-09-02 00:15:00 2001 0
2014-09-02 00:30:00 2001 2
2014-09-02 00:45:00 2001 0
2014-09-02 00:00:00 2002 1
2014-09-02 00:15:00 2002 0
2014-09-02 00:30:00 2002 2
2014-09-02 00:45:00 2001 0
I would like to group it by time and site, and then add a new variable that holds the occurrence index of each row within its group:
time site val h
2014-09-01 00:00:00 2001 1 1
2014-09-01 00:15:00 2001 0 2
2014-09-01 00:30:00 2001 2 3
2014-09-01 00:45:00 2001 0 4
2014-09-01 00:00:00 2002 1 1
2014-09-01 00:15:00 2002 0 2
2014-09-01 00:30:00 2002 2 3
2014-09-02 00:45:00 2002 0 4
2014-09-02 00:00:00 2001 1 1
2014-09-02 00:15:00 2001 0 2
2014-09-02 00:30:00 2001 2 3
2014-09-02 00:45:00 2001 0 4
2014-09-02 00:00:00 2002 1 1
2014-09-02 00:15:00 2002 0 2
2014-09-02 00:30:00 2002 2 3
2014-09-02 00:45:00 2001 0 4
df <- structure(list(time = structure(c(1409522400, 1409523300, 1409524200,
1409525100, 1409522400, 1409523300, 1409524200, 1409611500, 1409608800,
1409609700, 1409610600, 1409611500, 1409608800, 1409609700, 1409610600,
1409611500), class = c("POSIXct", "POSIXt"), tzone = ""), site = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 1L), .Label = c("2001",
"2002"), class = "factor"), val = c(1L, 0L, 2L, 0L, 1L, 0L, 2L,
0L, 1L, 0L, 2L, 0L, 1L, 0L, 2L, 0L)), .Names = c("time", "site",
"val"), row.names = c(NA, -16L), class = "data.frame")
What are my options for achieving this?
Thanks
Answer 0 (score: 1)
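Using dplyr. First, we create a column id that extracts the day of the month from the date (column time). Then we group by site and id and add a new variable counter, which numbers the rows within each of these groups. Note that '%d' keeps only the day of the month, so this assumes the data do not span more than one month.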
df$id <- as.factor(format(df$time,'%d'))
library(dplyr)
df %>% group_by(site, id) %>% mutate(counter = row_number())
Output:
time site val id counter
(time) (fctr) (int) (fctr) (int)
1 2014-09-01 00:00:00 2001 1 01 1
2 2014-09-01 00:15:00 2001 0 01 2
3 2014-09-01 00:30:00 2001 2 01 3
4 2014-09-01 00:45:00 2001 0 01 4
5 2014-09-01 00:00:00 2002 1 01 1
6 2014-09-01 00:15:00 2002 0 01 2
7 2014-09-01 00:30:00 2002 2 01 3
8 2014-09-02 00:45:00 2001 0 02 1
9 2014-09-02 00:00:00 2001 1 02 2
10 2014-09-02 00:15:00 2001 0 02 3
11 2014-09-02 00:30:00 2001 2 02 4
12 2014-09-02 00:45:00 2001 0 02 5
13 2014-09-02 00:00:00 2002 1 02 1
14 2014-09-02 00:15:00 2002 0 02 2
15 2014-09-02 00:30:00 2002 2 02 3
16 2014-09-02 00:45:00 2001 0 02 6
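If the data could span several months, the day-of-month key used above would merge, for example, 1 September and 1 October into one group. A minimal sketch of one way around that, assuming a full-date key is acceptable; the toy data frame and the column name day are hypothetical and not part of the original post:

```r
library(dplyr)

# Hypothetical toy data: site 2001 appears on the 1st of two different months
toy <- data.frame(
  time = as.POSIXct(c("2014-09-01 00:00:00", "2014-09-01 00:15:00",
                      "2014-10-01 00:00:00", "2014-10-01 00:15:00",
                      "2014-09-01 00:00:00"), tz = "UTC"),
  site = factor(c("2001", "2001", "2001", "2001", "2002"))
)

res <- toy %>%
  mutate(day = format(time, "%Y-%m-%d")) %>%  # full date, not just the day of month
  group_by(site, day) %>%
  mutate(counter = row_number()) %>%          # 1, 2, ... within each site/day group
  ungroup()

res
```

With the '%d' key, the four rows for site 2001 would be counted 1 to 4; with the full date they restart at 1 in October.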
Answer 1 (score: 0)
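We can use ave. The grouping key starts a new group whenever time goes backwards relative to the previous row: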
df$h <- with(df, ave(val, cumsum(c(TRUE, diff(time) < 0)), FUN = seq_along))
df
# time site val h
#1 2014-09-01 03:30:00 2001 1 1
#2 2014-09-01 03:45:00 2001 0 2
#3 2014-09-01 04:00:00 2001 2 3
#4 2014-09-01 04:15:00 2001 0 4
#5 2014-09-01 03:30:00 2002 1 1
#6 2014-09-01 03:45:00 2002 0 2
#7 2014-09-01 04:00:00 2002 2 3
#8 2014-09-02 04:15:00 2001 0 4
#9 2014-09-02 03:30:00 2001 1 1
#10 2014-09-02 03:45:00 2001 0 2
#11 2014-09-02 04:00:00 2001 2 3
#12 2014-09-02 04:15:00 2001 0 4
#13 2014-09-02 03:30:00 2002 1 1
#14 2014-09-02 03:45:00 2002 0 2
#15 2014-09-02 04:00:00 2002 2 3
#16 2014-09-02 04:15:00 2001 0 4
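To see how the grouping key in the ave call is built, it may help to split the expression into steps. This is only a sketch on a hypothetical toy vector; the names starts and run_id are introduced here for illustration:

```r
# Hypothetical timestamps: three runs, each restarting at an earlier time
time <- as.POSIXct(c("2014-09-01 00:00:00", "2014-09-01 00:15:00",
                     "2014-09-01 00:30:00",
                     "2014-09-01 00:00:00", "2014-09-01 00:15:00",
                     "2014-09-01 00:00:00"), tz = "UTC")

# TRUE where the time steps backwards; the first row always starts a run
starts <- c(TRUE, diff(time) < 0)

# The running count of run starts gives one id per run
run_id <- cumsum(starts)

# Number the rows within each run
h <- ave(seq_along(time), run_id, FUN = seq_along)

data.frame(time, starts, run_id, h)
```

Because the key depends only on row order, this approach relies on the rows already being arranged so that each group is a contiguous run of increasing times.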
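Note: this is based on the expected output shown in the OP's post. I understand that 'site' was also described as a grouping variable, but in that case the expected output would have to be something different.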
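The difference mentioned in the note can be sketched on a small example. Assuming hypothetical toy data in which one site has two runs on the same day, counting by run and counting by site and day give different indices; by_run and by_site_day are illustrative names:

```r
# Hypothetical toy data: site 2001 has two separate runs on the same day
toy <- data.frame(
  time = as.POSIXct(c("2014-09-02 00:00:00", "2014-09-02 00:15:00",
                      "2014-09-02 00:00:00", "2014-09-02 00:15:00"), tz = "UTC"),
  site = factor(rep("2001", 4))
)

# Index within each run of increasing times (a run ends when time goes backwards)
by_run <- with(toy, ave(seq_along(time),
                        cumsum(c(TRUE, diff(time) < 0)),
                        FUN = seq_along))

# Index within each site/day combination, regardless of row order
by_site_day <- with(toy, ave(seq_along(time),
                             site, format(time, "%Y-%m-%d"),
                             FUN = seq_along))

cbind(toy, by_run, by_site_day)
```

Here by_run restarts at 1 on the third row, while by_site_day keeps counting up to 4, which is why the two answers above produce different columns for the OP's data.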