My data looks like this:
year month flag group
1: 1992 6 1 8
2: 1992 7 0 8
3: 1992 8 0 8
4: 1992 9 0 8
5: 1992 10 0 8
6: 1992 11 0 8
7: 1992 12 0 8
8: 1995 6 0 10
9: 1995 7 0 11
10: 1995 8 0 11
11: 1995 9 1 11
12: 1995 10 0 11
13: 1995 11 0 11
14: 1995 12 0 11
15: 1998 6 0 13
16: 1998 7 0 13
17: 1998 8 0 13
18: 1998 9 0 13
19: 1998 10 0 13
20: 1998 11 0 13
21: 1998 12 0 13
What I need to do is set flag to 1 for every row from the first observed 1 onward, and this has to be done by group.
As a concrete example, I want this:
year month flag group
1: 1992 6 1 8
2: 1992 7 1 8
3: 1992 8 1 8
4: 1992 9 1 8
5: 1992 10 1 8
6: 1992 11 1 8
7: 1992 12 1 8
8: 1995 6 0 10
9: 1995 7 0 11
10: 1995 8 0 11
11: 1995 9 1 11
12: 1995 10 1 11
13: 1995 11 1 11
14: 1995 12 1 11
15: 1998 6 0 13
16: 1998 7 0 13
17: 1998 8 0 13
18: 1998 9 0 13
19: 1998 10 0 13
20: 1998 11 0 13
21: 1998 12 0 13
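Note that rows 1:7 are now 1, as are rows 11:14, and that rows 15:21 are unchanged because that group had no 1 to begin with.
Most of my ideas revolve around using which to find the index of the first 1 by group, but I have run into some trouble.
If anyone has a data.table-based solution, that would be great.
Thanks for your help!
In case it helps, here is the dput() of my underlying data: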
library(data.table)
DT = setDT(structure(list(year = c(1992, 1992, 1992, 1992, 1992, 1992, 1992,
1992, 1992, 1992, 1992, 1992, 1995, 1995, 1995, 1995, 1995, 1995,
1995, 1995, 1995, 1995, 1995, 1995, 1998, 1998, 1998, 1998, 1998,
1998, 1998, 1998, 1998, 1998, 1998, 1998), month = c(1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), flag = c(0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), group = c(8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L)), row.names = c(NA, -36L),
class = c("data.table", "data.frame")))
Answer 0: (score: 2)
We return 1 for every row from the first occurrence of flag = 1 onward, provided the group has at least one flag = 1.
library(data.table)
dt[,flag := +(seq_len(.N)>= which.max(flag == 1) & any(flag == 1)),by = group]
dt
# year month flag group
# 1: 1992 6 1 8
# 2: 1992 7 1 8
# 3: 1992 8 1 8
# 4: 1992 9 1 8
# 5: 1992 10 1 8
# 6: 1992 11 1 8
# 7: 1992 12 1 8
# 8: 1995 6 0 10
# 9: 1995 7 0 11
#10: 1995 8 0 11
#11: 1995 9 1 11
#12: 1995 10 1 11
#13: 1995 11 1 11
#14: 1995 12 1 11
#15: 1998 6 0 13
#16: 1998 7 0 13
#17: 1998 8 0 13
#18: 1998 9 0 13
#19: 1998 10 0 13
#20: 1998 11 0 13
#21: 1998 12 0 13
# year month flag group
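The `any(flag == 1)` guard matters because `which.max()` on an all-`FALSE` vector returns 1, which would otherwise flag every row of a group that has no 1 at all. A minimal sketch on toy vectors (the vectors and the helper name `fill_after_first` are made up for illustration and are not part of the answer):

```r
# Hypothetical toy vectors, not taken from the question's data
no_ones  <- c(0, 0, 0)
has_ones <- c(0, 1, 0, 0)

# which.max() returns the position of the first maximum;
# for an all-FALSE vector that is position 1
which.max(no_ones == 1)   # 1
which.max(has_ones == 1)  # 2

# Same expression as in the answer, wrapped in an illustrative helper
fill_after_first <- function(flag) {
  +(seq_along(flag) >= which.max(flag == 1) & any(flag == 1))
}

fill_after_first(no_ones)   # 0 0 0
fill_after_first(has_ones)  # 0 1 1 1
```

Without the `any()` term, the first call would return 1 1 1.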
In dplyr that would be
library(dplyr)
dt %>%
group_by(group) %>%
mutate(flag = +(row_number() >= which.max(flag == 1) & any(flag == 1)))
and in base R, using ave, it would be
dt$flag <- with(dt, +(ave(flag == 1, group, FUN = function(x)
seq_along(x) >= which.max(x) & any(x))))
Data
dt <- structure(list(year = c(1992, 1992, 1992, 1992, 1992, 1992, 1992,
1992, 1992, 1992, 1992, 1992, 1995, 1995, 1995, 1995, 1995, 1995,
1995, 1995, 1995, 1995, 1995, 1995, 1998, 1998, 1998, 1998, 1998,
1998, 1998, 1998, 1998, 1998, 1998, 1998), month = c(1, 2, 3,
4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10,
11, 12, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12), flag = c(0, 0,
0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1,
1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), group = c(8L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 10L, 10L, 10L, 10L, 10L,
10L, 11L, 11L, 11L, 11L, 11L, 11L, 13L, 13L, 13L, 13L, 13L, 13L,
13L, 13L, 13L, 13L, 13L, 13L)), row.names = c(NA, -36L), class =
c("data.table","data.frame"))
Answer 1: (score: 1)
You can do a non-equi join on the first flagged month of each group:
DT[unique(DT[flag==1], by="group"), on=.(group, month >= month), flag := 1]
Here is the result using the OP's dput:
year month flag group
1: 1992 1 0 8
2: 1992 2 0 8
3: 1992 3 0 8
4: 1992 4 0 8
5: 1992 5 0 8
6: 1992 6 1 8
7: 1992 7 1 8
8: 1992 8 1 8
9: 1992 9 1 8
10: 1992 10 1 8
11: 1992 11 1 8
12: 1992 12 1 8
13: 1995 1 0 10
14: 1995 2 0 10
15: 1995 3 0 10
16: 1995 4 0 10
17: 1995 5 0 10
18: 1995 6 0 10
19: 1995 7 0 11
20: 1995 8 0 11
21: 1995 9 1 11
22: 1995 10 1 11
23: 1995 11 1 11
24: 1995 12 1 11
25: 1998 1 0 13
26: 1998 2 0 13
27: 1998 3 0 13
28: 1998 4 0 13
29: 1998 5 0 13
30: 1998 6 0 13
31: 1998 7 0 13
32: 1998 8 0 13
33: 1998 9 0 13
34: 1998 10 0 13
35: 1998 11 0 13
36: 1998 12 0 13
year month flag group
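To see what the join does, it can help to look at the lookup table on its own. A small sketch on hypothetical toy data (the table `toy` and the name `first_hit` are illustrative assumptions). Like the one-liner above, it assumes rows are sorted by month within each group and that month alone orders the rows of a group:

```r
library(data.table)

# Hypothetical toy data: group 1 has a 1 in month 2, group 2 has none
toy <- data.table(
  group = c(1L, 1L, 1L, 1L, 2L, 2L, 2L),
  month = c(1, 2, 3, 4, 1, 2, 3),
  flag  = c(0, 1, 0, 0, 0, 0, 0)
)

# First flagged row per group; groups without a 1 simply do not appear
first_hit <- unique(toy[flag == 1], by = "group")

# Non-equi join: update rows of toy in the same group whose month is
# at or after the first flagged month
toy[first_hit, on = .(group, month >= month), flag := 1]

toy$flag  # 0 1 1 1 0 0 0
```

Groups with no 1 never match a row of `first_hit`, so they are left untouched without any extra guard.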
Answer 2: (score: 0)
You can use dplyr and cumsum:
library(dplyr)
df %>%
group_by(group) %>%
mutate(flag = ifelse(cumsum(flag) >= 1, 1, 0))
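Another approach uses lag (note that this carries a 1 forward by only one row, so it does not fill a longer run of 0s):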
df %>%
group_by(group) %>%
mutate(flag = ifelse(flag != 1 & row_number() > 1, lag(flag, 1), flag))
Or in data.table:
df[, flag := ifelse(cumsum(flag) >= 1, 1, 0), by=group]
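The running-sum idea can be checked on a single toy vector. The sketch below uses a made-up vector `flag` and also shows `cummax()` as an alternative, under the assumption that flag contains only 0 and 1 with no NA values. Note that the threshold matters: when a group has a single 1, the running sum never exceeds 1, so a strict `> 1` comparison would leave that group all zeros.

```r
# Hypothetical toy vector with a single 1
flag <- c(0, 0, 1, 0, 0)

cumsum(flag)                    # 0 0 1 1 1

# Rows at or after the first 1 have a running sum of at least 1
at_least_one <- as.numeric(cumsum(flag) >= 1)
at_least_one                    # 0 0 1 1 1

# A strict comparison misses the case of a single 1
strictly_more <- as.numeric(cumsum(flag) > 1)
strictly_more                   # 0 0 0 0 0

# For a 0/1 vector, the running maximum gives the same fill directly
running_max <- cummax(flag)
running_max                     # 0 0 1 1 1
```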
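Answer 3: (score: 0)
Using na.locf() from the zoo package.
Step 1: filter the groups that contain at least one "1" and replace the "0"s in them with NA.
Step 2: use na.locf() to carry the most recent non-NA value down to everything below it.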
library(zoo)
library(data.table)
temp[group %in% temp[,max(flag),.(group)][V1==1]$group & flag == 0,flag:= NA][,flag:=na.locf(flag,na.rm = FALSE)]
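The call above fills down the whole column at once, so leading NAs in one group are filled from the last row of the group before it. A grouped variant might look like the sketch below; the table `toy` is hypothetical data, and the final `replace()` step is an added assumption about how rows before the first 1 should be handled (they are reset to 0).

```r
library(data.table)
library(zoo)

# Hypothetical toy data: group 2 starts with a 0 right after group 1's 1s
toy <- data.table(
  group = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L),
  flag  = c(1, 0, 0, 0, 1, 0, 0, 0)
)

toy[, flag := {
  x <- replace(flag, flag == 0, NA)  # keep only the 1s, blank out the 0s
  x <- na.locf(x, na.rm = FALSE)     # carry each 1 down within this group only
  replace(x, is.na(x), 0)            # rows before the first 1 go back to 0
}, by = group]

toy$flag  # 1 1 1 0 1 1 0 0
```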
Input table (temp)
structure(list(year = c(1992, 1992, 1992, 1992, 1992, 1992, 1992,
1995, 1995, 1995, 1995, 1995, 1995, 1995, 1998, 1998, 1998, 1998,
1998, 1998, 1998), month = c(6, 7, 8, 9, 10, 11, 12, 6, 7, 8,
9, 10, 11, 12, 6, 7, 8, 9, 10, 11, 12), flag = c(1, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0), group = c(8L,
8L, 8L, 8L, 8L, 8L, 8L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 13L,
13L, 13L, 13L, 13L, 13L, 13L)), row.names = c(NA, -21L), class = c("data.table",
"data.frame"))
Output table
structure(list(year = c(1992, 1992, 1992, 1992, 1992, 1992, 1992,
1995, 1995, 1995, 1995, 1995, 1995, 1995, 1998, 1998, 1998, 1998,
1998, 1998, 1998), month = c(6, 7, 8, 9, 10, 11, 12, 6, 7, 8,
9, 10, 11, 12, 6, 7, 8, 9, 10, 11, 12), flag = c(1, 1, 1, 1,
1, 1, 1, 0, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0), group = c(8L,
8L, 8L, 8L, 8L, 8L, 8L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 13L,
13L, 13L, 13L, 13L, 13L, 13L)), row.names = c(NA, -21L), class = c("data.table",
"data.frame"))