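I am struggling to find a simple way to fill values based on two simple conditions.
I want to fill the variable 'working' with 1 between the first and the last 1 within each 'dayweek'. An example makes this clearer.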
id hours dayweek working
1 1 1 Friday 0
2 1 2 Friday 0
3 1 3 Friday 0
4 1 4 Friday 0
5 1 5 Friday 0
6 1 6 Friday 0
7 1 7 Friday 0
8 1 8 Friday 1
9 1 9 Friday 0
10 1 10 Friday 0
11 1 11 Friday 0
12 1 12 Friday 0
13 1 13 Friday 0
14 1 14 Friday 0
15 1 15 Friday 0
16 1 16 Friday 0
17 1 17 Friday 1
18 1 18 Friday 0
19 1 19 Friday 0
20 1 20 Friday 0
This is the result I am trying to get:
id hours dayweek working
1 1 1 Friday 0
2 1 2 Friday 0
3 1 3 Friday 0
4 1 4 Friday 0
5 1 5 Friday 0
6 1 6 Friday 0
7 1 7 Friday 0
8 1 8 Friday 1
9 1 9 Friday 1
10 1 10 Friday 1
11 1 11 Friday 1
12 1 12 Friday 1
13 1 13 Friday 1
14 1 14 Friday 1
15 1 15 Friday 1
16 1 16 Friday 1
17 1 17 Friday 1
18 1 18 Friday 0
19 1 19 Friday 0
20 1 20 Friday 0
The grouping (group_by) must be by 'id' and 'dayweek'.
Any clue?
Data
structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1",
"2", "3"), class = "factor"), hours = 1:20, dayweek = structure(c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("Friday", "Monday", "Saturday", "Sunday",
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), working = c(0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)), row.names = c(NA,
20L), class = "data.frame", .Names = c("id", "hours", "dayweek",
"working"))
Alternative data for the same problem
dt = structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 29L, 30L,
31L, 32L, 33L, 34L, 35L, 36L, 57L, 58L, 59L, 60L, 61L, 62L, 63L,
64L), id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), hours = c(1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L,
2L, 3L, 4L, 1L, 2L, 3L, 4L), dayweek = structure(c(1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 2L), .Label = c("Friday", "Monday", "Saturday",
"Sunday", "Thursday", "Tuesday", "Wedesnday"), class = "factor"),
working = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L,
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame", row.names = c(NA,
-24L), .Names = c("X", "id", "hours", "dayweek", "working"))
Answer 0 (score: 3)
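We can use data.table for this (the question's data is assumed to be stored as df1). We convert the 'data.frame' to a 'data.table' with setDT(df1). Grouped by 'id' and 'dayweek', and only if the group has at least one 1 value (if(any(working==1))), we get the numeric index of the elements in 'working' that are equal to 1 ('tmp'). We then take the sequence (:) between the first (head(tmp,1)) and the last (tail(tmp,1)) position and wrap it with .I to get the row index ('i1'). Using that index, we assign 1 to the 'working' elements in those rows.
library(data.table)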
i1 <- setDT(df1)[, if(any(working==1)){tmp <- which(working==1)
.I[head(tmp,1):tail(tmp,1)]} , by = .(id, dayweek)]$V1
df1[i1, working:=1L]
df1
# id hours dayweek working
# 1: 1 1 Friday 0
# 2: 1 2 Friday 0
# 3: 1 3 Friday 0
# 4: 1 4 Friday 0
# 5: 1 5 Friday 0
# 6: 1 6 Friday 0
# 7: 1 7 Friday 0
# 8: 1 8 Friday 1
# 9: 1 9 Friday 1
#10: 1 10 Friday 1
#11: 1 11 Friday 1
#12: 1 12 Friday 1
#13: 1 13 Friday 1
#14: 1 14 Friday 1
#15: 1 15 Friday 1
#16: 1 16 Friday 1
#17: 1 17 Friday 1
#18: 1 18 Friday 0
#19: 1 19 Friday 0
#20: 1 20 Friday 0
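To see what the index step does inside a single group, here is a minimal base R sketch on the 'working' values for id 1 / Friday from the question. The names idx and filled are only illustrative and do not appear in the answer. In the data.table version, .I additionally translates these within-group positions into row numbers of the whole table.

```r
# 'working' for id 1 / Friday, copied from the question's data
working <- c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)

# positions of the 1s within the group
tmp <- which(working == 1)

# sequence running from the first to the last of those positions
idx <- head(tmp, 1):tail(tmp, 1)

# set every element in that span to 1 (illustrative; the answer assigns via i1)
filled <- replace(working, idx, 1)
```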
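A similar solution using dplyr (suggested by @David Arenburg) is to group by the 'id' and 'dayweek' columns, get the first and last positions of the elements where working == 1 with min and max, and replace the elements of 'working' in that range with 1. If a particular group has no 1 values, we can wrap the call in ifelse to return 0 for those groups.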
library(dplyr)
df1 %>%
group_by(id, dayweek) %>%
mutate(new = any(working ==1),
working = ifelse(new, replace(working,
min(which(working == 1)):max(which(working == 1)), 1),
as.numeric(new))) %>%
select(-new)
#Source: local data frame [20 x 4]
#Groups: id, dayweek
#
# id hours dayweek working
#1 1 1 Friday 0
#2 1 2 Friday 0
#3 1 3 Friday 0
#4 1 4 Friday 0
#5 1 5 Friday 0
#6 1 6 Friday 0
#7 1 7 Friday 0
#8 1 8 Friday 1
#9 1 9 Friday 1
#10 1 10 Friday 1
#11 1 11 Friday 1
#12 1 12 Friday 1
#13 1 13 Friday 1
#14 1 14 Friday 1
#15 1 15 Friday 1
#16 1 16 Friday 1
#17 1 17 Friday 1
#18 1 18 Friday 0
#19 1 19 Friday 0
#20 1 20 Friday 0
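The guard for groups without any 1 matters because which(working == 1) is empty there, so min() and max() would return Inf and -Inf with a warning. As a rough sketch of the same idea without extra packages, the following uses base R's ave() on a small data frame that mirrors the question's alternative data. The helper fill_between is hypothetical and not part of the answer.

```r
# Hypothetical helper (not from the answer): fill with 1 between the
# first and last 1; groups without any 1 are returned unchanged
fill_between <- function(x) {
  pos <- which(x == 1)
  if (length(pos) == 0) return(x)
  replace(x, min(pos):max(pos), 1L)
}

# Small data frame mirroring the id/hours/dayweek/working columns of 'dt'
dt <- data.frame(
  id      = rep(1:3, each = 8),
  hours   = rep(1:4, times = 6),
  dayweek = rep(rep(c("Friday", "Monday"), each = 4), times = 3),
  working = c(0L, 1L, 0L, 1L,  1L, 0L, 0L, 1L,
              0L, 1L, 1L, 0L,  0L, 1L, 0L, 1L,
              rep(0L, 8))
)

# Apply the helper within each id/dayweek combination
dt$working <- ave(dt$working, dt$id, dt$dayweek, FUN = fill_between)
```

Groups for id 3 contain no 1 and therefore stay all 0.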
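Or a compact option suggested by @Khashaa: we multiply the cummax of 'working' by the reversed (rev) cummax of the reversed 'working', so that only the elements that are 1 in both vectors stay 1, while the others are replaced with 0.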
df1 %>%
group_by(id, dayweek) %>%
mutate(working = cummax(working)*rev(cummax(rev(working))))
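To make the trick concrete, here is a small worked example on a made-up vector (not taken from the question's data). The names forward, backward and result are only illustrative. The sketch assumes 'working' holds only 0 and 1.

```r
working <- c(0, 1, 0, 0, 1, 0)

# 1 from the first 1 onwards
forward <- cummax(working)             # 0 1 1 1 1 1

# 1 up to and including the last 1
backward <- rev(cummax(rev(working)))  # 1 1 1 1 1 0

# 1 only where both are 1, i.e. between the first and the last 1
result <- forward * backward           # 0 1 1 1 1 0
```

A group with no 1 at all gives all zeros in both vectors, so no special case is needed.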