R - Fill values conditionally in person-period format

Time: 2015-08-23 13:56:48

Tags: r

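I am struggling to find a simple way to fill in values based on two simple conditions.

I want to fill the variable working with 1 between the first and the last "1", within each dayweek. An example makes this clearer.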

    id hours dayweek working
1   1     1  Friday       0
2   1     2  Friday       0
3   1     3  Friday       0
4   1     4  Friday       0
5   1     5  Friday       0
6   1     6  Friday       0
7   1     7  Friday       0
8   1     8  Friday       1
9   1     9  Friday       0
10  1    10  Friday       0
11  1    11  Friday       0
12  1    12  Friday       0
13  1    13  Friday       0
14  1    14  Friday       0
15  1    15  Friday       0
16  1    16  Friday       0
17  1    17  Friday       1
18  1    18  Friday       0
19  1    19  Friday       0
20  1    20  Friday       0

This is the result I am trying to get.

    id hours dayweek working
1   1     1  Friday       0
2   1     2  Friday       0
3   1     3  Friday       0
4   1     4  Friday       0
5   1     5  Friday       0
6   1     6  Friday       0
7   1     7  Friday       0
8   1     8  Friday       1
9   1     9  Friday       1
10  1    10  Friday       1
11  1    11  Friday       1
12  1    12  Friday       1
13  1    13  Friday       1
14  1    14  Friday       1
15  1    15  Friday       1
16  1    16  Friday       1
17  1    17  Friday       1
18  1    18  Friday       0
19  1    19  Friday       0
20  1    20  Friday       0

The group_by must be on id and dayweek.

Any clues?

Data

structure(list(id = structure(c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), .Label = c("1", 
"2", "3"), class = "factor"), hours = 1:20, dayweek = structure(c(1L, 
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 1L, 1L), .Label = c("Friday", "Monday", "Saturday", "Sunday", 
"Thursday", "Tuesday", "Wedesnday"), class = "factor"), working = c(0, 
0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0)), row.names = c(NA, 
20L), class = "data.frame", .Names = c("id", "hours", "dayweek", 
"working"))

Alternative data for the same problem

dt = structure(list(X = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 29L, 30L, 
31L, 32L, 33L, 34L, 35L, 36L, 57L, 58L, 59L, 60L, 61L, 62L, 63L, 
64L), id = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L), hours = c(1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 2L, 3L, 4L, 1L, 
2L, 3L, 4L, 1L, 2L, 3L, 4L), dayweek = structure(c(1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 2L), .Label = c("Friday", "Monday", "Saturday", 
"Sunday", "Thursday", "Tuesday", "Wedesnday"), class = "factor"), 
working = c(0L, 1L, 0L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L, 
0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)), class = "data.frame",   row.names = c(NA, 
-24L), .Names = c("X", "id", "hours", "dayweek", "working"))

1 Answer:

Answer 0: (score: 3)

We can use data.table to do this. We convert the 'data.frame' to a 'data.table' (setDT(df1)). Grouped by 'id' and 'dayweek', we get the numeric index of the elements in 'working' that are equal to 1 ('tmp'), provided there is at least one 1 in the group (if(any(working==1))). We take the sequence (:) between the first (head(tmp,1)) and last (tail(tmp, 1)) positions and wrap it with .I to get the row index ('i1'). We then use that index to assign 1 to the 'working' elements in those rows.

library(data.table)
i1 <- setDT(df1)[, if(any(working==1)){
          tmp <- which(working==1)
          .I[head(tmp,1):tail(tmp,1)]}, by = .(id, dayweek)]$V1
df1[i1, working:=1L]
df1
#    id hours dayweek working
# 1:  1     1  Friday       0
# 2:  1     2  Friday       0
# 3:  1     3  Friday       0
# 4:  1     4  Friday       0
# 5:  1     5  Friday       0
# 6:  1     6  Friday       0
# 7:  1     7  Friday       0
# 8:  1     8  Friday       1
# 9:  1     9  Friday       1
#10:  1    10  Friday       1
#11:  1    11  Friday       1
#12:  1    12  Friday       1
#13:  1    13  Friday       1
#14:  1    14  Friday       1
#15:  1    15  Friday       1
#16:  1    16  Friday       1
#17:  1    17  Friday       1
#18:  1    18  Friday       0
#19:  1    19  Friday       0
#20:  1    20  Friday       0

A similar solution using dplyr (suggested by @David Arenburg) is to group by the 'id' and 'dayweek' columns, use min and max to get the first and last positions where working == 1, and replace the elements of working in that range with 1. If a particular group has no 1 values, wrapping the call in ifelse returns 0 for that group.

library(dplyr)
df1 %>%
   group_by(id, dayweek) %>%
   mutate(new = any(working ==1),
      working = ifelse(new, replace(working,
                  min(which(working == 1)):max(which(working == 1)), 1), 
                  as.numeric(new))) %>%
   select(-new)
#Source: local data frame [20 x 4]
#Groups: id, dayweek
#
#   id hours dayweek working
#1   1     1  Friday       0
#2   1     2  Friday       0
#3   1     3  Friday       0
#4   1     4  Friday       0
#5   1     5  Friday       0
#6   1     6  Friday       0
#7   1     7  Friday       0
#8   1     8  Friday       1
#9   1     9  Friday       1
#10  1    10  Friday       1
#11  1    11  Friday       1
#12  1    12  Friday       1
#13  1    13  Friday       1
#14  1    14  Friday       1
#15  1    15  Friday       1
#16  1    16  Friday       1
#17  1    17  Friday       1
#18  1    18  Friday       0
#19  1    19  Friday       0
#20  1    20  Friday       0
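
The first-to-last index logic that both solutions above rely on can be checked on a single vector in base R. The following is only a minimal sketch: the helper name fill_between and the small df data frame are hypothetical and not part of the original answer, and ave() is swapped in for the grouping step in place of data.table or dplyr.

```r
# Hypothetical helper (not from the original answer): set every element
# between the first and last 1 of a 0/1 vector to 1.
fill_between <- function(x) {
  idx <- which(x == 1)
  if (length(idx) == 0) return(x)  # group without any 1: leave it unchanged
  x[min(idx):max(idx)] <- 1
  x
}

fill_between(c(0, 0, 1, 0, 0, 1, 0))
# 0 0 1 1 1 1 0

# Made-up illustration data: id 1 has two 1s, id 2 has none.
df <- data.frame(
  id      = c(1, 1, 1, 1, 2, 2, 2, 2),
  dayweek = "Friday",
  working = c(0, 1, 0, 1, 0, 0, 0, 0)
)

# ave() applies the helper separately within each id/dayweek combination.
df$working <- ave(df$working, df$id, df$dayweek, FUN = fill_between)
df$working
# 0 1 1 1 0 0 0 0
```

The early return plays the same role as the if(any(working==1)) and ifelse guards above: a group with no 1 is passed through untouched.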

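Or a compact option suggested by @Khashaa: we multiply the cummax of 'working' by the reverse (rev) of the cummax of the reversed 'working', so that only the elements that are 1 in both vectors stay 1, while the others are replaced by 0.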

df1 %>% 
    group_by(id, dayweek) %>%
    mutate(working = cummax(working)*rev(cummax(rev(working))))
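
To see why the product works, it may help to trace the two running maxima on one small vector. This is a sketch on made-up values, assuming 'working' contains only 0 and 1 with no NA; the intermediate names from_left and from_right are chosen here for illustration.

```r
working <- c(0, 0, 1, 0, 0, 1, 0)

# Running maximum from the left: becomes 1 at the first 1 and stays 1.
from_left <- cummax(working)
# 0 0 1 1 1 1 1

# Running maximum from the right: 1 up to and including the last 1.
from_right <- rev(cummax(rev(working)))
# 1 1 1 1 1 1 0

# Only positions at or after the first 1 AND at or before the last 1
# are 1 in both vectors, so the product is 1 exactly on that range.
filled <- from_left * from_right
filled
# 0 0 1 1 1 1 0
```

A group with no 1 at all gives 0 in both running maxima, so the product stays 0 without any special case.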