I have a data frame with 3523 observations and 92 variables.
Below is a sample of 6 rows; the observations cover 24 hours, starting at 4:00 am and ending at 4:00 am.
   04:00 04:15 04:30 05:00 ... 04:35
1  -     -     -     -     ... -
2  2     2     2     -     ... -
3  2     -     -     2     ... -
4  -     -     2     -     ... -
5  -     -     -     -     ... -
6  -     -     -     -     ... 2
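Each row contains only the values "-" and "2".
For every row, I want to extract the start and end of each interval of consecutive "2" values.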
For example, row 2: 04:00-04:30;
row 3: 04:00; 05:00
row 4: 04:30
Thanks.
Answer 0 (score: 1)
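Let's expand your example. In the expanded version, row 1 contains no "2" at all, and there are some trickier cases as well, such as row 6, which has a "2", then a dash ("-"), then two "2"s, then another "-", and finally a "2".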
     04:00 04:15 04:30 05:00 05:15 05:30
 1:  -     -     -     -     -     -
 2:  2     2     2     -     2     2
 3:  2     -     -     2     2     2
 4:  -     -     2     -     2     2
 5:  -     -     -     -     2     2
 6:  2     -     2     2     -     2
 7:  -     -     -     -     2     2
 8:  2     2     -     2     2     2
 9:  -     -     -     -     2     2
10:  2     2     -     2     2     2
You can reproduce it by running the following:
WorkSchedulesDay1 <- structure(list(`04:00` = c("-", "2", "2", "-", "-", "2", "-",
"2", "-", "2"), `04:15` = c("-", "2", "-", "-", "-", "-", "-",
"2", "-", "2"), `04:30` = c("-", "2", "-", "2", "-", "2", "-",
"-", "-", "-"), `05:00` = c("-", "-", "2", "-", "-", "2", "-",
"2", "-", "2"), `05:15` = c("-", "2", "2", "2", "2", "-", "2",
"2", "2", "2"), `05:30` = c("-", "2", "2", "2", "2", "2", "2",
"2", "2", "2")), row.names = c(NA, -10L), class = c("data.table",
"data.frame"))
After that, apply the following code:
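library(dplyr)  # %>%, group_by(), mutate(), lag(), coalesce(), case_when()
library(tidyr)  # gather()
library(readr)  # write_csv()

WorkSchedulesDay1 <- WorkSchedulesDay1 %>%
  group_by(rn = row_number()) %>%   # keep the original row number
  gather(time, val, 1:6) %>%        # wide to long; 1:6 = the six time columns of this example
  arrange(time) %>%
  # tmp goes up by one each time val changes within a row, so every run gets its own id
  mutate(tmp = cumsum(coalesce(val != lag(val), FALSE))) %>%
  arrange(rn) %>%
  filter(val != "-") %>%            # keep only the slots that hold a "2"
  group_by(rn, tmp) %>%
  mutate(
    time = case_when(
      n() > 1 ~ paste(min(time), max(time), sep = " - "),  # run of several slots
      TRUE ~ time                                           # single slot
    )
  ) %>%
  ungroup() %>%
  distinct(rn, tmp, time) %>%
  group_by(rn) %>%
  mutate(
    intervals = case_when(
      n() > 1 ~ paste(time, collapse = ", "),  # several intervals in one row
      TRUE ~ time
    )
  ) %>%
  distinct(rn, intervals) %>%
  write_csv("WorkSchedulesDay1.csv")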
You will see that the result is:
   rn intervals
<int> <chr>
    2 04:00 - 04:30, 05:15 - 05:30
    3 04:00, 05:00 - 05:30
    4 04:30, 05:15 - 05:30
    5 05:15 - 05:30
    6 04:00, 04:30 - 05:00, 05:30
    7 05:15 - 05:30
    8 04:00 - 04:15, 05:00 - 05:30
    9 05:15 - 05:30
   10 04:00 - 04:15, 05:00 - 05:30
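The step that makes this work is the tmp column, which gives every run of equal values its own id. As a minimal sketch, here is that one expression applied to row 6 alone; the names times, val, changed, run_ids and two_runs are only for illustration and do not appear in the pipeline.

```r
library(dplyr)

# Row 6 of the expanded example, in column order
times <- c("04:00", "04:15", "04:30", "05:00", "05:15", "05:30")
val   <- c("2", "-", "2", "2", "-", "2")

# TRUE wherever the value differs from the previous slot. The first slot has
# no predecessor (lag() gives NA), so coalesce() turns that NA into FALSE.
changed <- coalesce(val != lag(val), FALSE)

# Running count of changes: consecutive equal values share the same id
tmp <- cumsum(changed)

run_ids <- data.frame(time = times, val = val, tmp = tmp)
print(run_ids)

# Slots that hold a "2", split by run id
two_runs <- split(times[val == "2"], tmp[val == "2"])
print(two_runs)
```

Here tmp comes out as 0 1 2 2 3 4, so 04:30 and 05:00 share id 2 and are later merged into a single interval, while 04:00 and 05:30 stay on their own.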
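Row 1 has no entry simply because it contains nothing but "-".
Similarly, row 2 has no entry for 05:00 because that slot holds a "-".
In the same way, row 6 gives 04:00, 04:30 - 05:00, 05:30 because the 04:15 and 05:15 slots hold "-".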
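If you want to check a single row by hand, the same intervals can be rebuilt with base R's rle(). This is only a sketch of an alternative, not the method used above: the helper intervals_for_row and its mark argument are hypothetical names, and it returns NA for a row with no "2", whereas the pipeline drops such rows.

```r
# Hypothetical helper: collapse the runs of `mark` in one row into a string
intervals_for_row <- function(vals, times, mark = "2") {
  runs   <- rle(vals == mark)          # runs of TRUE (mark) and FALSE (anything else)
  ends   <- cumsum(runs$lengths)       # index of the last slot of each run
  starts <- ends - runs$lengths + 1    # index of the first slot of each run
  keep   <- runs$values                # TRUE for the runs made of `mark`
  if (!any(keep)) return(NA_character_)
  s <- times[starts[keep]]
  e <- times[ends[keep]]
  # A run of length one is printed as a single time, otherwise as "start - end"
  paste(ifelse(s == e, s, paste(s, e, sep = " - ")), collapse = ", ")
}

times <- c("04:00", "04:15", "04:30", "05:00", "05:15", "05:30")

row6 <- intervals_for_row(c("2", "-", "2", "2", "-", "2"), times)
row1 <- intervals_for_row(rep("-", 6), times)
print(row6)
print(row1)
```

For row 6 this gives the same string as the table above. The sketch follows the column order instead of sorting the time labels as text, which may matter for a full 24-hour day that runs past midnight, since "00:00" sorts before "04:00".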