I have a data frame with the columns PERNO, TPURP and loop:
PERNO TPURP loop
1 Loop trip 1
1 Loop trip 2
1 home 2
1 shopping 2
2 work 1
2 Loop trip 2
2 school 2
3 Looptrip 1
4 work 1
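For each PERNO, if TPURP == "Loop trip", I want to add 1 to loop in the rows after that row.
For each PERNO, if a Loop trip is immediately followed by another Loop trip on the next row, we do not add 1 after the first one, only after the second.
Output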
PERNO TPURP loop
1 Loop trip 1
1 Loop trip 2
1 home 3
1 shopping 3
2 work 1
2 Loop trip 2
2 school 3
3 Looptrip 1
4 work 1
Data
structure(list(PERNO = c(1, 1, 1, 1, 1, 1), TPURP = structure(c(8L,
1L, 22L, 22L, 9L, 2L), .Label = c("(1) Working at home (for pay)",
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work",
"(5) Attending class", "(6) All other activities at school",
"(7) Change type of transportation/transfer", "(8) Dropped off passenger",
"(9) Picked up passenger", "(10) Other, specify - transportation",
"(11) Work/Business related", "(12) Service Private Vehicle",
"(13) Routine Shopping", "(14) Shopping for major purchases",
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home",
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment",
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor"), loop = c(1, 1, 2, 2, 2, 2)), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -6L))
Answer 0: (score: 0)
Using dplyr, we can group_by PERNO and, within each group, increment the value of loop for every row after the last occurrence of "Loop trip".
library(dplyr)

df %>%
  group_by(PERNO) %>%
  mutate(loop1 = ifelse(any(TPURP == "Loop trip") &
                          row_number() > max(which(TPURP == "Loop trip")),
                        loop + 1, loop))
# PERNO TPURP loop loop1
# <int> <fct> <int> <dbl>
#1 1 Loop trip 1 1
#2 1 Loop trip 2 2
#3 1 home 2 3
#4 1 shopping 2 3
#5 2 work 1 1
#6 2 Loop trip 2 2
#7 2 school 2 3
#8 3 Looptrip 1 1
#9 4 work 1 1
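This gives a warning for any group that has no "Loop trip" (max() is called on an empty vector there and returns -Inf), but the warning can be ignored: any() is FALSE for those groups, so loop is left unchanged.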
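If the warning is unwanted, one option is to handle the "no match" case explicitly. The following is only a minimal sketch under a few assumptions: it rebuilds the example data with TPURP as a plain character column, and last_loop_row is a hypothetical helper written for this illustration, not a dplyr function.

```r
library(dplyr)

# Example data, rebuilt here so the block runs on its own
df <- data.frame(
  PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L),
  TPURP = c("Loop trip", "Loop trip", "home", "shopping", "work",
            "Loop trip", "school", "Looptrip", "work"),
  loop  = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L)
)

# Hypothetical helper: position of the last exact match in x,
# or Inf when there is none (so no row can come "after" it)
last_loop_row <- function(x, value = "Loop trip") {
  hits <- which(x == value)
  if (length(hits) == 0) Inf else max(hits)
}

res <- df %>%
  group_by(PERNO) %>%
  # add 1 only on rows that come after the last "Loop trip" in the group
  mutate(loop1 = loop + as.integer(row_number() > last_loop_row(TPURP))) %>%
  ungroup()

res
```

Because max() is never called on an empty vector, groups without a "Loop trip" (PERNO 3 and 4 here) pass through silently with loop1 equal to loop.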
Data
df <- structure(list(PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L),
TPURP = structure(c(2L, 2L, 1L, 5L, 6L, 2L, 4L, 3L, 6L), .Label = c("home",
"Loop trip", "Looptrip", "school", "shopping", "work"), class = "factor"),
loop = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L)), class = "data.frame",
row.names = c(NA, -9L))
Alternatively, as @Sotos mentioned, we can use grepl / grep for partial matching instead of an exact match. On the updated dataset we can do:
df %>%
  group_by(PERNO) %>%
  dplyr::mutate(loop1 = ifelse(any(grepl('Loop', TPURP)) &
                                 row_number() > max(grep('Loop', TPURP)),
                               loop + 1, loop))
# PERNO TPURP loop loop1
# <dbl> <fct> <dbl> <dbl>
#1 1 (8) Dropped off passenger 1 1
#2 1 (1) Working at home (for pay) 1 1
#3 1 (24) Loop trip 2 2
#4 1 (24) Loop trip 2 2
#5 1 (9) Picked up passenger 2 3
#6 1 (2) All other home activities 2 3
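To see why partial matching is needed on the updated data, here is a small base R check. The vector x is made up for illustration; it only assumes labels of the form shown in the question, such as "(24) Loop trip".

```r
# Illustrative labels: one non-loop purpose, one prefixed loop trip,
# and the "Looptrip" spelling from the first example
x <- factor(c("(8) Dropped off passenger", "(24) Loop trip", "Looptrip"))

exact   <- x == "Loop trip"                     # exact match on the whole label
partial <- grepl("Loop", x)                     # regex match anywhere in the label
strict  <- grepl("Loop trip", x, fixed = TRUE)  # literal substring match

data.frame(x, exact, partial, strict)
```

The exact comparison matches nothing once the labels carry a "(24)" prefix. The short pattern 'Loop' also picks up "Looptrip", so a longer literal pattern may be preferable if that spelling should stay excluded, as it was in the first example.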