How to change a column based on another column and a group

Time: 2019-09-18 07:18:58

Tags: r dataframe

I have a data frame with the following columns:

 PERNO      TPURP       loop
 1      Loop trip     1
 1      Loop trip     2
 1      home          2
 1      shopping      2
 2      work          1
 2      Loop trip     2
 2      school        2
 3      Looptrip      1
 4      work          1

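For each PERNO, if TPURP == "Loop trip", I want to add 1 to loop in the rows that come after it.

For each PERNO, if a loop trip is immediately followed by another loop trip, we do not add 1 after the first one, only after the second.

Output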

 PERNO      TPURP       loop
 1      Loop trip     1
 1      Loop trip     2
 1      home          3
 1      shopping      3
 2      work          1
 2      Loop trip     2
 2      school        3
 3      Looptrip      1
 4      work          1

Data

structure(list(PERNO = c(1, 1, 1, 1, 1, 1), TPURP = structure(c(8L, 
1L, 22L, 22L, 9L, 2L), .Label = c("(1) Working at home (for pay)", 
"(2) All other home activities", "(3) Work/Job", "(4) All other activities at work", 
"(5) Attending class", "(6) All other activities at school", 
"(7) Change type of transportation/transfer", "(8) Dropped off passenger", 
"(9) Picked up passenger", "(10) Other, specify - transportation", 
"(11) Work/Business related", "(12) Service Private Vehicle", 
"(13) Routine Shopping", "(14) Shopping for major purchases", 
"(15) Household errands", "(16) Personal Business", "(17) Eat meal outside of home", 
"(18) Health care", "(19) Civic/Religious activities", "(20) Recreation/Entertainment", 
"(21) Visit friends/relative", "(24) Loop trip", "(97) Other, specify"
), class = "factor"), loop = c(1, 1, 2, 2, 2, 2)), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -6L))

1 Answer:

Answer 0: (score: 0)

Using dplyr, we can group_by PERNO and increment loop for every row after the last occurrence of "Loop trip" in the group.

library(dplyr)

df %>%
  group_by(PERNO) %>%
  mutate(loop1 = ifelse(any(TPURP == "Loop trip") & 
            row_number() > max(which(TPURP == "Loop trip")),loop + 1, loop))

# PERNO TPURP      loop loop1
#  <int> <fct>     <int> <dbl>
#1     1 Loop trip     1     1
#2     1 Loop trip     2     2
#3     1 home          2     3
#4     1 shopping      2     3
#5     2 work          1     1
#6     2 Loop trip     2     2
#7     2 school        2     3
#8     3 Looptrip      1     1
#9     4 work          1     1

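This returns a warning message for any group that has no "Loop trip" (max() is then called on an empty vector and returns -Inf), but the result is still correct and the warning can be ignored.

Data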
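If the warning is a nuisance, one possible variant avoids calling max() on an empty vector. The sketch below rebuilds the sample data with TPURP as a character column and uses a small helper, last_true(), which is a hypothetical name introduced here for illustration and not part of dplyr. It returns Inf when a group has no match, so no row in that group is incremented.

```r
library(dplyr)

# Sample data from the answer, with TPURP as character instead of factor
df <- data.frame(
  PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L),
  TPURP = c("Loop trip", "Loop trip", "home", "shopping", "work",
            "Loop trip", "school", "Looptrip", "work"),
  loop  = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: position of the last TRUE, or Inf if there is none
last_true <- function(x) {
  idx <- which(x)
  if (length(idx) == 0) Inf else max(idx)
}

res <- df %>%
  group_by(PERNO) %>%
  # Add 1 only to rows located after the last exact "Loop trip" in the group
  mutate(loop1 = loop + as.integer(row_number() > last_true(TPURP == "Loop trip"))) %>%
  ungroup()

print(res)
```

Because the comparison against Inf is FALSE for every row, groups without a "Loop trip" (PERNO 3 and 4 here) keep their original loop values and no warning is raised.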

df <- structure(list(PERNO = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 4L), 
TPURP = structure(c(2L, 2L, 1L, 5L, 6L, 2L, 4L, 3L, 6L), .Label = c("home", 
"Loop trip", "Looptrip", "school", "shopping", "work"), class = "factor"), 
loop = c(1L, 2L, 2L, 2L, 1L, 2L, 2L, 1L, 1L)), class = "data.frame", 
row.names = c(NA, -9L))

Alternatively, instead of an exact match we can use grepl / grep for partial matching, as mentioned by @Sotos. On the updated dataset we can do

df %>% 
  group_by(PERNO) %>%
  dplyr::mutate(loop1 = ifelse(any(grepl('Loop', TPURP)) & 
     row_number() > max(grep('Loop', TPURP)), loop + 1, loop))

#   PERNO TPURP                          loop loop1
#   <dbl> <fct>                         <dbl> <dbl>
#1     1 (8) Dropped off passenger         1     1
#2     1 (1) Working at home (for pay)     1     1
#3     1 (24) Loop trip                    2     2
#4     1 (24) Loop trip                    2     2
#5     1 (9) Picked up passenger           2     3
#6     1 (2) All other home activities     2     3
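For readers who prefer not to depend on dplyr, the same idea can be sketched in base R with ave(). This is only an illustration under a few assumptions: the data frame below retypes the six rows of the updated dataset with TPURP as plain character strings, and the names is_loop and after_last are made up for this example.

```r
# The six rows of the updated dataset, with TPURP as character
df <- data.frame(
  PERNO = c(1, 1, 1, 1, 1, 1),
  TPURP = c("(8) Dropped off passenger", "(1) Working at home (for pay)",
            "(24) Loop trip", "(24) Loop trip",
            "(9) Picked up passenger", "(2) All other home activities"),
  loop  = c(1, 1, 2, 2, 2, 2),
  stringsAsFactors = FALSE
)

# Partial match on "Loop", as in the grepl / grep version
is_loop <- grepl("Loop", df$TPURP, fixed = TRUE)

# Within each PERNO, flag the rows that come after the last match
after_last <- ave(is_loop, df$PERNO, FUN = function(x) {
  idx <- which(x)
  if (length(idx) == 0) rep(FALSE, length(x)) else seq_along(x) > max(idx)
})

# Add 1 to loop for the flagged rows only
df$loop1 <- df$loop + as.integer(after_last)

print(df)
```

The explicit length check also means a PERNO without any loop trip produces no warning.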