Question

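Hello, I have four columns: a household index, a person index within each household, a trip number for each person, and the trip location. I want the first trip of every person in every household to start at home. Here is an example: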

  Household  person  trip     location
      1         1     1          home
      1         1     2          work
      1         1     3          home
      1         2     1          other
      1         2     2          home
      1         2     3          work
      2         1     1          school
      2         1     2          home
      2         1     3          shopping
      2         1     4          home

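The first trip of the second person in the first household is "other", so I want to delete that row and renumber the trip column so that it starts at 1 again. The second household has one member whose first trip is "school", so I want to delete that row and renumber the trip column as well. The output I want is: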

  Household  person  trip     location
      1         1     1          home
      1         1     2          work
      1         1     3          home
      1         2     1          home
      1         2     2          work
      2         1     1          home
      2         1     2          shopping
      2         1     3          home

Answer 1

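One way with dplyr is to group_by Household and person, then slice each group from its first "home" row through its last row. We can then use row_number to assign new trip numbers within each group. This assumes every group has at least one "home" value.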

library(dplyr)

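df %>%
  group_by(Household, person) %>%
  # keep rows from the first "home" to the end of the group
  slice(which.max(location == "home") : n()) %>%
  # renumber trips from 1 within each group
  mutate(trip = row_number())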

#  Household person  trip location
#      <int>  <int> <int> <fct>   
#1         1      1     1 home    
#2         1      1     2 work    
#3         1      1     3 home    
#4         1      2     1 home    
#5         1      2     2 work    
#6         2      1     1 home    
#7         2      1     2 shopping
#8         2      1     3 home
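
To make the assumption above concrete, here is a minimal base R sketch of what which.max does on a logical vector. The location vectors are hypothetical values for a single person, not taken from the question's data.

```r
# One person's trips in order (hypothetical values for illustration).
location <- c("other", "home", "work", "home")

# which.max() on a logical vector returns the index of the first TRUE.
first_home <- which.max(location == "home")
first_home                               # 2
location[first_home:length(location)]    # "home" "work" "home"

# Caveat: if a person never goes home, every comparison is FALSE,
# so which.max() returns 1 and no rows would be dropped.
no_home <- c("other", "work")
which.max(no_home == "home")             # 1
```

This is why the slice approach keeps a group unchanged when it contains no "home" row, which may or may not be what you want.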

Answer 2

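We can use data.table. Convert the 'data.frame' to a 'data.table' with setDT(df), group by 'Household' and 'person', take the cumulative sum of the logical expression location == "home", and use it to subset the Subset of Data.table (.SD) so that only rows from the first "home" onward remain.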

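library(data.table)
# setDT() converts df by reference; keep rows once the first "home" has appeared,
# then renumber trips per group (the trailing [] prints the result after :=)
setDT(df)[, .SD[cumsum(location == "home") > 0], .(Household, person)
         ][, trip := rowid(Household, person)][]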
#  Household person trip location
#1:         1      1    1     home
#2:         1      1    2     work
#3:         1      1    3     home
#4:         1      2    1     home
#5:         1      2    2     work
#6:         2      1    1     home
#7:         2      1    2 shopping
#8:         2      1    3     home
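
The cumulative-sum filter may be easier to follow on a single group. The sketch below uses base R only, and the location vectors are hypothetical values chosen for illustration.

```r
# One person's trips in order (hypothetical values for illustration).
location <- c("other", "home", "work", "home")

is_home <- location == "home"   # FALSE TRUE FALSE TRUE
running <- cumsum(is_home)      # 0 1 1 2: count of "home" rows seen so far
keep    <- running > 0          # FALSE TRUE TRUE TRUE

location[keep]                  # "home" "work" "home"

# A person with no "home" trip never reaches a count above 0,
# so this filter drops all of their rows.
no_home <- c("other", "work")
no_home[cumsum(no_home == "home") > 0]   # character(0)
```

Unlike the which.max approach, this filter removes a group entirely when it has no "home" row.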

The same logic with the tidyverse:

library(dplyr)
df %>%
    group_by(Household, person) %>% 
    filter(cumsum(location == "home") > 0) %>%
    mutate(trip = row_number())
# A tibble: 8 x 4
# Groups:   Household, person [3]
#  Household person  trip location
#      <int>  <int> <int> <chr>   
#1         1      1     1 home    
#2         1      1     2 work    
#3         1      1     3 home    
#4         1      2     1 home    
#5         1      2     2 work    
#6         2      1     1 home    
#7         2      1     2 shopping
#8         2      1     3 home

If we instead want to drop the last trip of each person when it is not "home":

df %>%
    group_by(Household, person) %>%
    filter(row_number() != n()| last(location) == "home") 
# A tibble: 9 x 4
# Groups:   Household, person [3]
#  Household person  trip location
#      <int>  <int> <int> <chr>   
#1         1      1     1 home    
#2         1      1     2 work    
#3         1      1     3 home    
#4         1      2     1 other   
#5         1      2     2 home    
#6         2      1     1 school  
#7         2      1     2 home    
#8         2      1     3 shopping
#9         2      1     4 home
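
If both rules are wanted at once, the two filters can be chained. The following is only a sketch under that assumption: the question asks for the first rule alone, and the name res is introduced here for illustration. The data frame is the one from the Data section, rebuilt so the block runs on its own.

```r
library(dplyr)

# Same values as the example data in this thread.
df <- data.frame(
  Household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person    = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  trip      = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  location  = c("home", "work", "home", "other", "home", "work",
                "school", "home", "shopping", "home"),
  stringsAsFactors = FALSE
)

res <- df %>%
  group_by(Household, person) %>%
  # drop leading trips that come before the first "home"
  filter(cumsum(location == "home") > 0) %>%
  # drop the final trip of a group when it is not "home"
  filter(row_number() != n() | last(location) == "home") %>%
  # renumber the remaining trips from 1 within each group
  mutate(trip = row_number()) %>%
  ungroup()

res
```

Note that the second filter removes only one trailing row per group; a group that ends with several non-home trips would still end away from home.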

Data

df <- structure(list(Household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
2L, 2L), person = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L), 
    trip = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L), location = c("home", 
    "work", "home", "other", "home", "work", "school", "home", 
    "shopping", "home")), class = "data.frame", row.names = c(NA, 
-10L))
