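Hello, I have four columns: the household index, the index of each person within the household, the trip number of each person, and the trip location. I want the first trip of every person in every household to be at home. Here is an example: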
Household person trip location
1 1 1 home
1 1 2 work
1 1 3 home
1 2 1 other
1 2 2 home
1 2 3 work
2 1 1 school
2 1 2 home
2 1 3 shopping
2 1 4 home
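The first trip of the second person in the first household is "other", so I want to delete that row and renumber the trip column so that it starts from 1 again. The second household has one member whose first trip is "school", so I want to delete that row as well and renumber the trips. The output I want is: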
Household person trip location
1 1 1 home
1 1 2 work
1 1 3 home
1 2 1 home
1 2 2 work
2 1 1 home
2 1 2 shopping
2 1 3 home
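Answer 0 (score: 2)
One way using dplyr is to group_by Household and person, then slice each group from its first "home" row to its last row. We can then use row_number to assign new trip numbers within each group. This assumes that every group has at least one "home" value.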
library(dplyr)
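df %>%
  group_by(Household, person) %>%
  # keep the rows from the first "home" to the end of each group
  slice(which.max(location == "home"):n()) %>%
  # renumber the trips from 1 within each group
  mutate(trip = row_number())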
# Household person trip location
# <int> <int> <int> <fct>
#1 1 1 1 home
#2 1 1 2 work
#3 1 1 3 home
#4 1 2 1 home
#5 1 2 2 work
#6 2 1 1 home
#7 2 1 2 shopping
#8 2 1 3 home
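The assumption that every group contains a "home" value matters because of how which.max behaves on an all-FALSE vector. The base R sketch below illustrates this on a made-up trip sequence; `no_home` and the helper `rows_from_first_home` are hypothetical names introduced only for this illustration.

```r
# Hypothetical trip sequence for a person who never goes home
no_home <- c("school", "work")

# which.max() returns 1 for an all-FALSE vector, so slice(1:n()) would keep every row
first_max <- which.max(no_home == "home")   # 1

# match() returns NA when there is no "home", which makes the missing case explicit
first_home <- match("home", no_home)        # NA

# Hypothetical helper: row indices from the first "home" onward, or none at all
rows_from_first_home <- function(location) {
  start <- match("home", location)
  if (is.na(start)) integer(0) else seq.int(start, length(location))
}

rows_from_first_home(no_home)                      # integer(0)
rows_from_first_home(c("other", "home", "work"))   # 2 3
```

If groups without any "home" trip can occur in your data, passing such a helper to slice instead of which.max(...):n() should drop those groups rather than keep them unchanged.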
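Answer 1 (score: 2)
We can use a data.table approach. Convert the 'data.frame' to a 'data.table' (setDT(df)), group by 'Household' and 'person', take the cumulative sum of the logical expression location == "home", and subset the data.table (.SD) to the rows where that sum is greater than 0. The trip column is then renumbered with rowid.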
library(data.table)
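# keep, per group, the rows at or after the first "home"
setDT(df)[, .SD[cumsum(location == "home") > 0], .(Household, person)
          # renumber the trips from 1 within each group
          ][, trip := rowid(Household, person)]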
# Household person trip location
#1: 1 1 1 home
#2: 1 1 2 work
#3: 1 1 3 home
#4: 1 2 1 home
#5: 1 2 2 work
#6: 2 1 1 home
#7: 2 1 2 shopping
#8: 2 1 3 home
With tidyverse:
library(dplyr)
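df %>%
  group_by(Household, person) %>%
  # the cumulative sum stays 0 until the first "home" appears
  filter(cumsum(location == "home") > 0) %>%
  # renumber the trips from 1 within each group
  mutate(trip = row_number())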
# A tibble: 8 x 4
# Groups: Household, person [3]
# Household person trip location
# <int> <int> <int> <chr>
#1 1 1 1 home
#2 1 1 2 work
#3 1 1 3 home
#4 1 2 1 home
#5 1 2 2 work
#6 2 1 1 home
#7 2 1 2 shopping
#8 2 1 3 home
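For readers who have neither package installed, the same cumulative-sum idea can be sketched in base R with ave. This is only an illustrative equivalent, not part of the answers above; `seen_home` and `res` are names chosen for the sketch, and the data frame repeats the example data from the question.

```r
# Example data from the question
df <- data.frame(
  Household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L),
  person    = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
  trip      = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L),
  location  = c("home", "work", "home", "other", "home", "work",
                "school", "home", "shopping", "home"),
  stringsAsFactors = FALSE
)

# Running count of "home" trips within each Household/person group
seen_home <- ave(as.integer(df$location == "home"),
                 df$Household, df$person, FUN = cumsum)

# Keep only the rows at or after the first "home" of each group
res <- df[seen_home > 0, ]

# Renumber the trips from 1 within each group
res$trip <- ave(seq_len(nrow(res)), res$Household, res$person, FUN = seq_along)
rownames(res) <- NULL
res
```

Unlike the which.max version, a group with no "home" trip has a running count of 0 everywhere, so all of its rows are dropped.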
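If we want to remove the last trip when it is not "home":
df %>%
  group_by(Household, person) %>%
  # keep every non-last row; keep the last row only if it is "home"
  filter(row_number() != n() | last(location) == "home")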
# A tibble: 9 x 4
# Groups: Household, person [3]
# Household person trip location
# <int> <int> <int> <chr>
#1 1 1 1 home
#2 1 1 2 work
#3 1 1 3 home
#4 1 2 1 other
#5 1 2 2 home
#6 2 1 1 school
#7 2 1 2 home
#8 2 1 3 shopping
#9 2 1 4 home
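Note that this filter removes at most one trailing row per group. If a person could end with several non-home trips and you want every sequence to both start and end at home, one possible extension is to combine the cumulative-sum mask with its reversed counterpart. The sketch below shows the idea on a single made-up vector; `trim_to_home` is a hypothetical helper, not something from the answers above.

```r
# Hypothetical helper: TRUE for the rows between the first and the last "home"
trim_to_home <- function(location) {
  at_home <- location == "home"
  from_first <- cumsum(at_home) > 0            # at or after the first "home"
  to_last <- rev(cumsum(rev(at_home))) > 0     # at or before the last "home"
  from_first & to_last
}

# Made-up trip sequence with non-home trips at both ends
trips <- c("other", "home", "work", "home", "shopping", "gym")
trimmed <- trips[trim_to_home(trips)]
trimmed   # "home" "work" "home"
```

Inside the grouped pipeline this would presumably be used as filter(trim_to_home(location)), followed by the same mutate(trip = row_number()) step to renumber the trips.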
df <- structure(list(Household = c(1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
2L, 2L), person = c(1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L),
trip = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L, 4L), location = c("home",
"work", "home", "other", "home", "work", "school", "home",
"shopping", "home")), class = "data.frame", row.names = c(NA,
-10L))