I have three columns:
household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home
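The first column is the household index and the second is the household member. Each person's activities start at home. Now, for each person in each household, I want to define a column loop that starts at 1 and changes to loop + 1 for the activities that come after a home or work activity. For example, in the data above, the third row is home, so for row 4 we have loop = 2, and row 5 is work, so after work we have loop = 3.
Output: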
household persons activity loop
1 1 home 1
1 1 shopping 1
1 1 home 1
1 1 eating 2
1 1 work 2
1 1 shopping 3
1 1 home 3
1 2 home 1
1 2 shopping 1
1 2 home 1
2 1 home 1
2 1 eating 1
2 1 home 1
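Answer 0 (score: 1)
Here is an idea. We can use the rleid, fill, and lead functions (from data.table, tidyr, and dplyr respectively) to create the loop column.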
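library(dplyr)       # mutate(), group_by(), lead(), select()
library(tidyr)       # fill()
library(data.table)  # rleid()

dat2 <- dat %>%
  # Keep only the "home"/"work" rows as anchors; everything else becomes NA
  mutate(activity2 = replace(activity, !activity %in% c("home", "work"), NA)) %>%
  group_by(household, persons) %>%
  # Carry the last anchor forward within each person
  fill(activity2) %>%
  # Number the runs of identical anchors, then shift the numbering up one row
  mutate(loop = lead(rleid(activity2))) %>%
  # lead() leaves NA on each person's last row; fill it from the row above
  fill(loop) %>%
  ungroup() %>%
  select(-activity2)
dat2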
# # A tibble: 13 x 4
# household persons activity loop
# <int> <int> <chr> <int>
# 1 1 1 home 1
# 2 1 1 shopping 1
# 3 1 1 home 1
# 4 1 1 eating 2
# 5 1 1 work 2
# 6 1 1 shopping 3
# 7 1 1 home 3
# 8 1 2 home 1
# 9 1 2 shopping 1
# 10 1 2 home 1
# 11 2 1 home 1
# 12 2 1 eating 1
# 13 2 1 home 1
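To make the pipeline above easier to follow, here is a minimal sketch that keeps every intermediate column for a single person (household 1, person 1 from the question). The column names anchor, run, and shifted are hypothetical and used only for illustration; they are not part of the original answer.

```r
library(dplyr)       # mutate(), lead()
library(tidyr)       # fill()
library(data.table)  # rleid()

# Activities of household 1, person 1, copied from the question
one <- data.frame(
  activity = c("home", "shopping", "home", "eating", "work", "shopping", "home"),
  stringsAsFactors = FALSE
)

steps <- one %>%
  # Non-anchor activities become NA, then inherit the previous anchor
  mutate(anchor = replace(activity, !activity %in% c("home", "work"), NA)) %>%
  fill(anchor) %>%
  # run:     1 1 1 1 2 2 3   (id of each run of identical anchors)
  # shifted: 1 1 1 2 2 3 NA  (run id of the following row)
  mutate(run = rleid(anchor), shifted = lead(run), loop = shifted) %>%
  # loop:    1 1 1 2 2 3 3   (the trailing NA is filled from the row above)
  fill(loop)

steps
```

One thing to be aware of: rleid numbers runs of identical values, so after fill, two home rows separated only by non-anchor activities fall into the same run. For a sequence such as home, shopping, home, eating, home this approach gives loop = 1 throughout, whereas counting every home or work row would give 1, 1, 1, 2, 2. On the sample data in the question the two readings agree.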
Data
dat <- read.table(text = "household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home",
stringsAsFactors = FALSE, header = TRUE)
Answer 1 (score: 1)
Another option using data.table, assuming the first activity is always home or work:
DT[, loop := shift(cumsum(activity %chin% c('home','work')), fill=1L),
.(household, persons)]
Output:
household persons activity loop
1: 1 1 home 1
2: 1 1 shopping 1
3: 1 1 home 1
4: 1 1 eating 2
5: 1 1 work 2
6: 1 1 shopping 3
7: 1 1 home 3
8: 1 2 home 1
9: 1 2 shopping 1
10: 1 2 home 1
11: 2 1 home 1
12: 2 1 eating 1
13: 2 1 home 1
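The same "count the home/work rows seen so far, then shift by one" idea can also be sketched in base R, in case data.table is not available. This is only an illustrative equivalent under the same assumption (each person's first activity is home or work), not part of the original answer; the helper name is_anchor is hypothetical.

```r
# Sample data copied from the question
dat <- read.table(text = "household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home",
stringsAsFactors = FALSE, header = TRUE)

# 1 where the activity is home or work, 0 otherwise
is_anchor <- as.integer(dat$activity %in% c("home", "work"))

# Within each household/person: running count of anchors, shifted down
# one row, with 1 placed on the first row (mirrors shift(..., fill = 1L))
dat$loop <- ave(is_anchor, dat$household, dat$persons,
                FUN = function(x) c(1L, head(cumsum(x), -1)))

dat
```

Here ave applies the function separately to each household/person group, so the counter restarts at 1 for every person.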
Data:
library(data.table)
DT <- fread("household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home")