How to define a new column using 2 grouping columns and 1 column

Time: 2019-10-18 05:08:58

Tags: r dataframe

I have 3 columns:

household   persons   activity
1           1         home
1           1         shopping
1           1         home
1           1         eating
1           1         work
1           1         shopping
1           1         home
1           2         home
1           2         shopping
1           2         home
2           1         home
2           1         eating
2           1         home

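The first column is the household index and the second identifies the person within the household. Every person's sequence of activities starts at home. Now I want to define, for each person in each household, a column loop that starts at 1 and increases by 1 on the activity that follows a home or work activity. The home row that opens each person's sequence does not trigger an increment. For example, in the output below, the third row is home, so row 4 has loop = 2; row 5 is work, so the rows after work have loop = 3.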

Output

household   persons   activity   loop
1           1         home       1
1           1         shopping   1
1           1         home       1
1           1         eating     2
1           1         work       2
1           1         shopping   3
1           1         home       3
1           2         home       1
1           2         shopping   1
1           2         home       1
2           1         home       1
2           1         eating     1
2           1         home       1

2 Answers:

Answer 0 (score: 1)

Here is an idea. We can use the rleid, fill, and lead functions to create loop.

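library(dplyr)
library(tidyr)
library(data.table)  # for rleid()

dat2 <- dat %>%
  # Keep only "home"/"work"; every other activity becomes NA
  mutate(activity2 = replace(activity, !activity %in% c("home", "work"), NA)) %>%
  group_by(household, persons) %>%
  # Carry the last home/work value down over the NA rows
  fill(activity2) %>%
  # Number the runs of identical values, then shift the ids up by one row
  mutate(loop = lead(rleid(activity2))) %>%
  # lead() leaves NA in the last row of each group; fill it from the row above
  fill(loop) %>%
  ungroup() %>%
  select(-activity2)
dat2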
# # A tibble: 13 x 4
#    household persons activity  loop
#        <int>   <int> <chr>    <int>
#  1         1       1 home         1
#  2         1       1 shopping     1
#  3         1       1 home         1
#  4         1       1 eating       2
#  5         1       1 work         2
#  6         1       1 shopping     3
#  7         1       1 home         3
#  8         1       2 home         1
#  9         1       2 shopping     1
# 10         1       2 home         1
# 11         2       1 home         1
# 12         2       1 eating       1
# 13         2       1 home         1

Data

dat <- read.table(text = "household   persons   activity
  1       1        home
  1       1         shopping
  1       1        home
  1       1         eating
  1       1         work
  1       1        shopping
  1       1         home
  1       2         home
  1       2          shopping
  1       2         home
  2       1         home
  2       1         eating
  2       1         home",
                  stringsAsFactors = FALSE, header = TRUE)
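
To make the pipeline above easier to follow, here is a small sketch that keeps every intermediate column for a single person (household 1, person 1 from the question). The column names anchor, run, and shifted are made up for this illustration and are not part of the answer; rleid() is called through data.table:: so the package does not need to be attached.

```r
library(dplyr)
library(tidyr)

# Activities of household 1, person 1, copied from the question
steps <- tibble(activity = c("home", "shopping", "home", "eating",
                             "work", "shopping", "home")) %>%
  # Step 1: keep only home/work, everything else becomes NA
  mutate(anchor = replace(activity, !activity %in% c("home", "work"), NA)) %>%
  # Step 2: carry the last home/work value downwards
  fill(anchor) %>%
  # Step 3: number the runs, then shift the run ids up by one row
  mutate(run = data.table::rleid(anchor),
         shifted = lead(run),
         loop = shifted) %>%
  # Step 4: the last row is NA after lead(); take the value from the row above
  fill(loop)

print(steps)
```

One thing worth checking on your own data: rleid() starts a new run only when the filled value changes, so a person who goes home, out, and home again with no work row in between stays in the same run under this pipeline.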

Answer 1 (score: 1)

Another option, using data.table and assuming the first activity is always home or work:

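# Within each household/person: count the home/work rows seen so far,
# then lag that count by one row (the first row of each group gets 1)
DT[, loop := shift(cumsum(activity %chin% c('home','work')), fill=1L), 
    .(household, persons)]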

Output:

    household persons activity loop
 1:         1       1     home    1
 2:         1       1 shopping    1
 3:         1       1     home    1
 4:         1       1   eating    2
 5:         1       1     work    2
 6:         1       1 shopping    3
 7:         1       1     home    3
 8:         1       2     home    1
 9:         1       2 shopping    1
10:         1       2     home    1
11:         2       1     home    1
12:         2       1   eating    1
13:         2       1     home    1
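
The one-liner above can be unpacked on a plain vector. This sketch uses the activities of household 1, person 1 from the question; the names is_anchor and seen are only for illustration. It relies on the same assumption as the answer, namely that the first activity is home or work, which is why the lagged count is padded with 1.

```r
library(data.table)

activity <- c("home", "shopping", "home", "eating", "work", "shopping", "home")

# TRUE where the activity is home or work
is_anchor <- activity %chin% c("home", "work")

# Running count of home/work rows up to and including the current row
seen <- cumsum(is_anchor)

# Lag by one row so the increment lands on the row AFTER home/work;
# the first row has nothing before it and is filled with 1
loop <- shift(seen, fill = 1L)

print(data.table(activity, is_anchor, seen, loop))
```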

Data:

library(data.table)
DT <- fread("household   persons   activity
1       1        home
1       1         shopping
1       1        home
1       1         eating
1       1         work
1       1        shopping
1       1         home
1       2         home
1       2          shopping
1       2         home
2       1         home
2       1         eating
2       1         home")
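
For readers who prefer dplyr, the same cumsum-then-lag idea could probably be written as below. This is a sketch and not part of either answer: it swaps data.table's shift() for dplyr::lag() with default = 1L, keeps the same assumption that each person's first activity is home or work, and compares the result with the loop column requested in the question.

```r
library(dplyr)

# Sample data copied from the question
dat <- read.table(text = "household persons activity
1 1 home
1 1 shopping
1 1 home
1 1 eating
1 1 work
1 1 shopping
1 1 home
1 2 home
1 2 shopping
1 2 home
2 1 home
2 1 eating
2 1 home", header = TRUE, stringsAsFactors = FALSE)

res <- dat %>%
  group_by(household, persons) %>%
  # Count home/work rows so far, then lag by one row within each person
  mutate(loop = dplyr::lag(cumsum(activity %in% c("home", "work")),
                           default = 1L)) %>%
  ungroup()

# loop column from the question's expected output
expected <- c(1, 1, 1, 2, 2, 3, 3, 1, 1, 1, 1, 1, 1)
stopifnot(all(res$loop == expected))

print(res)
```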