For each ID

Time: 2018-11-27 07:53:46

Tags: r dplyr

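I have a data frame containing Person_ID, Job_ID, Municipality_code and other variables (see the example data frame below). Job_ID is measured monthly, whereas Municipality_code is measured only once per year.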

 as.data.frame(df)
   Person_ID Month Year Job_ID Municipality_code
1          1     1 2017   Job1                 1
2          1     2 2017   Job1                 1
3          1     3 2017   Job1                 1
4          1     4 2017   Job1                 1
5          1     5 2017   Job2                 1
6          1     6 2017   Job2                 1
7          1     7 2017   Job2                 1
8          1     8 2017   Job2                 1
9          1     9 2017   Job2                 1
10         1    10 2017   Job2                 1
11         1    11 2017   Job2                 1
12         1    12 2017   Job2                 1
13         1     1 2018   Job2                20
14         1     2 2018   Job2                20
15         1     3 2018   Job2                20
16         1     4 2018   Job2                20
17         1     5 2018   Job2                20
18         1     6 2018   Job2                20
19         1     7 2018   Job2                20
20         1     8 2018   Job2                20
21         1     9 2018   Job2                20
22         1    10 2018   Job2                20
23         1    11 2018   Job2                20
24         1    12 2018   Job2                20
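
For anyone who wants to reproduce the problem, the printed sample can be rebuilt roughly as follows. This is a minimal sketch that assumes integer columns for the IDs, months, years and codes and a character column for Job_ID; the real data contains further variables that are not shown here.

```r
# Rebuild the 24-row example for Person_ID 1 (column types are an assumption).
df <- data.frame(
  Person_ID         = rep(1L, 24),
  Month             = rep(1:12, times = 2),
  Year              = rep(c(2017L, 2018L), each = 12),
  Job_ID            = c(rep("Job1", 4), rep("Job2", 20)),  # job switch in 5/2017
  Municipality_code = rep(c(1L, 20L), each = 12),          # recorded once per year
  stringsAsFactors  = FALSE
)
```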

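I want to correct the municipality code based on each person's Job_ID. For example, Person_ID 1 switches jobs in the fifth month of 2017 (Job1 -> Job2). Because Municipality_code is only recorded once per year, the code stays at 1 for the rest of 2017 (in 1/2017 we have Job1 and the corresponding Municipality_code 1). I need a piece of code that corrects Municipality_code, so that from 5/2017 onward we get Municipality_code 20 instead of 1. I tried the following code, but to no avail.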

df2 <- df %>% 
  group_by(Person_ID) %>%
  dplyr::mutate(lag = lag(Job_ID, default = NA, order_by = Job_ID), 
                Municipality_corrected = if_else(Job_ID == lag, Municipality_code[1], Municipality_code[2]))

And the desired output:

   Person_ID Month Year Job_ID Municipality_code  lag Municipality_corrected
1          1     1 2017   Job1                 1 <NA>                     NA
2          1     2 2017   Job1                 1 Job1                      1
3          1     3 2017   Job1                 1 Job1                      1
4          1     4 2017   Job1                 1 Job1                      1
5          1     5 2017   Job2                 1 Job1                      1
6          1     6 2017   Job2                 1 Job2                     20
7          1     7 2017   Job2                 1 Job2                     20
8          1     8 2017   Job2                 1 Job2                     20
9          1     9 2017   Job2                 1 Job2                     20
10         1    10 2017   Job2                 1 Job2                     20
11         1    11 2017   Job2                 1 Job2                     20
12         1    12 2017   Job2                 1 Job2                     20
13         1     1 2018   Job2                20 Job2                     20
14         1     2 2018   Job2                20 Job2                     20
15         1     3 2018   Job2                20 Job2                     20
16         1     4 2018   Job2                20 Job2                     20
17         1     5 2018   Job2                20 Job2                     20
18         1     6 2018   Job2                20 Job2                     20
19         1     7 2018   Job2                20 Job2                     20
20         1     8 2018   Job2                20 Job2                     20
21         1     9 2018   Job2                20 Job2                     20
22         1    10 2018   Job2                20 Job2                     20
23         1    11 2018   Job2                20 Job2                     20
24         1    12 2018   Job2                20 Job2                     20

1 answer:

Answer 0 (score: 1)

The following gives you the corrected Municipality_code:

library(dplyr)

df %>% 
  group_by(Person_ID, Job_ID) %>%                            # one group per person and job
  mutate(Municipality_corrected = last(Municipality_code))   # last recorded code within the group

# A tibble: 24 x 6
# Groups:   Person_ID, Job_ID [2]
#    Person_ID Month  Year Job_ID Municipality_code Municipality_corrected
#        <int> <int> <int> <chr>              <int>                  <int>
#  1         1     1  2017 Job1                   1                      1
#  2         1     2  2017 Job1                   1                      1
#  3         1     3  2017 Job1                   1                      1
#  4         1     4  2017 Job1                   1                      1
#  5         1     5  2017 Job2                   1                     20
#  6         1     6  2017 Job2                   1                     20
#  7         1     7  2017 Job2                   1                     20
#  8         1     8  2017 Job2                   1                     20
#  9         1     9  2017 Job2                   1                     20
# 10         1    10  2017 Job2                   1                     20
# ... with 14 more rows

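The idea is that the municipality code is the same for every row of a given job, hence the grouping by Job_ID. Within each Job_ID, I then take the last Municipality_code as the corrected one.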
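
Note that `last()` depends on row order, so it only picks the most recent code if the rows are sorted chronologically. A minimal self-contained sketch of the same approach with an explicit sort is shown below; the name `df_corrected` and the reconstruction of the sample data are assumptions made for illustration, not part of the original answer.

```r
library(dplyr)

# Sample data rebuilt from the question (column types are an assumption).
df <- data.frame(
  Person_ID         = rep(1L, 24),
  Month             = rep(1:12, times = 2),
  Year              = rep(c(2017L, 2018L), each = 12),
  Job_ID            = c(rep("Job1", 4), rep("Job2", 20)),
  Municipality_code = rep(c(1L, 20L), each = 12),
  stringsAsFactors  = FALSE
)

df_corrected <- df %>%
  arrange(Person_ID, Year, Month) %>%   # make "last" mean "most recent"
  group_by(Person_ID, Job_ID) %>%
  mutate(Municipality_corrected = last(Municipality_code)) %>%
  ungroup()

# Rows whose yearly code differs from the job-level code:
# for this sample these are months 5 to 12 of 2017 (8 rows).
df_corrected %>%
  filter(Municipality_code != Municipality_corrected)
```

This relies on the assumption that each job belongs to exactly one municipality. If a person can return to an earlier job, or a job can move between municipalities, every row of that job would still receive the most recent code, which may not be what is wanted.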