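I have a data frame containing Person_ID, Job_ID, Municipality_code and other variables (see the example data frame below). The Job_ID variable is measured monthly, whereas Municipality_code is measured once per year.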
as.data.frame(df)
Person_ID Month Year Job_ID Municipality_code
1 1 1 2017 Job1 1
2 1 2 2017 Job1 1
3 1 3 2017 Job1 1
4 1 4 2017 Job1 1
5 1 5 2017 Job2 1
6 1 6 2017 Job2 1
7 1 7 2017 Job2 1
8 1 8 2017 Job2 1
9 1 9 2017 Job2 1
10 1 10 2017 Job2 1
11 1 11 2017 Job2 1
12 1 12 2017 Job2 1
13 1 1 2018 Job2 20
14 1 2 2018 Job2 20
15 1 3 2018 Job2 20
16 1 4 2018 Job2 20
17 1 5 2018 Job2 20
18 1 6 2018 Job2 20
19 1 7 2018 Job2 20
20 1 8 2018 Job2 20
21 1 9 2018 Job2 20
22 1 10 2018 Job2 20
23 1 11 2018 Job2 20
24 1 12 2018 Job2 20
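I want to correct the municipality code based on each person's Job_ID. For example, Person_ID 1 switched jobs in the fifth month of 2017 (Job1 -> Job2). Because Municipality_code is only recorded once per year, the code stays at 1 for the rest of 2017 (in 1/2017 we have Job1 and the corresponding Municipality_code 1). I need a piece of code that corrects Municipality_code, so that from 5/2017 onward we get Municipality_code 20 instead of 1. I tried the following code, but to no avail.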
df2 <- df %>%
group_by(Person_ID) %>%
dplyr::mutate(lag = lag(Job_ID, default = NA, order_by = Job_ID),
Municipality_corrected = if_else(Job_ID == lag, Municipality_code[1], Municipality_code[2]))
And the desired output...
Person_ID Month Year Job_ID Municipality_code lag Municipality_corrected
1 1 1 2017 Job1 1 <NA> NA
2 1 2 2017 Job1 1 Job1 1
3 1 3 2017 Job1 1 Job1 1
4 1 4 2017 Job1 1 Job1 1
5 1 5 2017 Job2 1 Job1 1
6 1 6 2017 Job2 1 Job2 20
7 1 7 2017 Job2 1 Job2 20
8 1 8 2017 Job2 1 Job2 20
9 1 9 2017 Job2 1 Job2 20
10 1 10 2017 Job2 1 Job2 20
11 1 11 2017 Job2 1 Job2 20
12 1 12 2017 Job2 1 Job2 20
13 1 1 2018 Job2 20 Job2 20
14 1 2 2018 Job2 20 Job2 20
15 1 3 2018 Job2 20 Job2 20
16 1 4 2018 Job2 20 Job2 20
17 1 5 2018 Job2 20 Job2 20
18 1 6 2018 Job2 20 Job2 20
19 1 7 2018 Job2 20 Job2 20
20 1 8 2018 Job2 20 Job2 20
21 1 9 2018 Job2 20 Job2 20
22 1 10 2018 Job2 20 Job2 20
23 1 11 2018 Job2 20 Job2 20
24 1 12 2018 Job2 20 Job2 20
Answer 0 (score: 1)
The following gives you the corrected Municipality_code:
library(dplyr)

# Within each person and job, use the last recorded municipality code
df %>%
group_by(Person_ID, Job_ID) %>%
mutate(Municipality_corrected = last(Municipality_code))
# A tibble: 24 x 6
# Groups: Person_ID, Job_ID [2]
# Person_ID Month Year Job_ID Municipality_code Municipality_corrected
# <int> <int> <int> <chr> <int> <int>
# 1 1 1 2017 Job1 1 1
# 2 1 2 2017 Job1 1 1
# 3 1 3 2017 Job1 1 1
# 4 1 4 2017 Job1 1 1
# 5 1 5 2017 Job2 1 20
# 6 1 6 2017 Job2 1 20
# 7 1 7 2017 Job2 1 20
# 8 1 8 2017 Job2 1 20
# 9 1 9 2017 Job2 1 20
# 10 1 10 2017 Job2 1 20
# ... with 14 more rows
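The idea I used is that the municipality code is the same for every row of a given job, so I group by Job_ID (within each Person_ID). I then take the last Municipality_code within each Job_ID as the corrected value.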
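The approach above can be sketched as a self-contained script. This is a minimal sketch, not a definitive implementation: the data frame is rebuilt by hand from the example in the question, the name `df_corrected` is hypothetical, and the `arrange()` step is an added assumption so that `last()` picks the most recent month even if the rows are not already sorted.

```r
library(dplyr)

# Rebuild the example data from the question: one person, 24 months.
df <- data.frame(
  Person_ID = 1L,
  Month = rep(1:12, times = 2),
  Year = rep(c(2017L, 2018L), each = 12),
  Job_ID = c(rep("Job1", 4), rep("Job2", 20)),
  Municipality_code = rep(c(1L, 20L), each = 12),
  stringsAsFactors = FALSE
)

df_corrected <- df %>%
  # Assumption: sort chronologically so "last" means "most recent record".
  arrange(Person_ID, Year, Month) %>%
  group_by(Person_ID, Job_ID) %>%
  # Within each person and job, reuse the most recent municipality code.
  mutate(Municipality_corrected = last(Municipality_code)) %>%
  ungroup()

# Show the rows around the job switch in 5/2017.
print(as.data.frame(df_corrected[3:6, ]))
```

Note that this relies on each Job_ID mapping to a single municipality per person, and on the most recent record being the correct one. If a person leaves a job and later returns to the same Job_ID, grouping by Job_ID alone would merge both spells, so a separate spell identifier would probably be needed in that case.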