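While trying to get grouped lagged variables (which is not possible with lag() alone), the solution suggested to me was to pull out the distinct rows, lag those, and join the result back onto the data.
I would prefer to do this without creating an intermediate object, and I would like to do it in the middle of a chain. However, it does not work the way I expected, and the problem seems to be some interaction between the use of `.` and the nested chain inside left_join().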
require(tidyverse)
#> Loading required package: tidyverse
df <- data.frame(Team = c("A", "A", "A", "A", "B", "B", "B", "C", "C", "D", "D"),
                 Date = c("2016-05-10", "2016-05-10", "2016-05-10", "2016-05-10",
                          "2016-05-12", "2016-05-12", "2016-05-12",
                          "2016-05-15", "2016-05-15",
                          "2016-05-30", "2016-05-30"),
                 Points = c(1, 4, 3, 2, 1, 5, 6, 1, 2, 3, 9)
)
# This works:
df %>% left_join(x = ., y = df %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
#> Joining, by = c("Team", "Date")
#> Team Date Points Date_Lagged
#> 1 A 2016-05-10 1 <NA>
#> 2 A 2016-05-10 4 <NA>
#> 3 A 2016-05-10 3 <NA>
#> 4 A 2016-05-10 2 <NA>
#> 5 B 2016-05-12 1 2016-05-10
#> 6 B 2016-05-12 5 2016-05-10
#> 7 B 2016-05-12 6 2016-05-10
#> 8 C 2016-05-15 1 2016-05-12
#> 9 C 2016-05-15 2 2016-05-12
#> 10 D 2016-05-30 3 2016-05-15
#> 11 D 2016-05-30 9 2016-05-15
#And this works:
df %>% left_join(x = ., y = .)
#> Joining, by = c("Team", "Date", "Points")
#> Team Date Points
#> 1 A 2016-05-10 1
#> 2 A 2016-05-10 4
#> 3 A 2016-05-10 3
#> 4 A 2016-05-10 2
#> 5 B 2016-05-12 1
#> 6 B 2016-05-12 5
#> 7 B 2016-05-12 6
#> 8 C 2016-05-15 1
#> 9 C 2016-05-15 2
#> 10 D 2016-05-30 3
#> 11 D 2016-05-30 9
# This doesn't work, despite the fact that `.` is df:
df %>% left_join(x = ., y = . %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
#> Error in UseMethod("tbl_vars"): no applicable method for 'tbl_vars' applied to an object of class "c('fseq', 'function')"
# Desired output:
distinct(df, Team, Date) %>%
  mutate(Date_Lagged = lag(Date)) %>%
  right_join(., df) %>%
  select(Team, Date, Points, Date_Lagged)
#> Joining, by = c("Team", "Date")
#> Team Date Points Date_Lagged
#> 1 A 2016-05-10 1 <NA>
#> 2 A 2016-05-10 4 <NA>
#> 3 A 2016-05-10 3 <NA>
#> 4 A 2016-05-10 2 <NA>
#> 5 B 2016-05-12 1 2016-05-10
#> 6 B 2016-05-12 5 2016-05-10
#> 7 B 2016-05-12 6 2016-05-10
#> 8 C 2016-05-15 1 2016-05-12
#> 9 C 2016-05-15 2 2016-05-12
#> 10 D 2016-05-30 3 2016-05-15
#> 11 D 2016-05-30 9 2016-05-15
Created on 2018-06-12 by the reprex package (v0.2.0).
Answer 0 (score: 9)
To make your code work, wrap the `.` that starts the y argument in braces, as shown below:
df %>% left_join(x = ., y = {.} %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
Joining, by = c("Team", "Date")
Team Date Points Date_Lagged
1 A 2016-05-10 1 <NA>
2 A 2016-05-10 4 <NA>
3 A 2016-05-10 3 <NA>
4 A 2016-05-10 2 <NA>
5 B 2016-05-12 1 2016-05-10
6 B 2016-05-12 5 2016-05-10
7 B 2016-05-12 6 2016-05-10
8 C 2016-05-15 1 2016-05-12
9 C 2016-05-15 2 2016-05-12
10 D 2016-05-30 3 2016-05-15
11 D 2016-05-30 9 2016-05-15
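The reason the braces matter can be seen on a toy vector. In magrittr, a pipeline whose left-hand side is a bare `.` is not evaluated; it defines a function (a functional sequence of class "fseq", the class named in the error message from the question). The sketch below is only an illustration: the names total_root, bare, and braced are made up for this example, and list() stands in for left_join().

```r
library(magrittr)

# A chain that starts with a bare `.` defines a function instead of
# producing a value.
total_root <- . %>% sum() %>% sqrt()
is.function(total_root)   # TRUE
total_root(c(8, 8))       # sqrt(16), i.e. 4

# Inside an outer pipe the same rule applies: y becomes a function.
bare <- c(8, 8) %>% list(x = ., y = . %>% sum())
is.function(bare$y)       # TRUE

# With braces the left-hand side is an ordinary expression, so the
# nested chain is evaluated on the piped-in value.
braced <- c(8, 8) %>% list(x = ., y = {.} %>% sum())
braced$y                  # 16
```

Parentheses around the dot, `(.)`, should behave the same way as `{.}` here, since either form stops the left-hand side from being the bare placeholder symbol.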
Alternatively, you can refer to df directly:
df %>% left_join(df %>%
                   distinct(Team, Date) %>%
                   mutate(Date_Lagged = lag(Date)))
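Answer 1 (score: 3)
Although this is not an answer to my question (Onyambo provided that!), I wanted to share another way I found to accomplish the same thing. Basically, you use group_by() and nest() to collapse the tibble and set the repeated variables aside, compute the lag, and then unnest().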
df %>%
  group_by(Team, Date) %>%
  nest() %>%
  mutate(Date_Lagged = lag(Date)) %>%
  unnest()
#> # A tibble: 11 x 4
#> Team Date Date_Lagged Points
#> <fct> <fct> <fct> <dbl>
#> 1 A 2016-05-10 <NA> 1
#> 2 A 2016-05-10 <NA> 4
#> 3 A 2016-05-10 <NA> 3
#> 4 A 2016-05-10 <NA> 2
#> 5 B 2016-05-12 2016-05-10 1
#> 6 B 2016-05-12 2016-05-10 5
#> 7 B 2016-05-12 2016-05-10 6
#> 8 C 2016-05-15 2016-05-12 1
#> 9 C 2016-05-15 2016-05-12 2
#> 10 D 2016-05-30 2016-05-15 3
#> 11 D 2016-05-30 2016-05-15 9
Created on 2018-06-14 by the reprex package (v0.2.0).
Answer 2 (score: 2)
If you don't mind swapping pipe nesting for function nesting, you can achieve your goal:
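A note for newer package versions: as far as I know, tidyr 1.0.0 and later keep the grouping after nest() is called on a grouped data frame, so lag() in the snippet above would run inside each one-row group and return only NA unless ungroup() is added before mutate(). A minimal sketch that sidesteps grouping altogether, assuming tidyr 1.0.0 or later and using a cut-down version of the example data, is:

```r
library(dplyr)
library(tidyr)

# Cut-down toy version of the data from the question.
df <- data.frame(
  Team   = c("A", "A", "B", "C"),
  Date   = c("2016-05-10", "2016-05-10", "2016-05-12", "2016-05-15"),
  Points = c(1, 4, 5, 2)
)

result <- df %>%
  nest(data = Points) %>%              # one row per Team/Date pair
  mutate(Date_Lagged = lag(Date)) %>%  # lag across the collapsed rows
  unnest(cols = data)                  # restore one row per observation

result
```

Naming the columns to nest (data = Points) leaves the data frame ungrouped, so lag() looks at the previous Team/Date row as intended.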
df %>% left_join(mutate(distinct(., Team, Date), Date_Lagged = lag(Date)))
Output:
Joining, by = c("Team", "Date")
Team Date Points Date_Lagged
1 A 2016-05-10 1 <NA>
2 A 2016-05-10 4 <NA>
3 A 2016-05-10 3 <NA>
4 A 2016-05-10 2 <NA>
5 B 2016-05-12 1 2016-05-10
6 B 2016-05-12 5 2016-05-10
7 B 2016-05-12 6 2016-05-10
8 C 2016-05-15 1 2016-05-12
9 C 2016-05-15 2 2016-05-12
10 D 2016-05-30 3 2016-05-15
11 D 2016-05-30 9 2016-05-15
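If the step is needed in more than one chain, the same function-nesting idea can be wrapped in a small helper so the pipeline itself stays flat. This is only a sketch: add_lagged_date() is a hypothetical name introduced here, the data are a cut-down version of the question's example, and the join keys are passed explicitly through by, which also avoids the "Joining, by" message.

```r
library(dplyr)

# Hypothetical helper: attach the previous distinct Date as Date_Lagged.
add_lagged_date <- function(data) {
  lagged <- data %>%
    distinct(Team, Date) %>%
    mutate(Date_Lagged = lag(Date))
  left_join(data, lagged, by = c("Team", "Date"))
}

# Cut-down toy version of the data from the question.
df <- data.frame(
  Team   = c("A", "A", "B", "C"),
  Date   = c("2016-05-10", "2016-05-10", "2016-05-12", "2016-05-15"),
  Points = c(1, 4, 5, 2)
)

# The helper can sit in the middle of a longer chain.
result <- df %>%
  filter(Points > 0) %>%
  add_lagged_date()

result
```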