Lag to the previous day

Time: 2019-08-20 21:28:47

Tags: r dplyr

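I'm trying to lag data to the previous day in R. However, the dataset has several observations per day. How can I do this?

I've looked at lagging the variable with dplyr and using if statements, but that would take about 8 nested if statements to make sure every observation is lagged to the previous day.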

df <- df %>%
  dplyr::group_by(HomeTeam) %>%
  dplyr::arrange(Date) %>%
  # (pseudocode) one more nested ifelse() for each extra row sharing a date
  dplyr::mutate(Score = ifelse(lag(Date) != Date, lag(Score),
                        ifelse(lag(Date, n = 2) != lag(Date),
                               lag(Score, n = 2), ifelse...)))

df <- data.frame(HomeTeam = c("Wolves", "Wolves", "Wolves"),
                 Date = c("2019-08-20", "2019-08-20", "2019-08-19"),
                 Score = c(3, 1, 4))

Input data

HomeTeam       Date           Score
Wolves         2019-08-20     3
Wolves         2019-08-20     1
Wolves         2019-08-19     4

Desired output

HomeTeam       Date           Score
Wolves         2019-08-20     4
Wolves         2019-08-20     4
Wolves         2019-08-19     NA

Solution

library(dplyr)  # provides %>% and the vector lag() used below

df <- data.frame(
  HomeTeam = c("Wolves", "Wolves", "Wolves", "Wolves", "Wolves",
               "Man Utd", "Man Utd", "Man Utd"),
  Date = c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19", "2019-08-15",
           "2019-06-01", "2019-06-01", "2019-04-01"),
  Score = c(3, 1, 2, 2, 4, 5, 6, 7)
)

df %>% dplyr::mutate(Date = as.Date(Date)) %>%
  dplyr::arrange(Date) %>%
  dplyr::group_by(HomeTeam) %>%
  # within each team, every row gets the previous row's Score
  dplyr::mutate(lagScore = lag(Score)) %>%
  dplyr::arrange(Date) %>%
  dplyr::group_by(Date, HomeTeam) %>%
  # all rows on a date take the first row's value, i.e. the last Score
  # from the team's previous date
  dplyr::mutate(lagScore = lagScore[1]) %>%
  dplyr::ungroup()

# A tibble: 8 x 4
#  HomeTeam Date       Score lagScore
#  <fct>    <date>     <dbl>    <dbl>
# Man Utd  2019-04-01     7       NA
# Man Utd  2019-06-01     5        7
# Man Utd  2019-06-01     6        7
# Wolves   2019-08-15     4       NA
# Wolves   2019-08-19     2        4
# Wolves   2019-08-19     2        4
# Wolves   2019-08-20     3        2
# Wolves   2019-08-20     1        2               
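
For comparison, here is a base-R sketch of the same two-step trick (lag within each team, then copy each date's first lagged value to every row on that date), in case you'd rather not depend on dplyr. It is an illustration, not the asker's code; it assumes `Date` has been converted with `as.Date()` and uses `ave()` for both grouped steps.

```r
# Base-R sketch of the two-step trick above (no dplyr needed)
df <- data.frame(
  HomeTeam = c("Wolves", "Wolves", "Wolves", "Wolves", "Wolves",
               "Man Utd", "Man Utd", "Man Utd"),
  Date = as.Date(c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19",
                   "2019-08-15", "2019-06-01", "2019-06-01", "2019-04-01")),
  Score = c(3, 1, 2, 2, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# Sort by team, then date (order() keeps ties in their original order)
df <- df[order(df$HomeTeam, df$Date), ]

# Step 1: ordinary one-row lag of Score within each team
df$lagScore <- ave(df$Score, df$HomeTeam,
                   FUN = function(s) c(NA, head(s, -1)))

# Step 2: every row of a team/date takes the value of that date's first row,
# i.e. the last Score from the team's previous date
df$lagScore <- ave(df$lagScore, df$HomeTeam, df$Date,
                   FUN = function(s) rep(s[1], length(s)))

df
```

The `lagScore` column should come out as NA, 7, 7, NA, 4, 4, 2, 2, the same as the tibble printed above.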

1 Answer:

Answer 0 (score: 2)

Assuming you're lagging to the previous date that occurs in the data (which may not be "yesterday"), try this:

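library(dplyr)
set.seed(2)
data.frame(
  id = 1:10,
  Date = Sys.Date() + sort(sample(c(0, 1, 3), size=10, replace=TRUE))
) %>%
  # previous row's Date (rows are already sorted by Date)
  mutate(lagDate = lag(Date)) %>%
  group_by(Date) %>%
  # every row on a date takes its first row's value: the previous distinct date
  mutate(lagDate = lagDate[1]) %>%
  ungroup()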
# # A tibble: 10 x 3
#       id Date       lagDate   
#    <int> <date>     <date>    
#  1     1 2019-08-20 NA        
#  2     2 2019-08-20 NA        
#  3     3 2019-08-20 NA        
#  4     4 2019-08-21 2019-08-20
#  5     5 2019-08-21 2019-08-20
#  6     6 2019-08-21 2019-08-20
#  7     7 2019-08-23 2019-08-21
#  8     8 2019-08-23 2019-08-21
#  9     9 2019-08-23 2019-08-21
# 10    10 2019-08-23 2019-08-21
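
If you do need the literal calendar day (`Date - 1`), so that a gap in the data gives `NA` instead of the most recent earlier date, one option is a keyed lookup. This is a base-R sketch under my own assumptions, not part of the answer above: `lag_calendar_day()` is a hypothetical helper, and when a team has several scores on the previous day it takes the first one recorded.

```r
# Hypothetical helper: per team, the Score recorded on the calendar day
# before each row's Date (Date - 1); rows whose previous day has no data get NA
lag_calendar_day <- function(team, date, score) {
  date <- as.Date(date)
  key <- paste(team, date)
  # first Score recorded for each team/day, named by its "team date" key
  first <- !duplicated(key)
  lookup <- setNames(score[first], key[first])
  # look up "same team, one day earlier"; keys not in the lookup give NA
  unname(lookup[paste(team, date - 1)])
}

df <- data.frame(
  HomeTeam = c("Wolves", "Wolves", "Wolves", "Wolves", "Wolves",
               "Man Utd", "Man Utd", "Man Utd"),
  Date = as.Date(c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19",
                   "2019-08-15", "2019-06-01", "2019-06-01", "2019-04-01")),
  Score = c(3, 1, 2, 2, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)
df$yesterdayScore <- lag_calendar_day(df$HomeTeam, df$Date, df$Score)
df
```

With this data only the 2019-08-20 Wolves rows get a value (2); the 2019-08-19 rows come out `NA` because there is no 2019-08-18 entry, whereas the previous-date approach would give them 4.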

Taking it a step further and replacing Score with the previous date's value, try this:

df <- read.table(header=TRUE, stringsAsFactors=FALSE, text="
HomeTeam       Date           Score
Wolves         2019-08-21     96
Wolves         2019-08-21     97
Wolves         2019-08-20     3
Wolves         2019-08-20     1
Wolves         2019-08-19     4")
df$Date <- as.Date(df$Date)

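df %>%
  # rows are sorted newest first, so lag(Date) on each day's first row
  # gives the next newer date
  mutate(lagDate = lag(Date)) %>%
  group_by(HomeTeam, Date) %>%
  # one row per team/day: that newer date plus this day's first Score
  summarize(lagDate = lagDate[1], Score = Score[1]) %>%
  ungroup() %>%
  # relabel so this day's Score is keyed by the newer date
  select(HomeTeam, Date = lagDate, Score) %>%
  # attach it to every original row on that newer date
  right_join(select(df, -Score), by = c("HomeTeam", "Date"))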
# # A tibble: 5 x 3
#   HomeTeam Date       Score
#   <chr>    <date>     <int>
# 1 Wolves   2019-08-21     3
# 2 Wolves   2019-08-21     3
# 3 Wolves   2019-08-20     4
# 4 Wolves   2019-08-20     4
# 5 Wolves   2019-08-19    NA
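
Note that `lag(Date)` above runs before any grouping by `HomeTeam`, so the example relies on the rows being sorted newest first within a single team; with several teams in one data frame, a row's lag can come from a different team. A base-R sketch of the same collapse, lag, and join-back idea with the lag done per team might look like this (the `prevScore` column name is hypothetical):

```r
df <- data.frame(
  HomeTeam = c("Wolves", "Wolves", "Wolves", "Wolves", "Wolves",
               "Man Utd", "Man Utd", "Man Utd"),
  Date = as.Date(c("2019-08-20", "2019-08-20", "2019-08-19", "2019-08-19",
                   "2019-08-15", "2019-06-01", "2019-06-01", "2019-04-01")),
  Score = c(3, 1, 2, 2, 4, 5, 6, 7),
  stringsAsFactors = FALSE
)

# 1. Collapse to one row per team/day, keeping that day's first Score
day <- df[!duplicated(df[c("HomeTeam", "Date")]), ]
day <- day[order(day$HomeTeam, day$Date), ]

# 2. Lag within each team only, so a team's oldest day never picks up
#    another team's score
day$prevScore <- ave(day$Score, day$HomeTeam,
                     FUN = function(s) c(NA, head(s, -1)))

# 3. Join back to every original row via a "team date" key
#    (match() keeps df's original row order)
key <- function(d) paste(d$HomeTeam, d$Date)
df$prevScore <- day$prevScore[match(key(df), key(day))]
df
```

Here `prevScore` should be 2, 2, 4, 4, NA, 7, 7, NA in the original row order.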