How to add rows carrying the most recent data forward when the "Year" variable is not continuous

Time: 2019-02-17 16:16:03

Tags: r dataframe

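I have a small data frame containing the world record (WR) times for the women's 10000m. In some years no new record was set, but of course the previous WR carries over into the following years until a new record is set. I want the data frame to run through 2015.

I need to add rows wherever the "Year" variable has a gap, and fill each gap with the data from the previous WR.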

#Current section of dataframe (gap from '86-'93 then '93-2015):

  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  29.32 1993 10000m  women

#Required result:

  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  30.14 1987 10000m  women
8  30.14 1988 10000m  women
9  30.14 1989 10000m  women
10 30.14 1990 10000m  women
11 30.14 1991 10000m  women
12 30.14 1992 10000m  women
13 29.32 1993 10000m  women
14 29.32 1994 10000m  women
...etc

(continue last result until 2015)

2 answers:

Answer 0 (score: 1)

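Assuming the input shown in the Note at the end, merge the input data frame with a data frame containing every year, then use na.locf from the zoo package to carry the last observation forward into the gaps.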

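library(zoo)
# One row per year, from the first year in the data through 2015
Year <- data.frame(Year = min(DF$Year):2015)
# all.y = TRUE keeps every year; na.locf then fills the resulting NAs downward
m <- na.locf(merge(DF, Year, all.y = TRUE), na.rm = FALSE)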

This gives the following (the answer continues after the output):

> m
   Year Result  Event Gender
1  1982  31.35 10000m  women
2  1983  31.35 10000m  women
3  1983  31.28 10000m  women
4  1984  31.14 10000m  women
5  1985  30.59 10000m  women
6  1986  30.14 10000m  women
7  1987  30.14 10000m  women
8  1988  30.14 10000m  women
9  1989  30.14 10000m  women
10 1990  30.14 10000m  women
11 1991  30.14 10000m  women
12 1992  30.14 10000m  women
13 1993  29.32 10000m  women
14 1994  29.32 10000m  women
15 1995  29.32 10000m  women
16 1996  29.32 10000m  women
17 1997  29.32 10000m  women
18 1998  29.32 10000m  women
19 1999  29.32 10000m  women
20 2000  29.32 10000m  women
21 2001  29.32 10000m  women
22 2002  29.32 10000m  women
23 2003  29.32 10000m  women
24 2004  29.32 10000m  women
25 2005  29.32 10000m  women
26 2006  29.32 10000m  women
27 2007  29.32 10000m  women
28 2008  29.32 10000m  women
29 2009  29.32 10000m  women
30 2010  29.32 10000m  women
31 2011  29.32 10000m  women
32 2012  29.32 10000m  women
33 2013  29.32 10000m  women
34 2014  29.32 10000m  women
35 2015  29.32 10000m  women

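Alternatively, if the full file contains several events and genders, split it by event and gender, apply the same processing to each piece, and rbind the pieces back together. The question does not say which is the case, so we assume every event/gender combination should start at the lowest year found across all events and genders and end in 2015; that assumption is easy to change.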

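# Same merge-and-fill step, applied to one event/gender subset at a time
f <- function(x) na.locf(merge(x, Year, all.y = TRUE), na.rm = FALSE)
# Columns 3:4 are Event and Gender; by() splits on them, rbind recombines
out <- do.call("rbind", by(DF, DF[3:4], f))
rownames(out) <- NULL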
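As one way the assumption above could be changed, here is a minimal sketch in which each event/gender group starts at its own first year instead of the overall minimum. The helper `fill_group`, its `end_year` argument, and the `DF2` input are hypothetical names introduced for illustration; the "men" rows are made-up values, not real records.

```r
library(zoo)

# Made-up two-group input for illustration only; the "men" rows are not real records.
DF2 <- data.frame(
  Result = c(31.35, 30.14, 29.32, 10, 9),
  Year   = c(1982L, 1986L, 1993L, 1984L, 1989L),
  Event  = "10000m",
  Gender = c("women", "women", "women", "men", "men")
)

# Hypothetical helper: build the year range from the group's own first year,
# merge it in, and carry the last observation forward.
fill_group <- function(x, end_year = 2015) {
  years <- data.frame(Year = min(x$Year):end_year)
  na.locf(merge(x, years, all.y = TRUE), na.rm = FALSE)
}

out2 <- do.call("rbind", by(DF2, DF2[c("Event", "Gender")], fill_group))
rownames(out2) <- NULL
```

With this variant the "men" group begins in 1984 and the "women" group in 1982, and both run to 2015.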

Note

Lines <- "
  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  29.32 1993 10000m  women"
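# The header row has one fewer field than the data rows, so the first column becomes row names
DF <- read.table(text = Lines, header = TRUE)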

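Answer 1 (score: 1)

You can first complete the data set by specifying the years you want present, then fill each missing year with the previous value. Group by event and gender so that the values are filled correctly for each combination.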

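library(tidyr)
library(dplyr)

wr %>%
  group_by(Event, Gender) %>%
  # Add a row for every year from the group's first year through 2015
  complete(Year = min(Year):2015) %>%
  # Carry the last known Result down into the newly added rows
  fill(Result, .direction = "down")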

# A tibble: 35 x 4
# Groups:   Event, Gender [1]
#    Event  Gender  Year Result
#    <fct>  <fct>  <int>  <dbl>
#  1 10000m women   1982   31.4
#  2 10000m women   1983   31.4
#  3 10000m women   1983   31.3
#  4 10000m women   1984   31.1
#  5 10000m women   1985   30.6
#  6 10000m women   1986   30.1
#  7 10000m women   1987   30.1
#  8 10000m women   1988   30.1
#  9 10000m women   1989   30.1
# 10 10000m women   1990   30.1
# ... with 25 more rows
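The answer does not show how `wr` was created. The following is a self-contained sketch that assumes `wr` is simply the data frame from the question, read in with read.table; the name `filled` and the final `ungroup()` are additions for illustration and are not part of the answer.

```r
library(tidyr)
library(dplyr)

# Assumption: wr holds the question's data frame
Lines <- "
  Result Year  Event Gender
1  31.35 1982 10000m  women
2  31.35 1983 10000m  women
3  31.28 1983 10000m  women
4  31.14 1984 10000m  women
5  30.59 1985 10000m  women
6  30.14 1986 10000m  women
7  29.32 1993 10000m  women"
wr <- read.table(text = Lines, header = TRUE)

filled <- wr %>%
  group_by(Event, Gender) %>%
  complete(Year = min(Year):2015) %>%   # one row per missing year, per group
  fill(Result, .direction = "down") %>% # carry the previous record forward
  ungroup()                             # drop grouping before further use
```

Dropping the grouping at the end is a matter of taste; it avoids later operations on `filled` silently running per group.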