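I have a small data frame containing the women's 10000m world record (WR) times. In some years no new record was set, but of course the previous WR carries over into the following years until a new record is set. I would like the data frame to end in 2015.
I need to add rows wherever there are gaps in the "Year" variable and fill those gaps with the data from the previous WR.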
# Current section of the data frame (gaps from '86 to '93, then from '93 to 2015):
Result Year Event Gender
1 31.35 1982 10000m women
2 31.35 1983 10000m women
3 31.28 1983 10000m women
4 31.14 1984 10000m women
5 30.59 1985 10000m women
6 30.14 1986 10000m women
7 29.32 1993 10000m women
#Required result:
Result Year Event Gender
1 31.35 1982 10000m women
2 31.35 1983 10000m women
3 31.28 1983 10000m women
4 31.14 1984 10000m women
5 30.59 1985 10000m women
6 30.14 1986 10000m women
7 30.14 1987 10000m women
8 30.14 1988 10000m women
9 30.14 1989 10000m women
10 30.14 1990 10000m women
11 30.14 1991 10000m women
12 30.14 1992 10000m women
13 29.32 1993 10000m women
14 29.32 1994 10000m women
...etc
(continue last result until 2015)
Answer 0 (score: 1)
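Assuming the input shown in the Note at the end, merge the input data frame with a data frame containing all the years, and use na.locf from zoo to fill in the gaps.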
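library(zoo)
# One row per year from the first record to 2015
Year <- data.frame(Year = min(DF$Year):2015)
# Outer-merge on Year, then carry each column's last non-NA value forward
m <- na.locf(merge(DF, Year, all.y = TRUE), na.rm = FALSE)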
giving:
> m
Year Result Event Gender
1 1982 31.35 10000m women
2 1983 31.35 10000m women
3 1983 31.28 10000m women
4 1984 31.14 10000m women
5 1985 30.59 10000m women
6 1986 30.14 10000m women
7 1987 30.14 10000m women
8 1988 30.14 10000m women
9 1989 30.14 10000m women
10 1990 30.14 10000m women
11 1991 30.14 10000m women
12 1992 30.14 10000m women
13 1993 29.32 10000m women
14 1994 29.32 10000m women
15 1995 29.32 10000m women
16 1996 29.32 10000m women
17 1997 29.32 10000m women
18 1998 29.32 10000m women
19 1999 29.32 10000m women
20 2000 29.32 10000m women
21 2001 29.32 10000m women
22 2002 29.32 10000m women
23 2003 29.32 10000m women
24 2004 29.32 10000m women
25 2005 29.32 10000m women
26 2006 29.32 10000m women
27 2007 29.32 10000m women
28 2008 29.32 10000m women
29 2009 29.32 10000m women
30 2010 29.32 10000m women
31 2011 29.32 10000m women
32 2012 29.32 10000m women
33 2013 29.32 10000m women
34 2014 29.32 10000m women
35 2015 29.32 10000m women
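Or, if the full file has several events and genders, split it by event and gender, apply the same processing to each component of the split, and then rbind the pieces back together. We can't tell from the question whether that is the case, so we assume that every event/gender should start at the earliest year across all events and genders and end in 2015 (a group whose first record is later than that year will get leading rows of NA), but this assumption is easy to change.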
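# Apply the same merge-and-fill to each Event/Gender group, then stack the results
f <- function(x) na.locf(merge(x, Year, all.y = TRUE), na.rm = FALSE)
out <- do.call("rbind", by(DF, DF[3:4], f))  # DF[3:4] are the Event and Gender columns
rownames(out) <- NULL

Note

Lines <- "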
Result Year Event Gender
1 31.35 1982 10000m women
2 31.35 1983 10000m women
3 31.28 1983 10000m women
4 31.14 1984 10000m women
5 30.59 1985 10000m women
6 30.14 1986 10000m women
7 29.32 1993 10000m women"
DF <- read.table(text = Lines)
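If zoo is not available, the same merge-then-carry-forward idea can be sketched in base R. This is a minimal sketch, not part of the original answer: `locf()` is a hypothetical helper that mimics `na.locf(..., na.rm = FALSE)` for a single vector (leading NAs stay NA), and `all_years` simply plays the role of the `Year` data frame above.

```r
# Base-R sketch of the merge-then-carry-forward approach (no zoo).
# locf() is a hypothetical helper, not a base R function.
locf <- function(x) {
  # For each position, the index of the most recent non-NA value (0 if none yet)
  idx <- cummax(seq_along(x) * !is.na(x))
  idx[idx == 0L] <- NA  # leading NAs stay NA, like na.locf(na.rm = FALSE)
  x[idx]
}

Lines <- "
Result Year Event Gender
1 31.35 1982 10000m women
2 31.35 1983 10000m women
3 31.28 1983 10000m women
4 31.14 1984 10000m women
5 30.59 1985 10000m women
6 30.14 1986 10000m women
7 29.32 1993 10000m women"
DF <- read.table(text = Lines)

# One row per year from the first record to 2015
all_years <- data.frame(Year = min(DF$Year):2015)

# Outer merge on the shared Year column; missing years get NA in the other columns
m <- merge(DF, all_years, all.y = TRUE)

# Carry each column's last non-NA value forward
m[] <- lapply(m, locf)
m
```

Because the helper is applied column by column with `lapply`, `Event` and `Gender` are filled in exactly the same way as `Result`.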
Answer 1 (score: 1)
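You can first complete the data set by specifying the years you want to fill in, and then fill each missing year with the previous value. Group by event and gender so that the values are filled in correctly for each combination.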
library(tidyr)
library(dplyr)
wr %>%
group_by(Event, Gender) %>%
complete(Year = min(Year):2015) %>%
fill(Result, .direction = "down")
# A tibble: 35 x 4
# Groups: Event, Gender [1]
# Event Gender Year Result
# <fct> <fct> <int> <dbl>
# 1 10000m women 1982 31.4
# 2 10000m women 1983 31.4
# 3 10000m women 1983 31.3
# 4 10000m women 1984 31.1
# 5 10000m women 1985 30.6
# 6 10000m women 1986 30.1
# 7 10000m women 1987 30.1
# 8 10000m women 1988 30.1
# 9 10000m women 1989 30.1
# 10 10000m women 1990 30.1
# ... with 25 more rows
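The pipeline above assumes `wr` already exists and needs tidyr and dplyr. A base-R sketch of the same group-wise idea follows; it is an illustration, not the answerer's code. `locf()` and `complete_group()` are hypothetical helpers written for this example, and here each Event/Gender group runs from its own earliest year to 2015 (a variation on the assumption Answer 0 says is easy to change).

```r
# Base-R sketch of the grouped complete-then-fill idea (no tidyr/dplyr).
# locf() and complete_group() are hypothetical helpers written for this example.
locf <- function(x) {
  idx <- cummax(seq_along(x) * !is.na(x))  # index of the last non-NA value so far
  idx[idx == 0L] <- NA                      # nothing to carry forward yet
  x[idx]
}

# Expand one Event/Gender group to every year from its own first record
# to end_year, then carry the last record forward.
complete_group <- function(g, end_year = 2015) {
  years <- data.frame(Year = min(g$Year):end_year)
  out <- merge(g, years, all.y = TRUE)
  out[] <- lapply(out, locf)
  out
}

wr <- read.table(text = "
Result Year Event Gender
1 31.35 1982 10000m women
2 31.35 1983 10000m women
3 31.28 1983 10000m women
4 31.14 1984 10000m women
5 30.59 1985 10000m women
6 30.14 1986 10000m women
7 29.32 1993 10000m women")

# Split by Event/Gender (dropping empty combinations), complete each piece, restack
pieces <- split(wr, wr[c("Event", "Gender")], drop = TRUE)
out <- do.call(rbind, lapply(pieces, complete_group))
rownames(out) <- NULL
out
```

With more events or genders in the file, each combination becomes its own piece, so a record is never carried forward into a different event.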