Generating new observations by blending existing observations {R}

Date: 2016-02-16 19:38:16

Tags: r

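Problem

I want to systematically alter a variable (population) in my dataset so that the intermediate rows are "blended" between its current values.

Data

I currently have state, county, and population data in 5-year increments. The data is in a data frame.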

data:
     year       state       county         population
1    1990       Vermont     Chittenden     4050
2    1995       Vermont     Chittenden     4180
3    2000       Vermont     Chittenden     4205
4    2005       Vermont     Chittenden     4350
5    2010       Vermont     Chittenden     4358
6    2015       Vermont     Chittenden     4401

Using the technique from here: Change variable value based on row index {R}, I applied the following:

new.data <- data[rep(1:nrow(data),each=5),]
new.data$year <- new.data$year + sequence(rep(5,nrow(data))) -1

As a result, I get this:

new.data:
     year       state       county         population
1    1990       Vermont     Chittenden     4050
1.1  1991       Vermont     Chittenden     4050
1.2  1992       Vermont     Chittenden     4050
1.3  1993       Vermont     Chittenden     4050
1.4  1994       Vermont     Chittenden     4050
2    1995       Vermont     Chittenden     4180
2.1  1996       Vermont     Chittenden     4180
2.2  1997       Vermont     Chittenden     4180
2.3  1998       Vermont     Chittenden     4180
2.4  1999       Vermont     Chittenden     4180
3    2000       Vermont     Chittenden     4205
                        ...
5    2010       Vermont     Chittenden     4358
5.1  2011       Vermont     Chittenden     4358
5.2  2012       Vermont     Chittenden     4358
5.3  2013       Vermont     Chittenden     4358
5.4  2014       Vermont     Chittenden     4358
6    2015       Vermont     Chittenden     4401

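However, population does not really stay flat and then jump once every five years. I want to find a way to "blend" the intermediate values between the 5-year increments. It would look something like this: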

new.data:
     year       state       county         population
1    1990       Vermont     Chittenden     4050
1.1  1991       Vermont     Chittenden     4076
1.2  1992       Vermont     Chittenden     4102
1.3  1993       Vermont     Chittenden     4128
1.4  1994       Vermont     Chittenden     4154
2    1995       Vermont     Chittenden     4180
2.1  1996       Vermont     Chittenden     4185
2.2  1997       Vermont     Chittenden     4190
2.3  1998       Vermont     Chittenden     4195
2.4  1999       Vermont     Chittenden     4200
3    2000       Vermont     Chittenden     4205
                      ...
5    2010       Vermont     Chittenden     4358
5.1  2011       Vermont     Chittenden     4367
5.2  2012       Vermont     Chittenden     4376
5.3  2013       Vermont     Chittenden     4385
5.4  2014       Vermont     Chittenden     4394
6    2015       Vermont     Chittenden     4401

How can I do this?

I'm happy to post more information if needed. Thanks!

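1 Answer:

Answer 0: (score: 3)

This kind of blending of observations is called interpolation. There are many ways to do it; one of the simplest is linear interpolation, which can be done as follows: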

year <- seq(1990, 2015, by = 5)
population <- c(4050, 4180, 4205, 4350, 4358, 4401)
approx(x = year, y = population, xout = min(year):max(year))
# $x
#  [1] 1990 1991 1992 1993 ...
#
# $y
#  [1] 4050.0 4076.0 4102.0 4128.0 4154.0 4180.0 4185.0 ...
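
To get from the list returned by approx() back to a data frame like the one in the question, something along these lines might work. This is only a sketch: the helper name interpolate_county is made up for illustration, and it assumes state and county are constant within each group and that every group has at least two distinct years.

```r
# Example input mirroring the question's data frame (one county shown).
data <- data.frame(
  year       = seq(1990, 2015, by = 5),
  state      = "Vermont",
  county     = "Chittenden",
  population = c(4050, 4180, 4205, 4350, 4358, 4401),
  stringsAsFactors = FALSE
)

# Hypothetical helper: linearly interpolate one county's rows to yearly values.
interpolate_county <- function(d) {
  filled <- approx(x = d$year, y = d$population,
                   xout = min(d$year):max(d$year))
  data.frame(
    year       = filled$x,
    state      = d$state[1],
    county     = d$county[1],
    population = filled$y,
    stringsAsFactors = FALSE
  )
}

# Split by state and county, interpolate each piece, and stack the results.
pieces   <- split(data, list(data$state, data$county), drop = TRUE)
new.data <- do.call(rbind, lapply(pieces, interpolate_county))
rownames(new.data) <- NULL

head(new.data)
```

Because the split is done per state and county, values are never interpolated across two different counties. The interpolated populations are not rounded here; wrap them in round() if whole numbers are wanted.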

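Also consider checking ?splines; the resulting curve will look "prettier" because it is smoother than the one obtained with linear interpolation.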
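
As a rough illustration of the spline route, stats::spline() accepts the same x, y, and xout arguments as approx(). The sketch below uses its default method and puts the result next to the linear one; the name comparison is just an illustrative choice.

```r
year <- seq(1990, 2015, by = 5)
population <- c(4050, 4180, 4205, 4350, 4358, 4401)

# Cubic spline through the six known points, evaluated at every year.
smooth <- spline(x = year, y = population, xout = min(year):max(year))

# Linear interpolation at the same years, for comparison.
linear <- approx(x = year, y = population, xout = min(year):max(year))

comparison <- data.frame(year   = smooth$x,
                         linear = linear$y,
                         spline = round(smooth$y, 1))
head(comparison)
```

Both curves pass through the six known values; they differ only in the years between them. A spline can overshoot between points, so it is worth checking that the in-between populations look plausible.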