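I want to systematically change a variable (population) in my dataset so that the intermediate rows are "blended" between its current values.
I currently have state, county, and population data in 5-year increments. The data is in a data frame.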
data:
year state county population
1 1990 Vermont Chittenden 4050
2 1995 Vermont Chittenden 4180
3 2000 Vermont Chittenden 4205
4 2005 Vermont Chittenden 4350
5 2010 Vermont Chittenden 4358
6 2015 Vermont Chittenden 4401
Using the technique from here: Change variable value based on row index {R}, I applied the following:
new.data <- data[rep(1:nrow(data),each=5),]
new.data$year <- new.data$year + sequence(rep(5,nrow(data))) -1
As a result, I got this:
new.data:
year state county population
1 1990 Vermont Chittenden 4050
1.1 1991 Vermont Chittenden 4050
1.2 1992 Vermont Chittenden 4050
1.3 1993 Vermont Chittenden 4050
1.4 1994 Vermont Chittenden 4050
2 1995 Vermont Chittenden 4180
2.1 1996 Vermont Chittenden 4180
2.2 1997 Vermont Chittenden 4180
2.3 1998 Vermont Chittenden 4180
2.4 1999 Vermont Chittenden 4180
3 2000 Vermont Chittenden 4205
...
5 2010 Vermont Chittenden 4358
5.1 2011 Vermont Chittenden 4358
5.2 2012 Vermont Chittenden 4358
5.3 2013 Vermont Chittenden 4358
5.4 2014 Vermont Chittenden 4358
6 2015 Vermont Chittenden 4401
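Note, however, that population stays constant for five years at a time. I would like to find a way to "blend" the intermediate values between the 5-year increments. It would look something like this: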
new.data:
year state county population
1 1990 Vermont Chittenden 4050
1.1 1991 Vermont Chittenden 4076
1.2 1992 Vermont Chittenden 4102
1.3 1993 Vermont Chittenden 4128
1.4 1994 Vermont Chittenden 4154
2 1995 Vermont Chittenden 4180
2.1 1996 Vermont Chittenden 4185
2.2 1997 Vermont Chittenden 4190
2.3 1998 Vermont Chittenden 4195
2.4 1999 Vermont Chittenden 4200
3 2000 Vermont Chittenden 4205
...
5 2010 Vermont Chittenden 4358
5.1 2011 Vermont Chittenden 4367
5.2 2012 Vermont Chittenden 4376
5.3 2013 Vermont Chittenden 4385
5.4 2014 Vermont Chittenden 4394
6 2015 Vermont Chittenden 4401
How can I do this?
I'm happy to post more information if needed. Thanks!
Answer 0 (score: 3)
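This kind of blending between observations is called interpolation. There are many ways to do it; one of the simplest is linear interpolation, which can be done as follows: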
year <- seq(1990, 2015, by = 5)
population <- c(4050, 4180, 4205, 4350, 4358, 4401)
approx(x = year, y = population, xout = min(year):max(year))
# $x
# [1] 1990 1991 1992 1993 ...
#
# $y
# [1] 4050.0 4076.0 4102.0 4128.0 4154.0 4180.0 4185.0 ...
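To get from the approx() result back to a data frame like the one in the question, something along these lines might work. This is only a sketch: the helper name interpolate_group is made up for illustration, the data frame is rebuilt from the values in the post, and it assumes each state/county combination has its own series of years. The round() call is optional and is only there to keep whole-number populations.

```r
# Example data copied from the question.
data <- data.frame(
  year       = seq(1990, 2015, by = 5),
  state      = "Vermont",
  county     = "Chittenden",
  population = c(4050, 4180, 4205, 4350, 4358, 4401)
)

# Hypothetical helper: expand one state/county group to yearly rows
# and linearly interpolate population between the known years.
interpolate_group <- function(d) {
  yrs <- min(d$year):max(d$year)
  data.frame(
    year       = yrs,
    state      = d$state[1],
    county     = d$county[1],
    population = round(approx(x = d$year, y = d$population, xout = yrs)$y)
  )
}

# Split by state and county, interpolate each piece, and stack the results.
groups   <- split(data, list(data$state, data$county), drop = TRUE)
new.data <- do.call(rbind, lapply(groups, interpolate_group))
rownames(new.data) <- NULL

head(new.data)
```

With only one county the split is not strictly needed, but it should let the same code handle a data frame that contains many counties.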
Also consider checking ?splines; the resulting curve will look "prettier" because it is smoother than what linear interpolation gives you.
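As a rough illustration of the spline idea, the spline() function from the stats package accepts the same x, y, and xout arguments as approx(). The sketch below reuses the vectors from the linear example and relies on the default method ("fmm"); whether a spline is appropriate for population data is a judgment call, since a cubic spline can overshoot between the known points.

```r
year <- seq(1990, 2015, by = 5)
population <- c(4050, 4180, 4205, 4350, 4358, 4401)

# Cubic spline interpolation evaluated at every year in the range.
smooth <- spline(x = year, y = population, xout = min(year):max(year))

# The curve still passes through the original observations,
# but the values in between follow a smooth curve instead of straight lines.
yearly <- data.frame(year = smooth$x, population = smooth$y)
head(yearly)
```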