Calculating multiple slopes for different sets of data in one data frame

Time: 2018-06-18 11:53:00

Tags: r

Given the following data frame (this is just a sample table, so the numbers may not make much sense):

structure(list(ID = c(1, 2, 3, 4, 5, 6), `2005` = c(0L, 0L, 0L, 
2L, 1L, 0L), `2006` = c(0L, 0L, 0L, 1L, 1L, 0L), `2007` = c(1L, 
0L, 1L, 0L, 3L, 0L), `2008` = c(1L, 0L, 0L, 4L, 3L, 0L), `2009` = c(1L, 
0L, 0L, 2L, 3L, 0L), `2010` = c(0L, 0L, 0L, 5L, 0L, 0L), `2011` = c(0L, 
0L, 0L, 0L, 1L, 0L), `2012` = c(0L, 0L, 0L, 4L, 1L, 1L), `2013` = c(1L, 
0L, 1L, 0L, 0L, 0L), `2014` = c(0L, 0L, 2L, 0L, 9L, 0L), `2015` = c(0L, 
0L, 1L, 0L, 2L, 0L), `2016` = c(0L, 0L, 0L, 0L, 0L, 0L), `Cut Off Year` = c("2011", 
"2015", "2015", "2005", "2011", "2007")), .Names = c("ID", "2005", 
"2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", 
"2014", "2015", "2016", "Cut Off Year"), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

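I am interested in finding the slope of each row across the years (2005-2016).

However, I want two slopes per row: one for the values before the cut-off year (column 14) and another for the values after the cut-off year.

For example, in the first row the cut-off year is 2011. So I would like R to calculate the slope for 2005-2010 and write it to a new column ("Slope Before"), then calculate the slope for 2012-2016 and write it to a second new column ("Slope After") in that row.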
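To make the request concrete, the first row can be worked through on its own. This is only an illustrative sketch: the variable names (`years`, `values`, `cutoff`) are chosen here for the example, and fitting with `lm` is assumed to be what "slope" means, as in the attempt further down.

```r
# First row of the example data, one value per year from 2005 to 2016.
years  <- 2005:2016
values <- c(0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0)
cutoff <- 2011

before <- years < cutoff   # selects 2005-2010
after  <- years > cutoff   # selects 2012-2016

# Least-squares slope of the values against the year, on each side of the cut-off.
slope_before <- unname(coef(lm(values[before] ~ years[before]))[2])
slope_after  <- unname(coef(lm(values[after]  ~ years[after]))[2])

print(c(before = slope_before, after = slope_after))
```

The cut-off year itself (2011 here) is excluded from both fits, which is how the example above is worded.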

So the final result would look like this:

structure(list(ID = c(1, 2, 3, 4, 5, 6), `2005` = c(0L, 0L, 0L, 
2L, 1L, 0L), `2006` = c(0L, 0L, 0L, 1L, 1L, 0L), `2007` = c(1L, 
0L, 1L, 0L, 3L, 0L), `2008` = c(1L, 0L, 0L, 4L, 3L, 0L), `2009` = c(1L, 
0L, 0L, 2L, 3L, 0L), `2010` = c(0L, 0L, 0L, 5L, 0L, 0L), `2011` = c(0L, 
0L, 0L, 0L, 1L, 0L), `2012` = c(0L, 0L, 0L, 4L, 1L, 1L), `2013` = c(1L, 
0L, 1L, 0L, 0L, 0L), `2014` = c(0L, 0L, 2L, 0L, 9L, 0L), `2015` = c(0L, 
0L, 1L, 0L, 2L, 0L), `2016` = c(0L, 0L, 0L, 0L, 0L, 0L), `Cut Off Year` = c("2011", 
"2015", "2015", "2005", "2011", "2007"), `Slope Before` = c("Slope1", 
"Slope2", "Slope3", "Slope4", "Slope5", "Slope6"), `Slope After` = c("Slope1", 
"Slope2", "Slope3", "Slope4", "Slope5", "Slope6")), .Names = c("ID", "2005", 
"2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013", 
"2014", "2015", "2016", "Cut Off Year", "Slope Before", "Slope After"), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

I tried to implement this function:

Slope = function(x) {
  Temporary_DF = data.frame(x, year=2:13)
  lm(x ~ year, data=Temporary_DF)$coefficients[2]
}

Transposed_Data = as.data.frame(t(DF))
DF$slope = sapply(Transposed_Data, Slope)

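I don't think this will work, because it does not take the cut-off year into account, and I don't know how to implement the cut-off. I am also having trouble applying the slope function because my original data frame contains other columns that are not part of the slope calculation (the first column and column 14).

1 answer:

Answer 0: (score: 1)

Personally, I would reshape (tidy) your data into long format and use package data.table with its by argument (or dplyr, if you prefer), but you can do this with apply: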

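# Convert the cut-off year from character to numeric. `[[` returns the column
# as a plain vector whether DF is a data frame or a tibble.
DF[["Cut Off Year"]] <- as.numeric(DF[["Cut Off Year"]])

# Least-squares slope of x against its position (one step per year)
Slope = function(x) {
  Temporary_DF = data.frame(y = x, year=seq_along(x))
  lm(y ~ year, data=Temporary_DF)$coefficients[2]
}

years <- 2005:2016

DF[, c("Slope Before", "Slope After")] <- t(apply(DF[, c(years, "Cut Off Year")], 1,
      function(x) {
        cutoff <- x[length(x)]   # last element is the cut-off year
        y <- x[-length(x)]       # remaining elements are the yearly values
        #subset:
        a <- y[years < cutoff]
        b <- y[years > cutoff]
        # a slope needs at least two points
        a <- if (length(a) > 1) Slope(a) else NA_real_
        b <- if (length(b) > 1) Slope(b) else NA_real_
        c(a, b)
      }))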

#  ID 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Cut Off Year Slope Before   Slope After
#1  1    0    0    1    1    1    0    0    0    1    0    0    0         2011   0.08571429 -1.000000e-01
#2  2    0    0    0    0    0    0    0    0    0    0    0    0         2015   0.00000000            NA
#3  3    0    0    1    0    0    0    0    0    1    2    1    0         2015   0.12121212            NA
#4  4    2    1    0    4    2    5    0    4    0    0    0    0         2005           NA -2.000000e-01
#5  5    1    1    3    3    3    0    1    1    0    9    2    0         2011   0.02857143  2.106500e-16
#6  6    0    0    0    0    0    0    0    1    0    0    0    0         2007   0.00000000  1.791615e-18

Note the floating-point inaccuracy: values such as 2.106500e-16 and 1.791615e-18 are effectively zero.
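The long-format route mentioned at the start of this answer is not shown, so here is a rough sketch of what it might look like. To stay dependency-free it uses base R (`split` and `lapply`) rather than data.table or dplyr, so it is a stand-in for the idea, not the data.table code itself. The helper `slope_or_na` and the column names `cutoff`, `year`, `value`, `slope_before` and `slope_after` are made up for this example.

```r
# Example data from the question, rebuilt as a plain data frame.
years <- 2005:2016
counts <- rbind(
  c(0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0),
  c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 0, 1, 0, 0, 0, 0, 0, 1, 2, 1, 0),
  c(2, 1, 0, 4, 2, 5, 0, 4, 0, 0, 0, 0),
  c(1, 1, 3, 3, 3, 0, 1, 1, 0, 9, 2, 0),
  c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0)
)
colnames(counts) <- years
DF <- data.frame(ID = 1:6, counts, check.names = FALSE)
DF[["Cut Off Year"]] <- c(2011, 2015, 2015, 2005, 2011, 2007)

# Long format: one row per (ID, year) pair. The year columns are stacked
# column by column, so ID repeats as a block and each year repeats nrow(DF) times.
long <- data.frame(
  ID     = rep(DF$ID, times = length(years)),
  cutoff = rep(DF[["Cut Off Year"]], times = length(years)),
  year   = rep(years, each = nrow(DF)),
  value  = unlist(DF[as.character(years)], use.names = FALSE)
)

# Hypothetical helper: least-squares slope of value on year,
# or NA when fewer than two points are available.
slope_or_na <- function(d) {
  if (nrow(d) < 2) return(NA_real_)
  unname(coef(lm(value ~ year, data = d))[2])
}

# One pair of slopes per ID, fitted on the rows before and after the cut-off.
slopes <- do.call(rbind, lapply(split(long, long$ID), function(d) {
  data.frame(
    ID           = d$ID[1],
    slope_before = slope_or_na(d[d$year < d$cutoff, ]),
    slope_after  = slope_or_na(d[d$year > d$cutoff, ])
  )
}))

# Attach the slopes to the original wide table.
result <- merge(DF, slopes, by = "ID")
print(result[, c("ID", "Cut Off Year", "slope_before", "slope_after")])
```

Because the slope is fitted against the actual year here and against `seq_along(x)` in the apply version, the two should agree as long as the years are consecutive. If the near-zero values are distracting, rounding the slope columns (for example with `round(slopes$slope_after, 10)`) would display them as 0.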