Given the following data frame (it is only a sample table, so the numbers may not make much sense):
structure(list(ID = c(1, 2, 3, 4, 5, 6), `2005` = c(0L, 0L, 0L,
2L, 1L, 0L), `2006` = c(0L, 0L, 0L, 1L, 1L, 0L), `2007` = c(1L,
0L, 1L, 0L, 3L, 0L), `2008` = c(1L, 0L, 0L, 4L, 3L, 0L), `2009` = c(1L,
0L, 0L, 2L, 3L, 0L), `2010` = c(0L, 0L, 0L, 5L, 0L, 0L), `2011` = c(0L,
0L, 0L, 0L, 1L, 0L), `2012` = c(0L, 0L, 0L, 4L, 1L, 1L), `2013` = c(1L,
0L, 1L, 0L, 0L, 0L), `2014` = c(0L, 0L, 2L, 0L, 9L, 0L), `2015` = c(0L,
0L, 1L, 0L, 2L, 0L), `2016` = c(0L, 0L, 0L, 0L, 0L, 0L), `Cut Off Year` = c("2011",
"2015", "2015", "2005", "2011", "2007")), .Names = c("ID", "2005",
"2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013",
"2014", "2015", "2016", "Cut Off Year"), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))
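I am interested in finding the slope of each row across the years (2005-2016).
However, I want two slopes per row: one for the values before the cut-off year (column 14) and another for the values after the cut-off year.
For example, in the first row the cut-off year is 2011. So I would like R to calculate the slope for 2005-2010 and write it into a new column ("Slope Before"), and then calculate the slope for 2012-2016 and write it into a second new column ("Slope After") in the same row.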
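To make the request concrete, here is a minimal sketch of the calculation for the first row only. It assumes "slope" means the coefficient of a simple linear regression of the value on the year; the variable names (`years`, `values`, `cutoff`, `slope_before`, `slope_after`) are made up for this illustration.

```r
# First row of the example table: values for 2005-2016, cut-off year 2011
years  <- 2005:2016
values <- c(0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0)
cutoff <- 2011

before <- years < cutoff   # selects 2005-2010
after  <- years > cutoff   # selects 2012-2016

# Slope = second coefficient of a linear fit of value on year
slope_before <- unname(coef(lm(values[before] ~ years[before]))[2])
slope_after  <- unname(coef(lm(values[after]  ~ years[after]))[2])

print(c(before = slope_before, after = slope_after))
```

The cut-off year itself (2011) is excluded from both fits, as described above.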
So the final result would look like this:
structure(list(ID = c(1, 2, 3, 4, 5, 6), `2005` = c(0L, 0L, 0L,
2L, 1L, 0L), `2006` = c(0L, 0L, 0L, 1L, 1L, 0L), `2007` = c(1L,
0L, 1L, 0L, 3L, 0L), `2008` = c(1L, 0L, 0L, 4L, 3L, 0L), `2009` = c(1L,
0L, 0L, 2L, 3L, 0L), `2010` = c(0L, 0L, 0L, 5L, 0L, 0L), `2011` = c(0L,
0L, 0L, 0L, 1L, 0L), `2012` = c(0L, 0L, 0L, 4L, 1L, 1L), `2013` = c(1L,
0L, 1L, 0L, 0L, 0L), `2014` = c(0L, 0L, 2L, 0L, 9L, 0L), `2015` = c(0L,
0L, 1L, 0L, 2L, 0L), `2016` = c(0L, 0L, 0L, 0L, 0L, 0L), `Cut Off Year` = c("2011",
"2015", "2015", "2005", "2011", "2007"), `Slope Before` = c("Slope1",
"Slope2", "Slope3", "Slope4", "Slope5", "Slope6"), `Slope After` = c("Slope1",
"Slope2", "Slope3", "Slope4", "Slope5", "Slope6")), .Names = c("ID", "2005",
"2006", "2007", "2008", "2009", "2010", "2011", "2012", "2013",
"2014", "2015", "2016", "Cut Off Year", "Slope Before", "Slope After"), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))
I tried to implement this function:
Slope = function(x) {
Temporary_DF = data.frame(x, year=2:13)
lm(x ~ year, data=Temporary_DF)$coefficients[2]
}
Transposed_Data = as.data.frame(t(DF))
DF$slope = sapply(Transposed_Data, Slope)
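I don't think this will work because it does not take the cut-off year into account, and I don't know how to incorporate the cut-off year. In addition, I have trouble applying the slope function because my original data frame contains other columns that are not part of the slope calculation (the first column and column 14).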
Answer 0 (score: 1)
Personally, I would rearrange (tidy) your data into long format and use the data.table package with its by argument (or dplyr, if you prefer), but you can do this with apply:
# `[[` returns a plain vector even when DF is a tibble, so as.numeric() works
DF[["Cut Off Year"]] <- as.numeric(DF[["Cut Off Year"]])
Slope = function(x) {
Temporary_DF = data.frame(y = x, year=seq_along(x))
lm(y ~ year, data=Temporary_DF)$coefficients[2]
}
years <- 2005:2016
DF[, c("Slope Before", "Slope After")] <- t(apply(DF[, c(years, "Cut Off Year")], 1,
function(x) {
y <- x[-length(x)]
#subset:
a <- y[years < x[length(x)]]
b <- y[years > x[length(x)]]
a <- if (length(a) > 1) Slope(a) else NA_real_
b <- if (length(b) > 1) Slope(b) else NA_real_
c(a, b)
}))
# ID 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 Cut Off Year Slope Before Slope After
#1 1 0 0 1 1 1 0 0 0 1 0 0 0 2011 0.08571429 -1.000000e-01
#2 2 0 0 0 0 0 0 0 0 0 0 0 0 2015 0.00000000 NA
#3 3 0 0 1 0 0 0 0 0 1 2 1 0 2015 0.12121212 NA
#4 4 2 1 0 4 2 5 0 4 0 0 0 0 2005 NA -2.000000e-01
#5 5 1 1 3 3 3 0 1 1 0 9 2 0 2011 0.02857143 2.106500e-16
#6 6 0 0 0 0 0 0 0 1 0 0 0 0 2007 0.00000000 1.791615e-18
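Note the floating-point inaccuracy: the last two "Slope After" values (about 2e-16 and 2e-18) are effectively zero.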
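The long-format alternative mentioned at the start of this answer could look roughly like the following. This is only a sketch in base R (not data.table or dplyr), and the names `vals`, `cutoff`, `long`, `slope`, and `res` are made up for the illustration; the data are retyped from the question.

```r
# Example data retyped from the question: one row per ID, columns 2005-2016
years <- 2005:2016
vals <- matrix(c(
  0, 0, 1, 1, 1, 0, 0, 0, 1, 0, 0, 0,   # ID 1
  0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,   # ID 2
  0, 0, 1, 0, 0, 0, 0, 0, 1, 2, 1, 0,   # ID 3
  2, 1, 0, 4, 2, 5, 0, 4, 0, 0, 0, 0,   # ID 4
  1, 1, 3, 3, 3, 0, 1, 1, 0, 9, 2, 0,   # ID 5
  0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0    # ID 6
), nrow = 6, byrow = TRUE)
cutoff <- c(2011, 2015, 2015, 2005, 2011, 2007)

# Long format: one row per (ID, year) pair
long <- data.frame(
  ID     = rep(1:6, each = length(years)),
  year   = rep(years, times = 6),
  value  = as.vector(t(vals)),
  cutoff = rep(cutoff, each = length(years))
)

# Label each observation; the cut-off year itself gets NA and is dropped
long$period <- ifelse(long$year < long$cutoff, "before",
               ifelse(long$year > long$cutoff, "after", NA))
long <- long[!is.na(long$period), ]

# Regression slope of value on year; NA when fewer than two points remain
slope <- function(d) {
  if (nrow(d) < 2) return(NA_real_)
  unname(coef(lm(value ~ year, data = d))[2])
}

# One result row per ID with both slopes
res <- do.call(rbind, lapply(split(long, long$ID), function(d) {
  data.frame(
    ID           = d$ID[1],
    slope_before = slope(d[d$period == "before", ]),
    slope_after  = slope(d[d$period == "after", ])
  )
}))

# zapsmall() rounds values that are zero up to floating-point noise
res$slope_before <- zapsmall(res$slope_before)
res$slope_after  <- zapsmall(res$slope_after)
print(res)
```

Because the periods are labelled per observation, the same grouping step would carry over to data.table's by or dplyr's group_by, and `res` can be merged back onto the wide table by `ID`.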