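I have a data.frame that contains data from different years for a set of observations. The column names are the years, and a duplicated year is identified by the year followed by ".1" (2008 and 2008.1 are duplicates of the same year).
I want to compute the mean of each year and its duplicate (for example, 2008 and 2008.1).
The dput() of the first observation is as follows: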
structure(list(ID = 2174L, `1992` = 0L, `1993` = 0L, `1994` = 0L,
`1994.1` = 0L, `1995` = 0L, `1996` = 0L, `1997` = 0L, `1998` = 0L,
`1999` = 0L, `1997.1` = 0L, `1998.1` = 0L, `1999.1` = 0L,
`2000` = 0L, `2001` = 0L, `2002` = 0L, `2003` = 0L, `2000.1` = 0L,
`2001.1` = 0L, `2002.1` = 0L, `2003.1` = 0L, `2004` = 0L,
`2005` = 0L, `2006` = 0L, `2007` = 0L, `2008` = 0L, `2004.1` = 0L,
`2005.1` = 0L, `2006.1` = 0L, `2007.1` = 0L, `2008.1` = 0L,
`2009` = 0L, `2010` = 0L, `2011` = 0L, `2012` = 0L, `2013` = 0L,
altura_mean_30arc = 341, dist_p = -1239.46778549383, dist_capital = 310537.289055982,
municode = 428, slope = 0.109233340937795, dist_f = -54589.0213329769), .Names = c("ID",
"1992", "1993", "1994", "1994.1", "1995", "1996", "1997", "1998",
"1999", "1997.1", "1998.1", "1999.1", "2000", "2001", "2002",
"2003", "2000.1", "2001.1", "2002.1", "2003.1", "2004", "2005",
"2006", "2007", "2008", "2004.1", "2005.1", "2006.1", "2007.1",
"2008.1", "2009", "2010", "2011", "2012", "2013", "altura_mean_30arc",
"dist_p", "dist_capital", "municode", "slope", "dist_f"), row.names = 2174L, class = "data.frame")
To simplify the process, I tried a loop over each duplicated year:
duplicated_years <- c("1994", "1997", "1998", "1999", "2000", "2001", "2002", "2003", "2004", "2005", "2006", "2007", "2008")
duplicated_years2 <- str_c(duplicated_years, "1", sep = ".")
for(i in as.numeric(duplicated_years)){
  for(j in as.numeric(duplicated_years2)){
    df[, str_c(i, "mean", sep="_")] <- ((df$i + df$j) / 2)
  }
}
But the result is a set of new variables filled with NA. I know there are alternatives to the loop, but indexing is very hard for me.
Answer 0: (score: 3)
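When you work in wide format and have many columns to operate on row by row, it is usually better (in R) to convert to long format and operate on a single column. Converting back to wide format afterwards (if needed) is straightforward.
For example, here is one way to find all the columns whose name contains a year: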
colindex <- grep("\\d{4}", names(df))
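To see what this pattern picks up, here is a minimal sketch on a made-up vector of column names. The vector `nms` is a hypothetical subset of the names in the question, used only for illustration. It also shows how `sub()` can strip the ".1" suffix so that a duplicate maps back to its year.

```r
# Hypothetical subset of the column names from the question
nms <- c("ID", "1994", "1994.1", "2008", "2008.1", "slope", "dist_p")

# Positions of the names that contain four consecutive digits
colindex <- grep("\\d{4}", nms)
print(colindex)

# Dropping everything from the first dot onward maps "1994.1" to "1994"
years <- sub("\\..*", "", nms[colindex])
print(years)
```

Because the pattern is not anchored, it would also match any non-year column whose name happens to contain four digits in a row. None of the columns in the question do, but that is worth checking on other data.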
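Then, using data.table, we can select those columns (along with ID), melt them into long format, and compute the mean per ID/year while casting back to wide format.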
library(data.table)
dcast(melt(setDT(df)[, c(1L, colindex), with = FALSE], id = 1L),
ID ~ sub("\\..*", "", variable), value.var = "value", mean)
# ID 1992 1993 1994 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013
# 1: 2174 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
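Since every value in the sample row is 0, the output above does not make the averaging visible. Below is a small sketch of the same melt/dcast idea on made-up data, split into separate steps. The data frame `toy`, its values, and the helper column `year` are assumptions introduced for illustration and are not part of the original data.

```r
library(data.table)

# Hypothetical toy data: two IDs, one duplicated year (2008 and 2008.1)
toy <- data.frame(ID = c(1L, 2L),
                  `2007` = c(10, 20),
                  `2008` = c(2, 4),
                  `2008.1` = c(4, 8),
                  slope = c(0.1, 0.2),
                  check.names = FALSE)

# Columns whose name contains a four-digit year
colindex <- grep("\\d{4}", names(toy))

# Keep ID plus the year columns and reshape to long format;
# as.data.table() makes a copy, so toy itself is left unchanged
long <- melt(as.data.table(toy)[, c(1L, colindex), with = FALSE],
             id.vars = "ID")

# Strip the ".1" suffix so duplicates share the same year label
long[, year := sub("\\..*", "", variable)]

# Cast back to wide format, averaging the values within each ID/year
wide <- dcast(long, ID ~ year, value.var = "value", fun.aggregate = mean)
print(wide)
```

Here the 2008 column of `wide` holds the mean of 2008 and 2008.1 for each ID, while 2007, which has no duplicate, is passed through unchanged.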