Reshaping count-summarized data back into long format in R

Posted: 2014-09-04 14:56:17

Tags: r

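Embarrassingly basic question, but if you don't know, you don't know... I need to reshape a data frame of count-summarized data back into what it looked like before it was summarized. This is basically the opposite of {plyr} count(), e.g.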

> (d = data.frame(value=c(1,1,1,2,3,3), cat=c('A','A','A','A','B','B')))
  value cat
1     1   A
2     1   A
3     1   A
4     2   A
5     3   B
6     3   B
> (summry = plyr::count(d))
  value cat freq
1     1   A    3
2     2   A    1
3     3   B    2

If you start from summry, what is the quickest way to get back to d? Unless I'm wrong (very possible), {reshape2} won't do this.

2 Answers:

Answer 0 (score: 2)

Just use rep:

summry[rep(rownames(summry), summry$freq), c("value", "cat")]
#     value cat
# 1       1   A
# 1.1     1   A
# 1.2     1   A
# 2       2   A
# 3       3   B
# 3.1     3   B
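The row names above (1, 1.1, 1.2, ...) appear because character row names are being duplicated. A minimal sketch of the same rep idea that indexes by row position with seq_len() and then resets the row names; the summry table is rebuilt by hand here (an assumption, so the snippet runs without plyr):

```r
# Rebuild the count summary by hand so the example does not need plyr
summry <- data.frame(value = c(1, 2, 3), cat = c("A", "A", "B"),
                     freq = c(3, 1, 2), stringsAsFactors = FALSE)

# Repeat each row position freq times and keep only the original columns
d_back <- summry[rep(seq_len(nrow(summry)), times = summry$freq),
                 c("value", "cat")]

# Reset the row names so they read 1..n instead of 1, 1.1, 1.2, ...
rownames(d_back) <- NULL
d_back
```

Indexing by position rather than by rownames(summry) also works when the summary's row names are not the default integers.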

A variant of this approach can be found in expandRows in my "SOfun" package. If you have it loaded, you can simply do:

expandRows(summry, "freq")
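Without SOfun installed, a small base R helper can be written along the same lines. expand_counts() below is a hypothetical name used only for illustration. It is not the SOfun implementation, and its behaviour (dropping the count column, resetting row names, dropping rows whose count is zero) is an assumption about what such a helper should do:

```r
# expand_counts() is a hypothetical helper, not SOfun's expandRows code
expand_counts <- function(df, count_col) {
  counts <- df[[count_col]]
  # Repeat each row position by its count; a count of 0 drops that row
  out <- df[rep(seq_len(nrow(df)), times = counts),
            setdiff(names(df), count_col), drop = FALSE]
  # Replace the duplicated row names (1, 1.1, ...) with 1..n
  rownames(out) <- NULL
  out
}

summry <- data.frame(value = c(1, 2, 3), cat = c("A", "A", "B"),
                     freq = c(3, 1, 2), stringsAsFactors = FALSE)
expand_counts(summry, "freq")
```

Passing the column name as a string keeps the helper usable with count columns named freq, Freq, n, and so on.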

Answer 1 (score: 1)

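There is a nice function for expanding data frame tables on the R Cookbook website that you can modify slightly. The only modifications are changing 'Freq' -> 'freq' (to be consistent with plyr::count) and making sure the row names are reset to increasing integers.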

expand.dft <- function(x, na.strings = "NA", as.is = FALSE, dec = ".") {
  # Take each row in the source data frame table and replicate it
  # using its freq value
  DF <- sapply(seq_len(nrow(x)), 
               function(i) x[rep(i, each = x$freq[i]), ],
               simplify = FALSE)

  # Take the above list and rbind it to create a single DF
  # Also subset the result to eliminate the freq column
  DF <- subset(do.call("rbind", DF), select = -freq)

  # Now coerce each column (e.g. factors) to character and apply
  # type.convert so that a suitable type is chosen for each column
  for (i in seq_along(DF)) {
    DF[[i]] <- type.convert(as.character(DF[[i]]),
                            na.strings = na.strings,
                            as.is = as.is, dec = dec)
  }
  row.names(DF) <- seq(nrow(DF))
  DF
}
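The type.convert() loop matters when the input columns are factors, as they are in a data frame built from table(). Calling as.numeric() on a factor returns its internal level codes rather than its labels, which is why the function goes through as.character() first. A small illustration with made-up values (not taken from the question):

```r
# A factor whose labels look numeric, as produced by as.data.frame(table(...))
f <- factor(c("10", "20", "30"))

# as.numeric() returns the level codes, not the labels
codes <- as.numeric(f)

# Going through character first lets type.convert() recover the real values
values <- type.convert(as.character(f), as.is = TRUE)
codes
values
```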

expand.dft(summry)

  value cat
1     1   A
2     1   A
3     1   A
4     2   A
5     3   B
6     3   B