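"Aggregating" non-numeric variables in Reshape

Time: 2012-04-30 12:35:49

Tags: r aggregate reshape

I have a dataset in long format and would like to convert it to wide format, either with Reshape or with any preprocessing done before Reshape. The difficulty is that the "value" variable is non-numeric. Note that the original data also contains legitimate duplicate records. The code below shows the data layout for each format.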

id <- c(1, 1, 1, 1, 1, 1, 1)
month <- c("jan", "feb", "feb", "march", "april", "april", "april")
stress <- c("mild", "mild", "high", "moderate", "mild", "high", "mild")
Longdata <- data.frame(id, month, stress, stringsAsFactors = FALSE)

This is the original format:

> Longdata
  id month   stress
1  1   jan     mild
2  1   feb     mild
3  1   feb     high
4  1 march moderate
5  1 april     mild
6  1 april     high
7  1 april     mild

This is how I would like the data to be organized:

id <- c(1)
jan <- c("mild")
feb <- c("mild-high")
march <- c("moderate")
april <- c("mild-high-mild")
widedata <- data.frame(id, jan, feb, march, april, stringsAsFactors = FALSE)
> widedata
  id  jan       feb    march          april
1  1 mild mild-high moderate mild-high-mild

1 Answer:

Answer 0 (score: 0)

You can do this in two steps: first with aggregate, then with either base R reshape or dcast from the "reshape2" package.

  1. Aggregation step:

    Mediumdata <- aggregate(stress ~ id + month, Longdata, paste, collapse="-")
    Mediumdata
    #   id month         stress
    # 1  1 april mild-high-mild
    # 2  1   feb      mild-high
    # 3  1   jan           mild
    # 4  1 march       moderate
    
  2. Reshape step:

    # Using base R reshape
    reshape(Mediumdata, direction="wide", idvar="id", timevar="month")
    #   id   stress.april stress.feb stress.jan stress.march
    # 1  1 mild-high-mild  mild-high       mild     moderate
    
    # Using `dcast` from "reshape2"
    library(reshape2)
    dcast(Mediumdata, id ~ month, value.var="stress")
    #   id          april       feb  jan    march
    # 1  1 mild-high-mild mild-high mild moderate
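
Both variants above return the month columns in alphabetical order, while the layout requested in the question runs jan, feb, march, april. The following is a minimal base R sketch of the two steps put together, with two additions that are not part of the original answer: the "stress." prefix is stripped from the column names, and the columns are reordered by first appearance in Longdata. The name month_order is introduced here for illustration only, and the sketch assumes the long data is already sorted chronologically.

```r
# Rebuild the example data from the question
Longdata <- data.frame(
  id = c(1, 1, 1, 1, 1, 1, 1),
  month = c("jan", "feb", "feb", "march", "april", "april", "april"),
  stress = c("mild", "mild", "high", "moderate", "mild", "high", "mild"),
  stringsAsFactors = FALSE
)

# Step 1: collapse the rows of each id/month pair into one "-"-separated string
Mediumdata <- aggregate(stress ~ id + month, Longdata, paste, collapse = "-")

# Step 2: spread the months into columns (base R reshape)
widedata <- reshape(Mediumdata, direction = "wide", idvar = "id", timevar = "month")

# Addition (assumption): drop the "stress." prefix that reshape puts on each column
names(widedata) <- sub("^stress\\.", "", names(widedata))

# Addition (assumption): order the month columns by first appearance in Longdata
month_order <- unique(Longdata$month)
widedata <- widedata[, c("id", month_order)]

widedata
#   id  jan       feb    march          april
# 1  1 mild mild-high moderate mild-high-mild
```

If the long data is not sorted by time, month_order would have to be written out by hand, for example c("jan", "feb", "march", "april").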