Reshaping data in R: is it possible to have two "value variables"?

Time: 2012-09-15 11:36:18

Tags: r plyr reshape

I am struggling with the reshape packages, looking for a way to "cast" a data frame with two (or more) columns in "value.var".

Here is an example of what I want to achieve.

library(plyr)      # provides ddply(), used below
library(reshape2)  # provides melt() and dcast(), used below

df <- data.frame(StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
                 StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
                 ExamenYear    = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
                 Exam          = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
                 participated  = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
                 passed        = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
                 stringsAsFactors = FALSE)

From df I can create the following data frame:

tx <- ddply(df, c('ExamenYear','StudentGender'), summarize,
        participated = sum(participated      == "yes"),
        passed   = sum(passed      == "yes"))

In reshape terms, I have two "value variables": participated and passed.

I am looking for a way to combine the following information in a single data frame:

 dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
 dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

The final table I would like to create looks like this:

tempTab1 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
tempTab2 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')

as.data.frame(cbind(ExamenYear = tempTab1[,1],
                Female_Participated = tempTab1[,2],
                Female_Passed       = tempTab2[,2],
                Male_Participated    = tempTab1[,3],
                Male_Passed          = tempTab2[,3]
                ))

Is it possible to have two "value variables" in the cast function?

1 Answer:

Answer 0 (score: 11)

Since you have already gotten this far, why not melt your tx object and then use dcast, like this:

dcast(melt(tx, id.vars=c(1, 2)), ExamenYear ~ StudentGender + variable)
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009             NA       NA              3        2
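If the NA entries for the missing 2009/F combination are unwanted, one option is the fill argument of dcast. The following is only a sketch: tx is rebuilt by hand from the counts shown above rather than computed with ddply, and the names tx.m and wide are made up for illustration.

```r
library(reshape2)

# tx rebuilt by hand from the counts above (assumption: same values as the ddply result)
tx <- data.frame(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Stack participated and passed into a variable/value pair
tx.m <- melt(tx, id.vars = c("ExamenYear", "StudentGender"))

# fill = 0 replaces structurally missing cells (no female students in 2009)
wide <- dcast(tx.m, ExamenYear ~ StudentGender + variable,
              value.var = "value", fill = 0)
wide
```

Whether 0 or NA is the better representation depends on whether "no students in that group" should count as zero participants or as missing data.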

However, a more direct approach might be to melt your data from the start:

df.m <- melt(df, id.vars=c(1:4))
dcast(df.m, ExamenYear ~ StudentGender + variable, 
      function(x) sum(x == "yes"))
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 3       2009              0        0              3        2
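To see why this works, it may help to look at the molten data: melt stacks participated and passed into a variable/value pair, so StudentGender + variable on the right-hand side of the formula yields one column per gender/variable combination. The sketch below repeats the same steps with the arguments named explicitly; the name wide is an assumption used only for illustration.

```r
library(reshape2)

df <- data.frame(
  StudentID     = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c("F", "M", "F", "M", "F", "M", "F", "M", "M", "M"),
  ExamenYear    = c("2007", "2007", "2007", "2008", "2008", "2008", "2008", "2009", "2009", "2009"),
  Exam          = c("algebra", "stats", "bio", "algebra", "algebra", "stats", "stats", "algebra", "bio", "bio"),
  participated  = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "yes", "yes"),
  passed        = c("no", "yes", "yes", "yes", "no", "yes", "yes", "yes", "no", "yes"),
  stringsAsFactors = FALSE
)

# One row per student per measured variable: 10 students x 2 variables
df.m <- melt(df, id.vars = c("StudentID", "StudentGender", "ExamenYear", "Exam"))
head(df.m)

# Count the "yes" answers within each year/gender/variable cell
wide <- dcast(df.m, ExamenYear ~ StudentGender + variable,
              fun.aggregate = function(x) sum(x == "yes"),
              value.var = "value")
wide
```

Because an aggregation function is supplied, the empty 2009/F cells are filled with the function's result on an empty vector, which is why they show 0 here rather than NA.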

Update: a base R approach

Although the required code isn't "pretty", doing this in base R isn't too difficult either. Here is one way:

  1. Use aggregate() to get the equivalent of tx from your example.

    dfa <- aggregate(cbind(participated, passed) ~ 
      ExamenYear + StudentGender, df, function(x) sum(x == "yes"))
    dfa
    #   ExamenYear StudentGender participated passed
    # 1       2007             F            1      1
    # 2       2008             F            1      1
    # 3       2007             M            1      1
    # 4       2008             M            2      2
    # 5       2009             M            3      2
    
  2. Use reshape() to convert dfa from "long" to "wide".

    reshape(dfa, direction = "wide", 
            idvar="ExamenYear", timevar="StudentGender")
    #   ExamenYear participated.F passed.F participated.M passed.M
    # 1       2007              1        1              1        1
    # 2       2008              1        1              2        2
    # 5       2009             NA       NA              3        2