Alternatives to stats::reshape

Time: 2012-02-18 14:05:13

Tags: r reshape

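The melt/cast functions in the reshape package are great, but I'm not sure whether there is a simple way to apply them when the measured variables are of different types. For example, here is a data excerpt in which each MD provides the gender and weight of three patients: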

ID PT1 WT1 PT2 WT2 PT3 WT3
1  "M" 170 "M" 175 "F" 145
...

The goal is to reshape the data so that each row is one patient:

ID PTNUM GENDER WEIGHT
1    1     "M"    170
1    2     "M"    175
1    3     "F"    145
...

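Using the reshape function in the stats package is the one option I know of, but I'm posting here in the hope that R users more experienced than I am will post other, hopefully better, methods. Many thanks!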

-

@Vincent Zoonekynd:

I liked your example, so I generalized it to multiple variables.

# Sample data
n <- 5
d <- data.frame(
  id = 1:n,
  p1 = sample(c("M","F"),n,replace=TRUE),
  q1 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w1 = round(runif(n,100,200)),
  y1 = round(runif(n,100,200)),
  p2 = sample(c("M","F"),n,replace=TRUE),
  q2 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w2 = round(runif(n,100,200)),
  y2 = round(runif(n,100,200)),
  p3 = sample(c("M","F"),n,replace=TRUE),
  q3 = sample(c("Alpha","Beta"),n,replace=TRUE),
  w3 = round(runif(n,100,200)),
  y3 = round(runif(n,100,200))
  )
# Reshape the data.frame, one variable at a time
library(reshape)
d1 <- melt(d, id.vars="id", measure.vars=c("p1","p2","p3","q1","q2","q3"))
d2 <- melt(d, id.vars="id", measure.vars=c("w1","w2","w3","y1","y2","y3"))
d1 = cbind(d1,colsplit(d1$variable,names=c("var","ptnum")))
d2 = cbind(d2,colsplit(d2$variable,names=c("var","ptnum")))
d1$variable = NULL
d2$variable = NULL
d1c = cast(d1,...~var)
d2c = cast(d2,...~var)
# Join the two data.frames
d3 = merge(d1c, d2c, by=c("id","ptnum"), all=TRUE)

-

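Final thoughts: my motivation for this question was to learn about alternatives to the reshape package other than the stats::reshape function. For now, I have reached the following conclusions:

  • Stick with stats::reshape. As long as you remember to pass a list, rather than a simple vector, as the "varying" argument, you will stay out of trouble. For smaller data sets (this time I was dealing with a few thousand cases and fewer than 200 variables in total), the simplicity of the code is worth the lower speed of this function.

  • To use the cast/melt approach in Hadley Wickham's reshape (or reshape2) package, you must split your variables into two sets, one consisting of the numeric variables and the other of the character variables. Once your data set is large enough that you find stats::reshape unbearably slow, I imagine the extra step of splitting your variables into two sets will not seem so bad.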
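The first conclusion can be sketched as follows for the layout shown in the question. This is only an illustration: the second row of data is invented, and the output column names (PTNUM, GENDER, WEIGHT) are taken from the target table rather than from any tested code in the original post.

```r
# Hypothetical data in the wide layout from the question
# (row 1 matches the excerpt; row 2 is made up for illustration)
d <- data.frame(
  ID  = 1:2,
  PT1 = c("M", "F"), WT1 = c(170, 130),
  PT2 = c("M", "M"), WT2 = c(175, 180),
  PT3 = c("F", "F"), WT3 = c(145, 150),
  stringsAsFactors = FALSE
)

# Wide -> long with stats::reshape.
# 'varying' is a list with one vector per output variable, so the
# character columns and the numeric columns are grouped explicitly.
long <- reshape(
  d,
  direction = "long",
  idvar     = "ID",
  varying   = list(c("PT1", "PT2", "PT3"), c("WT1", "WT2", "WT3")),
  v.names   = c("GENDER", "WEIGHT"),
  timevar   = "PTNUM",
  times     = 1:3
)

# Sort so that each patient of an MD appears on consecutive rows
long <- long[order(long$ID, long$PTNUM), ]
rownames(long) <- NULL
long
```

Because each element of the list names the columns that feed one output variable, GENDER stays character and WEIGHT stays numeric, with no need to split the data frame by type first.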

2 answers:

Answer 0 (score: 3)

You can process each variable separately and then join the two resulting data.frames.

# Sample data
n <- 5
d <- data.frame(
  id = 1:n,
  pt1 = sample(c("M","F"),n,replace=TRUE),
  wt1 = round(runif(n,100,200)),
  pt2 = sample(c("M","F"),n,replace=TRUE),
  wt2 = round(runif(n,100,200)),
  pt3 = sample(c("M","F"),n,replace=TRUE),
  wt3 = round(runif(n,100,200))
)
# Reshape the data.frame, one variable at a time
library(reshape2)
d1 <- melt(d, 
  id.vars="id", measure.vars=c("pt1","pt2","pt3"), 
  variable.name="patient", value.name="gender"
)
d2 <- melt(d, 
  id.vars="id", measure.vars=c("wt1","wt2","wt3"), 
  variable.name="patient", value.name="weight"
)
d1$patient <- as.numeric(gsub("pt", "", d1$patient))
d2$patient <- as.numeric(gsub("wt", "", d2$patient))
# Join the two data.frames
merge(d1, d2, by=c("id","patient"), all=TRUE)

Answer 1 (score: 2)

I think the reshape function in the stats package is the simplest. Here is a simple example; does it do what you want?

> tmp
  id val val2 cat
1  1   1   14   a
2  1   2   13   b
3  2   3   12   b
4  2   4   11   a
> tmp2 <- tmp
> tmp2$t <- ave(tmp2$val, tmp2$id, FUN=seq_along)
> tmp2
  id val val2 cat t
1  1   1   14   a 1
2  1   2   13   b 2
3  2   3   12   b 1
4  2   4   11   a 2
> reshape(tmp2, idvar='id', timevar='t', direction='wide')
  id val.1 val2.1 cat.1 val.2 val2.2 cat.2
1  1     1     14     a     2     13     b
3  2     3     12     b     4     11     a

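Hopefully your patients' gender does not change at every appointment, but there may be other categorical variables that do change between visits.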
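The session above does not show how tmp was created. A self-contained sketch that reproduces it might look like this; the data.frame() call is an assumption reconstructed from the printed output, not the answerer's original code.

```r
# Rebuild the example data from the printed output (assumed construction)
tmp <- data.frame(
  id   = c(1, 1, 2, 2),
  val  = 1:4,
  val2 = 14:11,
  cat  = c("a", "b", "b", "a"),
  stringsAsFactors = FALSE
)

# Number the observations within each id; this becomes the time variable
tmp2 <- tmp
tmp2$t <- ave(tmp2$val, tmp2$id, FUN = seq_along)

# Long -> wide: one row per id, with numeric and character columns
# handled in the same call
wide <- reshape(tmp2, idvar = "id", timevar = "t", direction = "wide")
wide
```

Note that this example goes from long to wide, the opposite direction from the question; the same function handles the wide-to-long case with direction = "long".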