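I am struggling with the reshape package, looking for a way to "cast" a data frame with two (or more) values in "value.var".
Here is an example of what I would like to achieve.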
df <- data.frame(
  StudentID = c("x1", "x10", "x2", "x3", "x4", "x5", "x6", "x7", "x8", "x9"),
  StudentGender = c('F', 'M', 'F', 'M', 'F', 'M', 'F', 'M', 'M', 'M'),
  ExamenYear = c('2007','2007','2007','2008','2008','2008','2008','2009','2009','2009'),
  Exam = c('algebra', 'stats', 'bio', 'algebra', 'algebra', 'stats', 'stats', 'algebra', 'bio', 'bio'),
  participated = c('no','yes','yes','yes','no','yes','yes','yes','yes','yes'),
  passed = c('no','yes','yes','yes','no','yes','yes','yes','no','yes'),
  stringsAsFactors = FALSE)
From df I can create the following data frame:
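library(plyr)      # provides ddply()
library(reshape2)  # provides melt() and dcast(), used further down
tx <- ddply(df, c('ExamenYear','StudentGender'), summarize,
            participated = sum(participated == "yes"),
            passed = sum(passed == "yes"))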
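In reshape terms, I have two "value variables": participated and passed.
I am looking for a way to combine the following information in a single data frame: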
dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')
The final table I would like to create looks like this:
tempTab1 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'participated')
tempTab2 <- dcast(tx, formula = ExamenYear ~ StudentGender, value.var = 'passed')
as.data.frame(cbind(ExamenYear = tempTab1[,1],
                    Female_Participated = tempTab1[,2],
                    Female_Passed = tempTab2[,2],
                    Male_Participated = tempTab1[,3],
                    Male_Passed = tempTab2[,3]
))
Is it possible to have two "value variables" in the cast function?
Answer 0: (score: 11)
Since you have already gotten this far, why not melt your tx object and use dcast, like this:
dcast(melt(tx, id.vars=c(1, 2)), ExamenYear ~ StudentGender + variable)
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007             1        1             1        1
# 2       2008             1        1             2        2
# 3       2009            NA       NA             3        2
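The NA entries for 2009 appear because tx has no row for female students in that year. If zeros are preferred, one option is the fill argument of reshape2's dcast. The sketch below is only illustrative: it rebuilds tx by hand from the values in the question (so it does not depend on plyr) and assumes a genuine missing value should also read as 0, since fill replaces every NA in the cast result.

```r
library(reshape2)

# tx rebuilt by hand from the question's data (assumption: same values
# that the ddply() call produces)
tx <- data.frame(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Stack the two value columns into variable/value pairs
tx.m <- melt(tx, id.vars = c("ExamenYear", "StudentGender"))

# Cast back to wide; fill = 0 replaces the missing combinations with 0
wide <- dcast(tx.m, ExamenYear ~ StudentGender + variable,
              value.var = "value", fill = 0)
wide
```

Naming the id columns instead of using positions (c(1, 2)) is a small robustness choice; the result is the same here.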
However, a more direct approach might be to melt your data from the start:
df.m <- melt(df, id.vars=c(1:4))
dcast(df.m, ExamenYear ~ StudentGender + variable,
function(x) sum(x == "yes"))
#   ExamenYear F_participated F_passed M_participated M_passed
# 1       2007             1        1             1        1
# 2       2008             1        1             2        2
# 3       2009             0        0             3        2
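As an aside that goes beyond the reshape2 approach shown here: the dcast method in the data.table package accepts a vector of column names in value.var, which addresses the original question directly. The following is a minimal sketch, assuming data.table is installed and again rebuilding tx by hand from the question's values. The resulting columns are named from the value variable and the gender (for example participated_F), which differs from the F_participated style above.

```r
library(data.table)

# tx rebuilt by hand from the question's data, as a data.table
tx <- data.table(
  ExamenYear    = c("2007", "2007", "2008", "2008", "2009"),
  StudentGender = c("F", "M", "F", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2)
)

# data.table's dcast method takes several value columns at once
wide <- dcast(tx, ExamenYear ~ StudentGender,
              value.var = c("participated", "passed"))
wide
```

The missing 2009 combination for female students comes back as NA here too, unless a fill value is supplied.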
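Although the code required is not as "pretty", doing this in base R is not too difficult either. Here is one approach:
Use aggregate() to get the equivalent of tx from your example.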
dfa <- aggregate(cbind(participated, passed) ~
ExamenYear + StudentGender, df, function(x) sum(x == "yes"))
dfa
#   ExamenYear StudentGender participated passed
# 1       2007             F            1      1
# 2       2008             F            1      1
# 3       2007             M            1      1
# 4       2008             M            2      2
# 5       2009             M            3      2
Use reshape() to convert dfa from "long" to "wide".
reshape(dfa, direction = "wide",
idvar="ExamenYear", timevar="StudentGender")
#   ExamenYear participated.F passed.F participated.M passed.M
# 1       2007              1        1              1        1
# 2       2008              1        1              2        2
# 5       2009             NA       NA              3        2
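As with the first dcast call, the base R result has NA where no female students sat an exam in 2009. A small follow-up sketch, assuming those gaps should be reported as 0 (dfa is rebuilt by hand here from the aggregate() output shown above, so the snippet runs on its own):

```r
# dfa rebuilt by hand from the aggregate() output shown above
dfa <- data.frame(
  ExamenYear    = c("2007", "2008", "2007", "2008", "2009"),
  StudentGender = c("F", "F", "M", "M", "M"),
  participated  = c(1, 1, 1, 2, 3),
  passed        = c(1, 1, 1, 2, 2),
  stringsAsFactors = FALSE
)

# Long to wide, one set of value columns per gender
wide <- reshape(dfa, direction = "wide",
                idvar = "ExamenYear", timevar = "StudentGender")

# Replace the structural NAs with 0 (assumption: a missing group means zero students)
wide[is.na(wide)] <- 0
wide
```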