I am working with a data frame similar to this one:
df<-data.frame(student=c(rep(1,5),rep(2,5)), month=c(1:5,1:5),
quiz1p1=seq(20,20.9,0.1),quiz1p2=seq(30,30.9,0.1),
quiz2p1=seq(80,80.9,0.1),quiz2p2=seq(90,90.9,0.1))
print(df)
student month quiz1p1 quiz1p2 quiz2p1 quiz2p2
1 1 1 20.0 30.0 80.0 90.0
2 1 2 20.1 30.1 80.1 90.1
3 1 3 20.2 30.2 80.2 90.2
4 1 4 20.3 30.3 80.3 90.3
5 1 5 20.4 30.4 80.4 90.4
6 2 1 20.5 30.5 80.5 90.5
7 2 2 20.6 30.6 80.6 90.6
8 2 3 20.7 30.7 80.7 90.7
9 2 4 20.8 30.8 80.8 90.8
10 2 5 20.9 30.9 80.9 90.9
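It describes the grades that students received over five months, on two quizzes that are each divided into two parts.
I need to put the two quizzes in separate rows, so that each student in each month has two rows (one per quiz) and two value columns (one per part of the quiz). When I melt the table: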
library(reshape2)
dfL <- melt(df, id.vars = c("student", "month"))
the two parts of each quiz end up in separate rows as well.
dcast(dfL,student+month~variable)
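That call, of course, puts me right back where I started, and I can't find a way to recast the table into the required form. Is there a way to make the melt command work something like this: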
melt.data.frame(df, measure.var1=c("quiz1p1","quiz2p1"),
measure.var2=c("quiz1p2","quiz2p2"))
Answer 0 (score: 11)
Here is an approach using reshape() from base R:
df2 <- reshape(df, direction="long",
idvar = 1:2, varying = list(c(3,5), c(4,6)),
v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
## Checking the output
rbind(head(df2, 3), tail(df2, 3))
# student month time p1 p2
# 1.1.quiz1 1 1 quiz1 20.0 30.0
# 1.2.quiz1 1 2 quiz1 20.1 30.1
# 1.3.quiz1 1 3 quiz1 20.2 30.2
# 2.3.quiz2 2 3 quiz2 80.7 90.7
# 2.4.quiz2 2 4 quiz2 80.8 90.8
# 2.5.quiz2 2 5 quiz2 80.9 90.9
You can also pass column names (rather than column numbers) to idvar and varying. It is more verbose, but it seems like better practice to me:
## The same operation as above, using just column *names*
df2 <- reshape(df, direction="long", idvar=c("student", "month"),
varying = list(c("quiz1p1", "quiz2p1"),
c("quiz1p2", "quiz2p2")),
v.names = c("p1", "p2"), times = c("quiz1", "quiz2"))
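As a follow-up, the result of reshape() is grouped by quiz rather than by student and month, and its row names are composite labels such as 1.1.quiz1. The sketch below assumes you want the two quiz rows of each student/month pair next to each other. It uses the timevar argument to call the new column quiz (a name chosen here for illustration, not part of the original answer), then sorts the rows and resets the row names.

```r
# Same example data as in the question
df <- data.frame(student = c(rep(1, 5), rep(2, 5)), month = c(1:5, 1:5),
                 quiz1p1 = seq(20, 20.9, 0.1), quiz1p2 = seq(30, 30.9, 0.1),
                 quiz2p1 = seq(80, 80.9, 0.1), quiz2p2 = seq(90, 90.9, 0.1))

# timevar names the column that holds the values given in `times`
# ("quiz" is an illustrative choice; the default name is "time")
df2 <- reshape(df, direction = "long", idvar = c("student", "month"),
               varying = list(c("quiz1p1", "quiz2p1"),
                              c("quiz1p2", "quiz2p2")),
               v.names = c("p1", "p2"), times = c("quiz1", "quiz2"),
               timevar = "quiz")

# Put both quizzes of a student/month pair on adjacent rows
df2 <- df2[order(df2$student, df2$month, df2$quiz), ]

# Replace the composite row names with plain 1..n
rownames(df2) <- NULL

head(df2, 4)
```

Whether to sort is a matter of taste; the reshaped data are the same either way.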
Answer 1 (score: 7)
I think this does what you need:
#Break variable into two columns, one for the quiz and one for the part of the quiz
dfL <- transform(dfL, quiz = substr(variable, 1,5),
part = substr(variable, 6,7))
#Adjust your dcast call:
dcast(dfL, student + month + quiz ~ part)
#-----
student month quiz p1 p2
1 1 1 quiz1 20.0 30.0
2 1 1 quiz2 80.0 90.0
3 1 2 quiz1 20.1 30.1
...
18 2 4 quiz2 80.8 90.8
19 2 5 quiz1 20.9 30.9
20 2 5 quiz2 80.9 90.9
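Note that substr(variable, 1, 5) relies on fixed character positions, so it assumes single-digit quiz numbers. If the names could be longer, a regular expression is one way to split them. This is a minimal sketch in base R; quiz10p2 is a made-up name added only to illustrate the point, and the pattern assumes every name has the form quiz<number>p<number>.

```r
# Column names as they would appear in the melted `variable` column;
# "quiz10p2" is hypothetical and not part of the original data
variable <- c("quiz1p1", "quiz1p2", "quiz2p1", "quiz10p2")

# One pattern with two capture groups: the quiz label and the part label
pattern <- "^(quiz[0-9]+)(p[0-9]+)$"
quiz <- sub(pattern, "\\1", variable)  # keep only the first group
part <- sub(pattern, "\\2", variable)  # keep only the second group

data.frame(variable, quiz, part)
```

The same two sub() calls could stand in for the substr() calls inside transform().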
Answer 2 (score: 3)
There was a very similar question about half a year ago, for which I wrote the following function:
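melt.wide = function(data, id.vars, new.names) {
  require(reshape2)
  require(stringr)
  # Melt to long form: one row per id combination and measured column
  data.melt = melt(data, id.vars=id.vars)
  # Pull every run of digits out of each variable name
  # (e.g. "quiz1p2" gives "1" and "2") and bind them into columns
  new.vars = data.frame(do.call(
    rbind, str_extract_all(data.melt$variable, "[0-9]+")))
  names(new.vars) = new.names
  # Attach the extracted columns to the melted data
  cbind(data.melt, new.vars)
}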
You can use that function to "melt" your data as follows:
dfL <-melt.wide(df, id.vars=1:2, new.names=c("Quiz", "Part"))
head(dfL)
# student month variable value Quiz Part
# 1 1 1 quiz1p1 20.0 1 1
# 2 1 2 quiz1p1 20.1 1 1
# 3 1 3 quiz1p1 20.2 1 1
# 4 1 4 quiz1p1 20.3 1 1
# 5 1 5 quiz1p1 20.4 1 1
# 6 2 1 quiz1p1 20.5 1 1
tail(dfL)
# student month variable value Quiz Part
# 35 1 5 quiz2p2 90.4 2 2
# 36 2 1 quiz2p2 90.5 2 2
# 37 2 2 quiz2p2 90.6 2 2
# 38 2 3 quiz2p2 90.7 2 2
# 39 2 4 quiz2p2 90.8 2 2
# 40 2 5 quiz2p2 90.9 2 2
Once the data are in this form, you can more easily use dcast() to get whatever shape you want. For example:
head(dcast(dfL, student + month + Quiz ~ Part))
# student month Quiz 1 2
# 1 1 1 1 20.0 30.0
# 2 1 1 2 80.0 90.0
# 3 1 2 1 20.1 30.1
# 4 1 2 2 80.1 90.1
# 5 1 3 1 20.2 30.2
# 6 1 3 2 80.2 90.2
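The part columns in this result are named 1 and 2. These are not syntactic R names, so they need backticks when referenced by name. If that is inconvenient, one option is to prefix them. The sketch below uses a small hand-built stand-in for the dcast() output; the object name wide and the p prefix are assumptions made for illustration.

```r
# Hand-built stand-in for the first two rows of the dcast() result above
# (check.names = FALSE keeps the column names "1" and "2" unchanged)
wide <- data.frame(student = c(1, 1), month = c(1, 1), Quiz = c("1", "2"),
                   `1` = c(20, 80), `2` = c(30, 90), check.names = FALSE)

# Find the columns whose names are the part numbers and prefix them with "p"
part_cols <- names(wide) %in% c("1", "2")
names(wide)[part_cols] <- paste0("p", names(wide)[part_cols])

names(wide)
```

After the rename, the columns can be accessed as wide$p1 and wide$p2 without backticks.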