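I have some data that I need to reshape in R, but I can't figure out how. Here is the scenario: I have test score data for a number of students from different schools. Here is some example data: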
#Create example data:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
This produces data in the following format:
score schoolid
1 1
10 1
20 2
40 2
20 3
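So there is a school ID that identifies the school, and each student has one test score. For analysis in a different program, I would like to get the data into a format like this: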
                Score student 1  Score student 2
School ID == 1                1               10
School ID == 2               20               40
School ID == 3               20               NA
To reshape the data, I tried the reshape function and the cast function from the reshape2 library, but both resulted in errors:
#Reshape function
reshape(test, v.names = test2$score, idvar = test2$schoolid, direction = "wide")
reshape(test, idvar = test$schoolid, direction = "wide")
#Error: in [.data.frame'(data,,idvar): undefined columns selected
#Cast function
cast(test,test$schoolid~test$score)
#Error: Error: could not find function "cast" (although ?cast works fine)
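My guess is that the restructuring is complicated by the fact that the number of test scores differs from school to school.
How can I reshape this data, and which function should I use?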
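Answer 0 (score: 4)
Here are some solutions that use only base R. All three solutions use this new studentno variable, which numbers the students within each school: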
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))
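To see what that line does, here is a minimal sketch using the example data from the question. It only prints studentno next to the original columns; nothing here goes beyond the ave() call above.

```r
# Example data from the question
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))

# ave() splits schoolid by school, applies seq_along() within each group,
# and puts the results back in the original row order
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Show the per-school counter alongside the data
print(cbind(test, studentno))
```

Since ave() works group by group, the rows should not need to be sorted by schoolid for this to work.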
1) tapply
with(test, tapply(score, list(schoolid, studentno), c))
which gives:
   1  2
1  1 10
2 20 40
3 20 NA
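The tapply result is a matrix rather than a data frame. If the other program needs a table with a school ID column, something along these lines might work. This is only a sketch: the column prefix score_student_ and the file name wide.csv are made up for illustration.

```r
test <- data.frame(score = c(1, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
studentno <- with(test, ave(schoolid, schoolid, FUN = seq_along))

# Matrix with one row per school and one column per student number
m <- with(test, tapply(score, list(schoolid, studentno), c))

# Give the columns descriptive names (the prefix is an arbitrary choice)
colnames(m) <- paste0("score_student_", colnames(m))

# Turn the row names into a proper schoolid column
wide <- data.frame(schoolid = as.numeric(rownames(m)), m)
rownames(wide) <- NULL
print(wide)

# Hypothetical export step for the other program:
# write.csv(wide, "wide.csv", row.names = FALSE)
```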
2) reshape
# rename score to student and append studentno column
test2 <- transform(test, student = score, score = NULL, studentno = studentno)
reshape(test2, dir = "wide", idvar = "schoolid", timevar = "studentno")
which gives:
  schoolid student.1 student.2
1        1         1        10
3        2        20        40
5        3        20        NA
3) xtabs. xtabs would also work, provided no student has a score of 0.
xt <- xtabs(score ~ schoolid + studentno, test)
xt[xt == 0] <- NA # omit this step if it's ok to use 0 in place of NA
xt
which gives (the NA cell is printed as blank):
        studentno
schoolid  1  2
       1  1 10
       2 20 40
       3 20
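The caveat about zero scores can be checked with a small sketch. The data frame test0 below is a made-up variant of the example in which the first student scored 0; it is not from the question.

```r
# Hypothetical variant of the example data: the first student scored 0
test0 <- data.frame(score = c(0, 10, 20, 40, 20), schoolid = c(1, 1, 2, 2, 3))
test0$studentno <- with(test0, ave(schoolid, schoolid, FUN = seq_along))

# xtabs fills empty cells with 0, so a real 0 and a missing student look the same
xt <- xtabs(score ~ schoolid + studentno, test0)
xt[xt == 0] <- NA

# tapply keeps the real 0 and only marks the empty cell as NA
tp <- with(test0, tapply(score, list(schoolid, studentno), c))

print(xt)
print(tp)
```

In this variant, the xtabs route turns the genuine 0 for school 1 into NA as well, while tapply keeps it, so tapply or reshape looks like the safer choice when zero scores are possible.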
Answer 1 (score: 2)
You have to define a student ID somewhere, for example:
test <- data.frame("score" = c(1,10,20,40,20), "schoolid" = c(1,1,2,2,3))
test$studentid <- c(1,2,1,2,1)
library(reshape2)
dcast(test, schoolid ~ studentid, value.var="score",mean)
  schoolid  1   2
1        1  1  10
2        2 20  40
3        3 20 NaN