Here is the code that generates a sample dataset:
set.seed(0)  # fix the random seed so the sample is reproducible
practice <- matrix(sample(1:100, 20), ncol = 2)
data <- as.data.frame(practice)
data <- cbind(lob = sprintf("objective%d", rep(1:2, each = 5)), data)
data <- cbind(student = sprintf("student%d", rep(1:5, 2)), data)
names(data) <- c("student", "learning objective", "attempt", "score")
data[-8, ]  # print without row 8 (student3, objective2); data itself is not modified
The data looks like this:
    student learning objective attempt score
1  student1         objective1      90     6
2  student2         objective1      27    19
3  student3         objective1      37    16
4  student4         objective1      56    60
5  student5         objective1      88    34
6  student1         objective2      20    66
7  student2         objective2      85    42
9  student4         objective2      61    82
10 student5         objective2      58    31
What I want is:
   student    objective1       objective2
              attempt score    attempt score
1  student1        90     6         20    66
2  student2        27    19         85    42
3  student3       ...                0     0
4  student4       ...              ...
5  student5       ...              ...
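There are 70 learning objectives, so copying and pasting the attempts and scores would be tedious. I would like to know whether there is a better way to clean up the data.
R: I tried the melt function in R to get the new layout, but it did not work well. Some students are missing scores, and in those cases the student's name is not listed at all (student3, for example), so I cannot simply cbind the scores.
Excel: there are 70 learning objectives, and because of the missing names I have to check the corresponding rows of the VLOOKUP for every one of the 70 objectives: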
(=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,5,0)
(=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,6,0)
Is there a better way?
Answer 0 (score: 4)
We can use the devel version of data.table, i.e. v1.9.5, whose dcast can take multiple value.var columns and reshape the data from 'long' to 'wide' form. Installation instructions are here.
library(data.table)  # v1.9.5+
names(data)[2] <- 'objective'
dcast(setDT(data), student ~ objective, value.var = c('attempt', 'score'))
#     student attempt_objective1 attempt_objective2 score_objective1 score_objective2
# 1: student1                 90                 20                6               66
# 2: student2                 27                 85               19               42
# 3: student3                 37                 96               16               87
# 4: student4                 56                 61               60               82
# 5: student5                 88                 58               34               31
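Note that the output above still contains an objective2 entry for student3, because data[-8,] in the question only prints the subset and does not change data. The following is a minimal sketch of the case where that row is really absent. It assumes a hypothetical data.table named d, typed in from the nine rows printed in the question, and relies on dcast's fill argument to put 0 where a student/objective pair is missing, which is what the desired output asks for.

```r
library(data.table)  # assumes a version with multiple value.var support (v1.9.5+)

# Hypothetical copy of the nine printed rows; student3 has no objective2 row.
d <- data.table(
  student   = c(paste0("student", 1:5), paste0("student", c(1, 2, 4, 5))),
  objective = rep(c("objective1", "objective2"), times = c(5, 4)),
  attempt   = c(90, 27, 37, 56, 88, 20, 85, 61, 58),
  score     = c(6, 19, 16, 60, 34, 66, 42, 82, 31)
)

# fill = 0 replaces the cells that have no matching long-format row.
wide_dt <- dcast(d, student ~ objective,
                 value.var = c("attempt", "score"), fill = 0)
wide_dt
```

Because the formula never names individual objectives, the same call should carry over unchanged to all 70 of them.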
Or use reshape from base R:
reshape(data, idvar = 'student', timevar = 'objective', direction = 'wide')
#    student attempt.objective1 score.objective1 attempt.objective2 score.objective2
# 1 student1                 90                6                 20               66
# 2 student2                 27               19                 85               42
# 3 student3                 37               16                 96               87
# 4 student4                 56               60                 61               82
# 5 student5                 88               34                 58               31
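When a student/objective row is genuinely missing from the long data, as in the table printed in the question, reshape leaves NA in the corresponding wide cells. Here is a small base R sketch of turning those into the 0s the question asks for. The data frame name d and its contents are an assumption: they are typed in from the nine printed rows, not regenerated with sample().

```r
# Hypothetical copy of the nine printed rows; student3 has no objective2 row.
d <- data.frame(
  student   = c(paste0("student", 1:5), paste0("student", c(1, 2, 4, 5))),
  objective = rep(c("objective1", "objective2"), times = c(5, 4)),
  attempt   = c(90, 27, 37, 56, 88, 20, 85, 61, 58),
  score     = c(6, 19, 16, 60, 34, 66, 42, 82, 31),
  stringsAsFactors = FALSE
)

# One row per student, one attempt/score pair of columns per objective.
wide <- reshape(d, idvar = "student", timevar = "objective", direction = "wide")

# Missing student/objective combinations come back as NA; replace them with 0.
wide[is.na(wide)] <- 0
wide
```

Whether 0 or NA is the better marker depends on the analysis: a 0 is indistinguishable from a real score of zero, so keeping NA may be safer if the scores are averaged later.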