Question

Here is the code to generate a sample data set:

set.seed(0)
practice <- matrix(sample(1:100, 20), ncol = 2)
data <- as.data.frame(practice)
data <- cbind( lob = sprintf("objective%d", rep(1:2,each=5)), data)
data <- cbind( student = sprintf("student%d", rep(1:5,2)), data)
names(data) <- c("student", "learning objective","attempt", "score")
data[-8,]

The data looks like this:

    student learning objective attempt score
1  student1         objective1      90     6
2  student2         objective1      27    19
3  student3         objective1      37    16
4  student4         objective1      56    60
5  student5         objective1      88    34
6  student1         objective2      20    66
7  student2         objective2      85    42
9  student4         objective2      61    82
10 student5         objective2      58    31

What I want is:

    student       objective1         objective2 
                 attempt  score     attempt score
1  student1         90     6          20      66
2  student2         27    19          85      42
3  student3         ...                0       0
4  student4         ...                  ...
5  student5         ...                  ...

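There are 70 learning objectives, so copying and pasting the attempts and scores by hand would be tedious. I am wondering whether there is a better way to clean the data.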

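R: I tried to use the melt function in R to get the new data, but it did not work well. Some students have missing scores and are not listed at all for that objective (e.g. student3), so I cannot simply cbind the scores.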

Excel: There are 70 learning objectives, and because of the missing names I have to check the corresponding rows for all 70 objectives with VLOOKUP:

=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,5,0)
=VLOOKUP($C7,'0learning.csv'!$B$372:$G$395,6,0)

Is there a better way?

Answer 1

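We can use the devel version of data.table, i.e. v1.9.5, whose dcast can take multiple value.var columns and reshape the data from 'long' form to 'wide'. See the data.table installation instructions for how to get the devel version.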

 library(data.table)#v1.9.5+
 names(data)[2] <- 'objective'
 dcast(setDT(data), student~objective, value.var=c('attempt', 'score'))
 #    student attempt_objective1 attempt_objective2 score_objective1
 #1: student1                 90                 20                6
 #2: student2                 27                 85               19
 #3: student3                 37                 96               16
 #4: student4                 56                 61               60
 #5: student5                 88                 58               34
 #    score_objective2
 #1:               66
 #2:               42
 #3:               87
 #4:               82
 #5:               31
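
The output above still shows values for student3 under objective2, because `data[-8,]` in the question only prints the subset and does not assign it. As a minimal sketch of the case the question actually describes (student3 has no objective2 row and should get 0), the nine printed rows can be typed in directly and passed to `dcast` with its `fill` argument. The names `incomplete` and `wide` are just illustrative, and this assumes a data.table version in which `dcast` accepts several `value.var` columns.

```r
library(data.table)

# The nine rows printed in the question; student3 has no objective2 row.
incomplete <- data.table(
  student   = sprintf("student%d", c(1:5, 1L, 2L, 4L, 5L)),
  objective = sprintf("objective%d", rep(1:2, times = c(5, 4))),
  attempt   = c(90L, 27L, 37L, 56L, 88L, 20L, 85L, 61L, 58L),
  score     = c(6L, 19L, 16L, 60L, 34L, 66L, 42L, 82L, 31L)
)

# One row per student, one attempt/score pair of columns per objective.
# fill = 0L is used for student/objective combinations with no row.
wide <- dcast(incomplete, student ~ objective,
              value.var = c("attempt", "score"), fill = 0L)
print(wide)
```

With `fill = 0L`, the objective2 cells for student3 should come out as 0 rather than NA. Whether 0 or NA is the better marker for "no attempt" is a judgment call, since a 0 cannot later be told apart from a real score of 0.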

Or use reshape from base R:

 reshape(data, idvar='student', timevar='objective', direction='wide')
 #  student attempt.objective1 score.objective1 attempt.objective2
 #  1 student1                 90                6                 20
 #  2 student2                 27               19                 85
 #  3 student3                 37               16                 96
 #  4 student4                 56               60                 61
 #  5 student5                 88               34                 58
 #    score.objective2
 #  1               66
 #  2               42
 #  3               87
 #  4               82
 #  5               31
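
The base R route can be sketched the same way for the incomplete data the question describes. `reshape` leaves NA where a student has no row for an objective, so the zeros the asker wants need one extra step. This is only an illustration built from the nine rows printed in the question; `incomplete` and `wide` are made-up names.

```r
# The nine rows printed in the question; student3 has no objective2 row.
incomplete <- data.frame(
  student   = sprintf("student%d", c(1:5, 1L, 2L, 4L, 5L)),
  objective = sprintf("objective%d", rep(1:2, times = c(5, 4))),
  attempt   = c(90, 27, 37, 56, 88, 20, 85, 61, 58),
  score     = c(6, 19, 16, 60, 34, 66, 42, 82, 31),
  stringsAsFactors = FALSE
)

# Long to wide: one row per student, attempt.* and score.* per objective.
wide <- reshape(incomplete, idvar = "student", timevar = "objective",
                direction = "wide")

# Missing student/objective combinations are NA after reshape;
# replace them with 0 as requested in the question.
wide[is.na(wide)] <- 0
print(wide)
```

With 70 objectives the call stays the same, since `reshape` creates one attempt/score pair of columns for every distinct value of `objective`.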

How to reshape a data frame in R or Excel?
