Convert a long dataset of classes into a wide dataset where the variables are dummy codes for each class

Date: 2017-06-11 20:44:28

Tags: r dplyr reshape reshape2

Suppose I have a dataset in which each row is a class that a person took:

attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))  

I.e.,

     id  class  
   1 1   "Math" 
   2 1   "English"
   3 1   "Math"
   4 2   "Reading"
   5 2   "Math"

I want to create a new dataset in which the rows are ids and the variables are the class names, like this:

class.names <- names(table(attendance$class))
attedance2 <-  matrix(nrow=length(table(attendance$id)), 
                      ncol=length(class.names)) 
colnames(attedance2) <- class.names
attedance2 <- as.data.frame(attedance2)
attedance2$id <- unique(attendance$id)

I.e.,

     English  Math  Reading  id
   1    NA     NA      NA     1
   2    NA     NA      NA     2

I want to fill in the NAs with whether that particular id took that class. The value could be Yes/No, 1/0, or a count of the classes.

I.e.,

     English  Math  Reading  id
   1   "Yes"  "Yes"   "No"    1
   2   "No"   "Yes"   "Yes"   2

I am familiar with dplyr, so a solution that uses it would be easier for me, although it is not required. Thanks for your help!

2 answers:

Answer 0 (score: 4)

Using:

library(reshape2)
attendance$val <- 'yes'
dcast(unique(attendance), id ~ class, value.var = 'val', fill = 'no')

gives:

  id English Math Reading
1  1     yes  yes      no
2  2      no  yes     yes

A similar approach with data.table:

library(data.table)
dcast(unique(setDT(attendance))[,val:='yes'], id ~ class, value.var = 'val', fill = 'no')

Or with dplyr / tidyr:

library(dplyr)
library(tidyr)
attendance %>% 
  distinct() %>% 
  mutate(var = 'yes') %>% 
  spread(class, var, fill = 'no')
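
In newer tidyr releases, spread() is superseded by pivot_wider(). The following is a minimal sketch of the same idea, assuming tidyr 1.0.0 or later is installed; the name `wide` is introduced here only for illustration and is not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Example data from the question
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

wide <- attendance %>%
  distinct(id, class) %>%          # drop repeated id/class pairs
  mutate(val = "yes") %>%          # marker for "took this class"
  pivot_wider(names_from = class,  # one column per class
              values_from = val,
              values_fill = list(val = "no"))  # id/class pairs that never occur

wide
```

Passing values_fill as a named list is used here because that form is accepted across tidyr 1.x versions.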

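Another, somewhat more elaborate option is to reshape first and then replace the counts with yes and no (see here for an explanation of the default aggregation option of dcast):

att2 <- dcast(attendance, id ~ class, value.var = 'class')

which gives: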

  id English Math Reading
1  1       1    2       0
2  2       0    1       1
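
The counts above come from dcast falling back to length as its aggregation function when an id/class combination occurs more than once. As a sketch, the same table can be requested explicitly through the fun.aggregate argument, which avoids relying on the default; `att_counts` is a name made up for this example and holds the same values as att2.

```r
library(reshape2)

# Example data from the question
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# Count how many rows exist for each id/class combination
att_counts <- dcast(attendance, id ~ class,
                    value.var = "class",
                    fun.aggregate = length)

att_counts
```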

Now you can replace the counts as follows:

# create index which counts are above zero
idx <- att2[,-1] > 0
# replace the non-zero values with 'yes'
att2[,-1][idx] <- 'yes'
# replace the zero values with 'no'
att2[,-1][!idx] <- 'no'

which finally gives:

> att2
  id English Math Reading
1  1     yes  yes      no
2  2      no  yes     yes

Answer 1 (score: 0)

We can do this with base R:

attendance$val <- "yes"
d1 <- reshape(attendance, idvar = 'id', direction = 'wide', timevar = 'class')
d1[is.na(d1)] <- "no"
names(d1) <- sub("val\\.", '', names(d1))
d1
#  id Math English Reading
#1  1  yes     yes      no
#4  2  yes      no     yes

Or with xtabs:

xtabs(val ~id + class, transform(unique(attendance), val = 1))
#    class
# id  English Math Reading
#  1       1    1       0
#  2       0    1       1

Note: the binary values can easily be converted to "yes"/"no", but it is better to keep them as 1/0 or TRUE/FALSE.
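
As an illustration of that note, here is a small base R sketch that builds the count table, turns it into TRUE/FALSE, and only then (optionally) into "yes"/"no" labels. The names `counts`, `took`, `labels` and `result` are chosen for this example and do not come from the answer.

```r
# Example data from the question
attendance <- data.frame(id = c(1, 1, 1, 2, 2),
                         class = c("Math", "English", "Math", "Reading", "Math"))

# Count rows per id/class; unclass() leaves a plain integer matrix
counts <- unclass(table(attendance$id, attendance$class))

# TRUE where the id took the class at least once
took <- counts > 0

# Optional conversion to labels; ifelse() keeps the matrix shape and dimnames
labels <- ifelse(took, "yes", "no")

# Data frame with an explicit id column and one logical column per class
result <- data.frame(id = as.numeric(rownames(took)), took, row.names = NULL)
result
```

Keeping the logical matrix around makes later steps such as colSums(took) (how many ids took each class) straightforward, which is harder with character labels.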