Question

I want to convert a factor column in an R data frame into indicator variables, using another column as the index.

Given the following representation

StudentID  Subject
1          A  
1          B 
2          A
2          C
3          A 
3          B

I need the following representation, with StudentID as the index

StudentID  SubjectA SubjectB SubjectC
1           1         1       0
2           1         0       1 
3           1         1       0

Answer 1

We can use table

table(df1)
#            Subject
#StudentID A B C
#        1 1 1 0
#        2 1 0 1
#        3 1 1 0

If we need a data.frame

as.data.frame.matrix(table(df1))
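
To get the exact layout asked for in the question (a StudentID column plus SubjectA, SubjectB, SubjectC), the table can be post-processed a little. This is only a sketch in base R: it assumes df1 holds the question's data, and the name ind and the "Subject" prefix are illustrative choices, not part of the original answer.

```r
# assumed input: the data from the question
df1 <- data.frame(StudentID = c(1, 1, 2, 2, 3, 3),
                  Subject   = c("A", "B", "A", "C", "A", "B"))

# cross-tabulate, then turn the table into a plain data.frame
ind <- as.data.frame.matrix(table(df1))

# cap counts at 1 in case a StudentID/Subject pair appears more than once
ind[ind > 1] <- 1L

# prefix the subject columns and move the row names into a StudentID column
names(ind) <- paste0("Subject", names(ind))
ind <- cbind(StudentID = as.numeric(rownames(ind)), ind)
rownames(ind) <- NULL
ind
```

The capping step changes nothing for the sample data, but without it a repeated pair would show up as a count of 2 rather than an indicator.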

Answer 2

Here is how I got it using dcast from reshape2, as suggested in the comments above

library(reshape2)

ID <- c(1, 1, 2, 2, 3, 3)
Subject <- c('A', 'B', 'A', 'C', 'A', 'B')

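data <- data.frame(ID, Subject)
# one column per subject; a cell holds the subject letter, or NA when absent
data <- dcast(data, ID ~ Subject, value.var = "Subject")

# absent subjects become 0
data[is.na(data)] <- 0

# replace any subject letter with 1
f <- function(x) {
  gsub('[A-Z]', 1, x)
}

# apply() works on a character matrix, so the result columns are not numeric
as.data.frame(apply(data, 2, f))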
#  ID A B C
#1  1 1 1 0
#2  2 1 0 1
#3  3 1 1 0
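
The NA replacement and the gsub step can probably be skipped by letting dcast count rows per ID/Subject pair. A minimal sketch, assuming reshape2 is installed; the name wide is illustrative and this is a variation on the answer, not the author's original code.

```r
library(reshape2)

data <- data.frame(ID = c(1, 1, 2, 2, 3, 3),
                   Subject = c("A", "B", "A", "C", "A", "B"))

# count rows per ID/Subject pair; pairs that never occur are filled with 0
wide <- dcast(data, ID ~ Subject, fun.aggregate = length, value.var = "Subject")

# cap at 1 so repeated pairs still give a 0/1 indicator (integer columns)
wide[-1] <- lapply(wide[-1], function(x) as.integer(x > 0))
wide
```

Unlike the apply() version, this keeps ID numeric and the indicator columns as integers.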

Answer 3

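Looking at this solution now, it is probably not very efficient. But it is more dynamic than some of the other solutions. There may also be a way to do this directly with data.table, but I couldn't figure it out. This may help: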

library(data.table)

df <- data.frame(StudentID = c(1, 1, 2, 2, 3, 3),
                 Subject = factor(c("A", "B", "A", "C", "A", "B")))

df <- data.table(df)
### here we pull the unique subjects, which become the indicator columns
subjects <- as.character(unique(df$Subject))
### here we group by student ID and paste together the subjects each student has
x <- df[,list("Values"=paste(Subject,collapse="_")),by=StudentID]

### then we split each pasted string and match it against the unique subjects
tmp <- strsplit(x$Values,"_")
res <- do.call(rbind,lapply(tmp,function(i) match(subjects,i)))
### change the results to the indicator variable desired
res[!is.na(res)] <- 1
res[is.na(res)] <- 0

res <- data.frame("StudentID"=x$StudentID,res)
colnames(res) <- c("StudentID",subjects)
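
On the open point about doing this directly in data.table: the package ships its own dcast method, which seems to cover this case. A sketch, assuming a reasonably recent data.table; dt, wide and subject_cols are illustrative names.

```r
library(data.table)

dt <- data.table(StudentID = c(1, 1, 2, 2, 3, 3),
                 Subject   = factor(c("A", "B", "A", "C", "A", "B")))

# count rows per StudentID/Subject pair; missing pairs are filled with 0
wide <- dcast(dt, StudentID ~ Subject,
              fun.aggregate = length, value.var = "Subject")

# prefix the subject columns to match the layout in the question
subject_cols <- setdiff(names(wide), "StudentID")
setnames(wide, subject_cols, paste0("Subject", subject_cols))
wide
```

The cells are counts, so they would exceed 1 if a student were listed twice for the same subject; that does not happen in the sample data.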
