Creating a binary indicator matrix (Boolean matrix) in R

Asked: 2013-07-02 17:00:33

Tags: r matrix dummy-data

I have a list of data indicating attendance at conferences, like this:

Event                     Participant  
ConferenceA               John   
ConferenceA               Joe  
ConferenceA               Mary    
ConferenceB               John  
ConferenceB               Ted  
ConferenceC               Jessica  

I would like to create a binary indicator matrix of attendance in the following format:

Event        John  Joe  Mary  Ted  Jessica  
ConferenceA  1     1    1     0    0  
ConferenceB  1     0    0     1    0  
ConferenceC  0     0    0     0    1  

Is there a way to do this in R?

3 answers:

Answer 0: (score: 10)

Assuming your data.frame is called "mydf", simply use table:

> table(mydf)
             Participant
Event         Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0

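If someone might attend the same conference more than once, table will return values greater than 1 for that person. In that case, simply recode every value greater than 1 to 1, like this: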

temp <- table(mydf)
temp[temp > 1] <- 1
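
As a minimal sketch of that recoding on data that actually contains a repeat attendee: the data frame `dup_df` and its rows below are made up for illustration, and options 2 and 3 are alternatives that should give the same result here rather than anything taken from the answer itself.

```r
# Hypothetical data: Ted is listed twice for ConferenceB
dup_df <- data.frame(
  Event = c("ConferenceA", "ConferenceB", "ConferenceB", "ConferenceB"),
  Participant = c("John", "John", "Ted", "Ted"),
  stringsAsFactors = FALSE
)

counts <- table(dup_df)   # the ConferenceB/Ted cell holds a count of 2

# Option 1: recode in place, as described above
binary1 <- counts
binary1[binary1 > 1] <- 1

# Option 2: drop duplicated rows before tabulating
binary2 <- table(unique(dup_df))

# Option 3: compare against zero, then turn the logicals back into integers
binary3 <- (counts > 0) * 1L
```

Option 2 only works when a duplicated row really means the same attendance record; if the data frame carries extra columns that differ between rows, restrict `unique()` to the two columns of interest first.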

Note that this returns a table. If you want a data.frame instead, use as.data.frame.matrix:

> as.data.frame.matrix(table(mydf))
            Jessica Joe John Mary Ted
ConferenceA       0   1    1    1   0
ConferenceB       0   0    1    0   1
ConferenceC       1   0    0    0   0
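
The output above lists participants alphabetically, whereas the question shows them in order of first appearance (John, Joe, Mary, Ted, Jessica). One possible way to get that order, sketched here under the assumption that order of first appearance is what is wanted, is to turn Participant into a factor with explicit levels before tabulating. The names `ordered_df` and `attendance` are illustrative, and `mydf` is rebuilt so the block runs on its own.

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# table() orders columns by factor level, so set the levels explicitly
ordered_df <- transform(
  mydf,
  Participant = factor(Participant, levels = unique(Participant))
)

# Same conversion as above, now with columns in order of first appearance
attendance <- as.data.frame.matrix(table(ordered_df))
attendance
```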

In the above, "mydf" is defined as:

mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
  "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
  .Names = c("Event", "Participant"), class = "data.frame", 
  row.names = c(NA, -6L))

Please share your data in a similar way in the future.

Answer 1: (score: 1)

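@Ananda's answer is better, but I thought I would throw out another approach using qdap. It only shines in the case where "someone has attended a conference more than once".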

I mention that case because Ananda pointed it out. When it applies, it may help to use the adjmat function and pull out the Boolean matrix.

Data with a repeat attendee:

dat <- read.table(text="Event                     Participant  
ConferenceA               John   
ConferenceA               Joe  
ConferenceA               Mary    
ConferenceB               John  
ConferenceB               Ted  
ConferenceB               Ted
ConferenceC               Jessica  ", header=TRUE)

Table of counts:

library(qdap)
wfm(dat[, 1], dat[, 2], lower.case = FALSE)

## > wfm(dat[, 1], dat[, 2], lower.case = FALSE)
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   2
## conferenceC       1   0    0    0   0

Using mtabulate:

with(dat, mtabulate(split(Participant, Event)))

##             Jessica Joe John Mary Ted
## ConferenceA       0   1    1    1   0
## ConferenceB       0   0    1    0   2
## ConferenceC       1   0    0    0   0

Boolean matrix:

adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean

## > adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
##             Jessica Joe John Mary Ted
## conferenceA       0   1    1    1   0
## conferenceB       0   0    1    0   1
## conferenceC       1   0    0    0   0

Answer 2: (score: 0)

Another base R way, using the function xtabs:

xtabs(~mydf$Event+mydf$Participant)

             mydf$Participant
mydf$Event    Jessica Joe John Mary Ted
  ConferenceA       0   1    1    1   0
  ConferenceB       0   0    1    0   1
  ConferenceC       1   0    0    0   0
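
A variation on the call above, sketched with the `data` argument of xtabs so that the dimension labels read Event and Participant rather than mydf$Event and mydf$Participant. The comparison against zero is an added step, not part of the answer, showing one way to get TRUE/FALSE values instead of counts; `tab` and `attended` are illustrative names, and `mydf` is rebuilt so the block runs on its own.

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# Formula interface: column names are looked up in `data`
tab <- xtabs(~ Event + Participant, data = mydf)

# TRUE where the participant attended at least once
attended <- tab > 0

# Number of distinct participants per event
rowSums(attended)
```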

#using data
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA", 
                                 "ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"), 
                       Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")), 
                  .Names = c("Event", "Participant"), class = "data.frame", 
                  row.names = c(NA, -6L))