I have a list of data showing who attended which conference, like this:
Event Participant
ConferenceA John
ConferenceA Joe
ConferenceA Mary
ConferenceB John
ConferenceB Ted
ConferenceC Jessica
I want to create a binary indicator attendance matrix in the following format:
Event John Joe Mary Ted Jessica
ConferenceA 1 1 1 0 0
ConferenceB 1 0 0 1 0
ConferenceC 0 0 0 0 1
Is there a way to do this in R?
Answer 0 (score: 10)
Assuming your data.frame is called "mydf", simply use table:
> table(mydf)
Participant
Event Jessica Joe John Mary Ted
ConferenceA 0 1 1 1 0
ConferenceB 0 0 1 0 1
ConferenceC 1 0 0 0 0
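table sorts the participants alphabetically, whereas the matrix in the question lists them in order of first appearance. If that order matters, one option is to fix the factor levels before tabulating. This is a minimal sketch that rebuilds the question's data as a plain data.frame; the name att is only illustrative:

```r
# Rebuild the example data from the question (same values as mydf)
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# table() orders levels alphabetically by default; setting the levels to
# the order of first appearance keeps John, Joe, Mary, Ted, Jessica
att <- table(
  Event = mydf$Event,
  Participant = factor(mydf$Participant, levels = unique(mydf$Participant))
)
att
```

Naming the arguments (Event =, Participant =) also keeps the dimension labels in the printed table.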
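If someone might attend the same conference more than once, so that table returns values greater than 1, you can simply recode every value greater than 1 as 1, like this: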
temp <- table(mydf)
temp[temp > 1] <- 1   # cap repeated attendance at 1
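The same recoding can also be done in one step by comparing against 0 and converting the logical result back to numbers. A minimal sketch, using a hypothetical data frame dat_dup in which Ted is listed twice for ConferenceB:

```r
# Hypothetical data: Ted attends ConferenceB twice
dat_dup <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

counts <- table(dat_dup)      # the ConferenceB/Ted cell holds a count of 2
binary <- (counts > 0) * 1    # TRUE/FALSE becomes 1/0; dimnames are kept
binary
```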
Note that this returns a table. If you want a data.frame instead, use as.data.frame.matrix:
> as.data.frame.matrix(table(mydf))
Jessica Joe John Mary Ted
ConferenceA 0 1 1 1 0
ConferenceB 0 0 1 0 1
ConferenceC 1 0 0 0 0
In the above, "mydf" is defined as:
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA",
"ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"),
Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")),
.Names = c("Event", "Participant"), class = "data.frame",
row.names = c(NA, -6L))
Please share your data in a similar way in the future.
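A definition like the one above is what base R's dput() prints for an object, so it is an easy way to produce shareable data. A minimal sketch, assuming the data has already been loaded as a data frame named mydf:

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# dput() writes R code that recreates the object; paste that output
# into the question so others can rebuild the exact same data
dput(mydf)
```

The exact text printed by dput() can differ slightly between R versions (older versions also emit .Names, as in the definition above), but evaluating it recreates the same data frame.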
Answer 1 (score: 1)
Using qdap's adjmat function and pulling out the boolean matrix may help.
Data with a duplicated attendee (Ted is listed twice for ConferenceB):
dat <- read.table(text="Event Participant
ConferenceA John
ConferenceA Joe
ConferenceA Mary
ConferenceB John
ConferenceB Ted
ConferenceB Ted
ConferenceC Jessica ", header=TRUE)
A table of counts:
library(qdap)
wfm(dat[, 1], dat[, 2], lower.case = FALSE)
## > wfm(dat[, 1], dat[, 2], lower.case = FALSE)
## Jessica Joe John Mary Ted
## conferenceA 0 1 1 1 0
## conferenceB 0 0 1 0 2
## conferenceC 1 0 0 0 0
Using mtabulate:
with(dat, mtabulate(split(Participant, Event)))
## Jessica Joe John Mary Ted
## ConferenceA 0 1 1 1 0
## ConferenceB 0 0 1 0 2
## ConferenceC 1 0 0 0 0
The boolean matrix:
adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
## > adjmat(wfm(dat[, 1], dat[, 2], lower.case = FALSE))$boolean
## Jessica Joe John Mary Ted
## conferenceA 0 1 1 1 0
## conferenceB 0 0 1 0 1
## conferenceC 1 0 0 0 0
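Without qdap, a base R alternative for the same duplicated data is to drop repeated rows before tabulating, so that no cell can exceed 1. This is a sketch of a different technique (deduplication with unique), not what adjmat does internally:

```r
dat <- read.table(text = "Event Participant
ConferenceA John
ConferenceA Joe
ConferenceA Mary
ConferenceB John
ConferenceB Ted
ConferenceB Ted
ConferenceC Jessica", header = TRUE)

# unique() removes the repeated (ConferenceB, Ted) row, so every cell
# of the resulting cross-tabulation is either 0 or 1
res <- table(unique(dat))
res
```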
Answer 2 (score: 0)
Another base R approach uses the xtabs function:
xtabs(~mydf$Event+mydf$Participant)
mydf$Participant
mydf$Event Jessica Joe John Mary Ted
ConferenceA 0 1 1 1 0
ConferenceB 0 0 1 0 1
ConferenceC 1 0 0 0 0
# using this data:
mydf <- structure(list(Event = c("ConferenceA", "ConferenceA",
"ConferenceA", "ConferenceB", "ConferenceB", "ConferenceC"),
Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica")),
.Names = c("Event", "Participant"), class = "data.frame",
row.names = c(NA, -6L))
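Passing the data frame through the data argument lets the formula refer to the columns by name, which gives cleaner dimension labels (Event and Participant rather than mydf$Event and mydf$Participant). A minimal sketch using the same data:

```r
mydf <- data.frame(
  Event = c("ConferenceA", "ConferenceA", "ConferenceA",
            "ConferenceB", "ConferenceB", "ConferenceC"),
  Participant = c("John", "Joe", "Mary", "John", "Ted", "Jessica"),
  stringsAsFactors = FALSE
)

# Columns are looked up in 'data', so the dimnames are labelled
# Event and Participant
x <- xtabs(~ Event + Participant, data = mydf)
x

# As with table(), as.data.frame.matrix() turns it into a data.frame
as.data.frame.matrix(x)
```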