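I want to reshape a dataset in a particular way. Say I have data indicating when, and how many times, participants attended counselling sessions. Over a twelve-week period they could attend at most three sessions. Say their data are recorded like this: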
set.seed(01234)
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1 = c(sort(sample(1:12, 3, replace = FALSE)),
                            sort(sample(1:12, 3, replace = FALSE)),
                            sort(sample(1:12, 3, replace = FALSE)),
                            sort(sample(1:12, 3, replace = FALSE))))
df1$week1[c(3,8,9,12)] <- NA # insert some NAs representing sessions that weren't attended
The dataset looks like this:
# id session week1
# 1 A session1 2
# 2 A session2 7
# 3 A session3 NA
# 4 B session1 7
# 5 B session2 8
# 6 B session3 10
# 7 C session1 1
# 8 C session2 NA
# 9 C session3 NA
# 10 D session1 6
# 11 D session2 7
# 12 D session3 NA
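The table above comes from one particular run: the same seed can give different sample() draws in different R versions (R 3.6.0 changed the default sampling method), so your df1 may not match it exactly. A minimal way around this is to type the printed values in directly. The tapply() line is only a sanity check added here, not part of the question.

```r
# Hard-coded copy of the df1 printed above, independent of the R version's RNG
df1 <- data.frame(id      = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1   = c(2L, 7L, NA, 7L, 8L, 10L, 1L, NA, NA, 6L, 7L, NA))

# Sanity check: number of sessions each participant actually attended
attended_per_id <- tapply(!is.na(df1$week1), df1$id, sum)
print(attended_per_id)  # A: 2, B: 3, C: 1, D: 2
```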
But I want a long dataset with one row for each of the twelve weeks each person could have attended, like this:
df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))
So participant A's data look like this:
df2[1:12,]
# id week2
# 1 A 1
# 2 A 2
# 3 A 3
# 4 A 4
# 5 A 5
# 6 A 6
# 7 A 7
# 8 A 8
# 9 A 9
# 10 A 10
# 11 A 11
# 12 A 12
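df2 is simply every combination of participant and week. As an alternative sketch (not from the question), base R's expand.grid() builds the same grid without spelling out the rep() calls:

```r
# Every combination of participant and week. expand.grid() varies its first
# argument fastest, so week2 runs 1..12 within each id, as in df2
grid <- expand.grid(week2 = 1:12, id = LETTERS[1:4], stringsAsFactors = FALSE)
grid <- grid[, c("id", "week2")]  # same column order as df2
head(grid, 12)
```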
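I want to combine the two so that the week numbers in df1's week1 column are matched to the corresponding rows of df2, ideally giving something like this (shown for participant A only):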
data.frame(id = rep("A", 12),
           week = 1:12,
           attended = c(0,1,0,0,0,0,1,0,0,0,0,0))
# id week attended
# 1 A 1 0
# 2 A 2 1
# 3 A 3 0
# 4 A 4 0
# 5 A 5 0
# 6 A 6 0
# 7 A 7 1
# 8 A 8 0
# 9 A 9 0
# 10 A 10 0
# 11 A 11 0
# 12 A 12 0
Answer 0 (score: 1)
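# split df2 by participant, flag the weeks that appear among that
# participant's week1 values in df1, then stack the pieces back together
do.call(rbind, lapply(split(df2, df2$id), function(x) {
  x$attended <- as.integer(x$week2 %in% df1$week1[df1$id == x$id[1]])
  x
}))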
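The loop above handles one participant at a time. A vectorised sketch of the same %in% idea (an alternative, not the answerer's code) builds a combined id/week key for both tables and tests all rows at once; df1 is hard-coded to the values printed in the question.

```r
df1 <- data.frame(id      = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1   = c(2L, 7L, NA, 7L, 8L, 10L, 1L, NA, NA, 6L, 7L, NA))
df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))

# Build an "id_week" key for every row of both tables. Missing weeks in df1
# become keys such as "A_NA", which never occur in df2, so they match nothing
key1 <- paste(df1$id, df1$week1, sep = "_")
key2 <- paste(df2$id, df2$week2, sep = "_")

# 1 if this participant/week pair appears in df1, otherwise 0
df2$attended <- as.integer(key2 %in% key1)
df2[df2$id == "A", ]
```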
Answer 1 (score: 1)
An approach using merge:
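# merge the 2 dataframes (left join: keep every row of df2)
names(df2)[2] <- "week"
names(df1)[3] <- "week"
df <- merge(df2, df1, by = c("id", "week"), all.x = TRUE)
# replace 'session' with 1s and 0s (as.integer turns TRUE/FALSE into 1/0)
df$session <- as.integer(!is.na(df$session))
names(df)[names(df) == "session"] <- "attended"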
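A variant of the merge approach, sketched under the assumption that you would rather not rename columns of df1 and df2 in place: merge()'s by.x and by.y arguments join on differently named key columns. df1 is hard-coded to the values printed in the question.

```r
df1 <- data.frame(id      = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1   = c(2L, 7L, NA, 7L, 8L, 10L, 1L, NA, NA, 6L, 7L, NA))
df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))

# Left join: keep every id/week row of df2; by.x/by.y pair up the
# differently named week columns, so neither input is renamed
df <- merge(df2, df1, by.x = c("id", "week2"), by.y = c("id", "week1"),
            all.x = TRUE)

# A non-missing session means a session took place that week
df$attended <- as.integer(!is.na(df$session))
df$session <- NULL

# Put rows in id/week order and reset the row names
df <- df[order(df$id, df$week2), ]
rownames(df) <- NULL
df[df$id == "A", ]
```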
Answer 2 (score: 1)
You can use tidyr::complete (duplicate ID with different date) to expand the original data.frame, so there is no need to merge; simply define week1 as a factor with the correct number of levels:
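A sketch of that idea (not the answer's own code), assuming the tidyr package is installed: drop the missing weeks, turn week1 into a factor with levels 1:12 so complete() knows all twelve weeks exist, and let complete() fill the absent id/week pairs with 0. df1 is hard-coded to the values printed in the question.

```r
library(tidyr)

df1 <- data.frame(id      = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1   = c(2L, 7L, NA, 7L, 8L, 10L, 1L, NA, NA, 6L, 7L, NA))

# Keep only sessions that actually happened and flag them as attended
att <- df1[!is.na(df1$week1), c("id", "week1")]
att$attended <- 1L

# Factor levels 1:12 include weeks nobody attended, so complete() adds them too
att$week1 <- factor(att$week1, levels = 1:12)

# Add every missing id/week combination, filling 'attended' with 0
res <- complete(att, id, week1, fill = list(attended = 0L))

# Back to integer weeks, sorted by participant and week
res$week1 <- as.integer(as.character(res$week1))
res <- res[order(res$id, res$week1), ]
```

Using a factor is what lets complete() produce all twelve weeks per participant without a separate df2 to join against.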