Merge data frames so that values from one data frame are inserted into the matching rows of another

Time: 2019-03-07 05:41:22

Tags: r

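I want to reshape a dataset. Say I have data recording when, and how many times, participants attended tutoring sessions. Over a twelve-week period they could attend at most three sessions. Say their data were recorded like this: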

set.seed(01234)
df1 <- data.frame(id = rep(LETTERS[1:4], each = 3),
                  session = rep(paste0("session", 1:3), length.out = 12),
                  week1 = c(sort(sample(1:12, 3, replace = F)), 
                           sort(sample(1:12, 3, replace = F)), 
                           sort(sample(1:12, 3, replace = F)), 
                           sort(sample(1:12, 3, replace = F)))) 
df1$week1[c(3,8,9,12)] <- NA # insert some NAs representing sessions that weren't attended

The dataset looks like this:

#    id  session week1
# 1   A session1     2
# 2   A session2     7
# 3   A session3    NA
# 4   B session1     7
# 5   B session2     8
# 6   B session3    10
# 7   C session1     1
# 8   C session2    NA
# 9   C session3    NA
# 10  D session1     6
# 11  D session2     7
# 12  D session3    NA

But I want a long-format dataset with one row per participant for each of the twelve weeks in which they could have attended, like this:

df2 <- data.frame(id = rep(LETTERS[1:4], each = 12),
                  week2 = rep(1:12, times = 4))

So the data for participant A look like this:

df2[1:12,]

#    id week2
# 1   A     1
# 2   A     2
# 3   A     3
# 4   A     4
# 5   A     5
# 6   A     6
# 7   A     7
# 8   A     8
# 9   A     9
# 10  A    10
# 11  A    11
# 12  A    12

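I want to merge the two so that the numbers in the week1 column of df1 are matched to the corresponding rows of df2, ideally like this (example shown for participant A only):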

data.frame(id = rep("A", 12),
           week = 1:12,
           attended = c(0,1,0,0,0,0,1,0,0,0,0,0))

#    id week attended
# 1   A    1        0
# 2   A    2        1
# 3   A    3        0
# 4   A    4        0
# 5   A    5        0
# 6   A    6        0
# 7   A    7        1
# 8   A    8        0
# 9   A    9        0
# 10  A   10        0
# 11  A   11        0
# 12  A   12        0

3 answers:

Answer 0: (score: 1)

do.call(rbind, lapply(split(df2, df2$id), function(x){
    x$attended = as.integer(x$week2 %in% df1$week1[df1$id == x$id[1]])
    x
}))
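
As a point of comparison, the same attendance flag can be computed without splitting by participant, by matching on a combined id/week key. This is a minimal sketch and not part of the original answer; it assumes the df1 values printed in the question (hard-coded here so the result does not depend on the random seed or R version).

```r
# Rebuild the question's data from the printed values (assumption: these
# match what set.seed() produced for the asker).
df1 <- data.frame(
  id      = rep(LETTERS[1:4], each = 3),
  session = rep(paste0("session", 1:3), length.out = 12),
  week1   = c(2, 7, NA, 7, 8, 10, 1, NA, NA, 6, 7, NA)
)
df2 <- data.frame(
  id    = rep(LETTERS[1:4], each = 12),
  week2 = rep(1:12, times = 4)
)

# Build one key per (id, week) pair in each data frame. Keys from unattended
# sessions look like "A NA" and never match a key in df2.
key_attended <- paste(df1$id, df1$week1)
key_all      <- paste(df2$id, df2$week2)

# 1 where the participant attended a session that week, 0 otherwise
df2$attended <- as.integer(key_all %in% key_attended)

df2[df2$id == "A", ]
```

The split/lapply version above and this key-based version should give the same attended column; the key-based one simply avoids the per-group loop.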

Answer 1: (score: 1)

An approach using merge:

# merge the two data frames on id and week, keeping every row of df2
names(df2)[2] <- "week"
names(df1)[3] <- "week"
df <- merge(df2, df1, by = c("id", "week"), all.x = TRUE)

# replace 'session' with 1s and 0s (1 = attended, 0 = not attended)
df$session <- as.integer(!is.na(df$session))
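
A variant of the merge approach that leaves the column names of df1 and df2 untouched is sketched below. It is an illustration rather than the answerer's code: it uses the by.x and by.y arguments of merge() to name the key columns on each side, and it hard-codes the df1 values printed in the question. The names res and attended are chosen here for the example.

```r
# Question's data, rebuilt from the printed values (assumption)
df1 <- data.frame(
  id      = rep(LETTERS[1:4], each = 3),
  session = rep(paste0("session", 1:3), length.out = 12),
  week1   = c(2L, 7L, NA, 7L, 8L, 10L, 1L, NA, NA, 6L, 7L, NA)
)
df2 <- data.frame(
  id    = rep(LETTERS[1:4], each = 12),
  week2 = rep(1:12, times = 4)
)

# Left join: keep every (id, week) row of df2 and attach the session label
# from df1 where one exists. Key columns keep the names from df2.
res <- merge(df2, df1,
             by.x = c("id", "week2"),
             by.y = c("id", "week1"),
             all.x = TRUE)

# A non-missing session means the participant attended that week
res$attended <- as.integer(!is.na(res$session))

# Tidy up: explicit ordering, then keep the three columns the question asks for
res <- res[order(res$id, res$week2), c("id", "week2", "attended")]
names(res)[2] <- "week"
rownames(res) <- NULL

res[res$id == "A", ]
```

Rows of df1 with a missing week have no partner in df2 and are dropped because only all.x is set, so the result has exactly one row per participant per week.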

Answer 2: (score: 1)

You can expand the original data.frame with tidyr::complete, so no merge is needed: just define week1 as a factor with the correct number of levels.
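
The code for this answer is not shown, so the following is only a sketch of the idea it describes, not the answerer's original code. It assumes the tidyr package is installed, hard-codes the df1 values printed in the question, and drops the unattended (NA) sessions before expanding; the names attended_only and long are made up for this example.

```r
library(tidyr)

# Question's data, rebuilt from the printed values (assumption)
df1 <- data.frame(
  id      = rep(LETTERS[1:4], each = 3),
  session = rep(paste0("session", 1:3), length.out = 12),
  week1   = c(2, 7, NA, 7, 8, 10, 1, NA, NA, 6, 7, NA)
)

# Keep only sessions that were attended, then declare all twelve weeks as
# factor levels so that complete() knows which weeks are possible.
attended_only <- df1[!is.na(df1$week1), ]
attended_only$week1 <- factor(attended_only$week1, levels = 1:12)

# complete() adds a row for every id/week combination that is missing;
# session is NA in the rows it adds.
long <- as.data.frame(complete(attended_only, id, week1))

long$attended <- as.integer(!is.na(long$session))
long$week     <- as.integer(as.character(long$week1))
long <- long[order(long$id, long$week), c("id", "week", "attended")]

long[long$id == "A", ]
```

One caveat of this sketch: a participant who attended no sessions at all would disappear when the NA rows are dropped, so in that case the merge-based approaches are the safer choice.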