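I have two data frames that I want to merge. The datasets differ in the number of observations and in how they were collected. In df1, two different days were recorded for each person. Each record has an index, id1 (the person's identification number), and id2 (which recording day the row refers to; the days must differ). There is also a Day variable giving the day of the week on which the recording was made.
In df2, observations are recorded only by index and by id1, the person's identification number. There is only one observation per person. Again there is a Day variable, which records the day on which recording started.
I want to identify the observations from df2 that were recorded on the same day as in df1.
I tried creating a newindex (grouping by index and id1) so that I could pivot to long format and merge on the day.
df1: Day is the day on which the observation was made (for example, for index 12, id1 = 1 means there is only one person, and id2 covers two days: Wednesday for id2 = 1 and Sunday for id2 = 2)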
index id1 id2 Day obs1 obs2 obs3
12 1 1 Wednesday 1 11 12
12 1 2 Sunday 2 0 0
123 1 1 Tuesday 1 0 1
123 1 2 Saturday 3 0 3
123 2 1 Monday 2 2 4
123 2 2 Saturday 1 0 8
df2: here the Day variable is the day on which the observation started (for example, id 12 day2 and id 123 day1)
index id1 Day day1 day2 day3 day4 day5 day6 day7
12 1 Tuesday 2 1 2 1 1 3 1
123 1 Friday 0 3 0 3 3 0 3
Expected result:
index id1 id2 obs1 obs2 obs3
12 1 1 1 11 12
12 1 2 2 0 0
123 1 2 3 0 3
123 2 2 1 0 8
Sample data
df1:
structure(list(index = c(12, 12, 123, 123, 123, 123), id1 = c(1,
1, 1, 1, 2, 2), id2 = c(1, 2, 1, 2, 1, 2), Day = structure(c(5L,
3L, 4L, 2L, 1L, 2L), .Label = c("Monday", "Saturday", "Sunday",
"Tuesday", "Wednesday"), class = "factor"), obs1 = c(1, 2, 1,
3, 2, 1), obs2 = c(11, 0, 0, 0, 2, 0), obs3 = c(12, 0, 1, 3,
4, 8)), class = "data.frame", row.names = c(NA, -6L))
df2:
structure(list(index = c(12, 123), id1 = c(1, 1), Day = structure(2:1, .Label = c("Friday",
"Tuesday"), class = "factor"), day1 = c(2, 0), day2 = c(1, 3),
day3 = c(2, 0), day4 = c(1, 3), day5 = c(1, 3), day6 = c(3,
0), day7 = c(1, 3)), class = "data.frame", row.names = c(NA,
-2L))
Answer 0 (score: 1)
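We can reshape df2 to long format, group_by index, keep the rows that fall on or after the day recording started, and then join the result to df1 by index, keeping the df1 rows whose Day is on or after that start day.
library(dplyr)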
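# Weekdays in calendar order, so match() gives Monday = 1 ... Sunday = 7
weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")
df2 %>%
  mutate_at(vars(matches('day\\d+')), as.numeric) %>%
  tidyr::pivot_longer(cols = matches('day\\d+')) %>%
  group_by(index) %>%
  # keep the rows from the start day onwards
  filter(row_number() >= match(Day, weekday)[1L]) %>%
  # one row per index: the start day as a weekday number
  summarise(Day = match(Day, weekday)[1]) %>%
  inner_join(df1 %>% mutate(Day = match(Day, weekday)), by = 'index') %>%
  # Day.x is the df2 start day, Day.y is the df1 recording day
  filter(Day.y >= Day.x)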
# index Day.x id1 id2 Day.y obs1 obs2 obs3
# <dbl> <int> <dbl> <dbl> <int> <dbl> <dbl> <dbl>
#1 12 2 1 1 3 1 11 12
#2 12 2 1 2 7 2 0 0
#3 123 5 1 2 6 3 0 3
#4 123 5 2 2 6 1 0 8
You can then keep only the required columns, for example with select.
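A minimal sketch of that column-trimming step, assuming dplyr (1.0 or later) is installed. It rebuilds the sample data with Day stored as character, and it deliberately skips the pivot_longer step because only the start day is needed for the filter, so it is a simplification rather than the answer's exact pipeline. The name start_days and the object result are illustrative and do not appear in the original answer.

```r
library(dplyr)

weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
             "Saturday", "Sunday")

# Sample data from the question (Day kept as character for simplicity)
df1 <- data.frame(
  index = c(12, 12, 123, 123, 123, 123),
  id1   = c(1, 1, 1, 1, 2, 2),
  id2   = c(1, 2, 1, 2, 1, 2),
  Day   = c("Wednesday", "Sunday", "Tuesday", "Saturday", "Monday", "Saturday"),
  obs1  = c(1, 2, 1, 3, 2, 1),
  obs2  = c(11, 0, 0, 0, 2, 0),
  obs3  = c(12, 0, 1, 3, 4, 8)
)
# day1..day7 are omitted here because this sketch does not use them
df2 <- data.frame(index = c(12, 123), id1 = c(1, 1),
                  Day = c("Tuesday", "Friday"))

# Start day of each df2 record as a weekday number (Monday = 1)
start_days <- df2 %>%
  transmute(index, Day = match(Day, weekday))

result <- start_days %>%
  inner_join(df1 %>% mutate(Day = match(Day, weekday)), by = "index") %>%
  filter(Day.y >= Day.x) %>%                  # df1 day on or after the df2 start day
  select(index, id1, id2, obs1, obs2, obs3)   # drop the helper columns Day.x and Day.y

print(result)
```

With the sample data this leaves four rows and the same six columns as the expected result in the question.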
Answer 1 (score: 1)
An option using melt from data.table:
library(data.table)
weekday <- c("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
If the Day columns in the datasets are haven labelled, we first convert them to factor with as_factor:
library(haven)
df1$Day <- as.character(as_factor(df1$Day))
df2$Day <- as.character(as_factor(df2$Day))
Then convert Day to a weekday number, melt df2, and merge:
df1$Day <- match(df1$Day, weekday)
dt2 <- melt(setDT(df2), measure = patterns('^day\\d+$'))[seq_len(.N) >=
    match(Day, weekday)[1L]][, .(Day = match(Day, weekday)[1]), index]
# Day.x is the df1 recording day, Day.y is the df2 start day
merge(setDT(df1), dt2, by = 'index')[Day.y < Day.x]
# index id1 id2 Day.x obs1 obs2 obs3 Day.y
#1: 12 1 1 3 1 11 12 2
#2: 12 1 2 7 2 0 0 2
#3: 123 1 2 6 3 0 3 5
#4: 123 2 2 6 1 0 8 5
Or, using tidyverse, it is better to first return a list column in summarise and then unnest it (in case the lengths do not match the number of rows).
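A minimal sketch of that summarise-then-unnest pattern, assuming dplyr (1.0 or later) and tidyr (1.0 or later) are installed. The toy data frame long_days, its start column, and the column name later_values are hypothetical, chosen so that each group returns a different number of values. Wrapping the per-group result in list() keeps summarise at one row per group, and unnest then expands the list column.

```r
library(dplyr)
library(tidyr)

# Hypothetical long-format data: one row per index and recording day
long_days <- data.frame(
  index = c(12, 12, 12, 123, 123, 123),
  day   = c(1, 2, 3, 1, 2, 3),
  value = c(2, 1, 2, 0, 3, 0),
  start = c(2, 2, 2, 3, 3, 3)   # assumed start day for each index
)

nested <- long_days %>%
  group_by(index) %>%
  # list() wraps a variable-length vector so each group yields exactly one row
  summarise(later_values = list(value[day >= start]), .groups = "drop")

# unnest() expands the list column to one row per element
unnested <- nested %>% unnest(later_values)

print(unnested)
```

Here index 12 contributes two values and index 123 contributes one, so nested has two rows and unnested has three.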