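I have 3 data frames that I want to merge/join. I have already tried the following two solutions: Merge multiple data.frames in R with varying row length, and Merge data.frames with duplicates. However, the output data table is not what I want.
Here is sample code for my data frames: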
df1 <- data.frame(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), task = c("hop", "hop", "hop", "vj", "vj", "vj"),
limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df2 <- data.frame(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), task = c("hop", "hop", "hop", "vj", "vj", "vj"),
limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df3 <- data.frame(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), Time = c("2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39",
"2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39"),
task = c("hop", "hop", "hop", "vj", "vj", "vj"), limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
I tried merging with the following two approaches:
JOIN <- merge(df3, merge(df1, df2, by = c("task", "limb", "trial"), all = TRUE), by = c("task", "limb", "trial"), all = TRUE)
run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))
L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$limb)))
out <- Reduce(function(...) merge(..., all = TRUE), L2)
My final data table should contain 7 columns: task, limb, trial, FzL, FzR, Frame, Time.
Any help would be greatly appreciated! Thanks.
Answer 0 (score: 2)
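When merging, the function has no way of knowing which FzL value belongs with which FzR value. As a result, it creates every possible combination of rows that share the same task, limb and trial.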
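Here is a minimal base R sketch of that effect on the example data. It merges only on the three identifier columns and counts the resulting rows. The data are rebuilt with rep() so the block runs on its own, and only the columns needed for the row count are kept.

```r
keys <- c("task", "limb", "trial")
task  <- rep(c("hop", "vj"), each = 3)
limb  <- rep(c("L", "R"), each = 3)
trial <- rep(c("trial1", "trial2"), each = 3)

df1 <- data.frame(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600),
                  task = task, limb = limb, trial = trial)
df2 <- data.frame(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640),
                  task = task, limb = limb, trial = trial)
df3 <- data.frame(Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
                  task = task, limb = limb, trial = trial)

# Each key combination occurs 3 times in df1 and 3 times in df2,
# so the merge pairs them up 3 x 3 = 9 times per key
m12 <- merge(df1, df2, by = keys, all = TRUE)
nrow(m12)          # 18, not 6
table(m12$task)    # hop: 9, vj: 9

# Adding df3 multiplies again: 3 x 9 = 27 rows per key
m123 <- merge(df3, m12, by = keys, all = TRUE)
nrow(m123)         # 54
```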
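If the rows are in exactly the same order in all three data frames (i.e. the first row of df1, with FzL 594.4014, corresponds to the first row of df2, with FzR 594.2836), you can bind the columns together instead of merging. Only do this if you are sure that each row corresponds to the same row in the other data frames.
In this example every data frame has the same number of rows and the same identifiers, so a column bind is probably what you are looking for.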
# Base R
df <- cbind(df1,
subset(df2, select = c("FzR")),
subset(df3, select = c("Frame", "Time")))
# Tidyverse
library(dplyr)
df <- df1 %>%
bind_cols(df2 %>% select(FzR)) %>%
bind_cols(df3 %>% select(Frame, Time))
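A column bind does not check the row correspondence, so it can help to verify that the identifier columns line up before binding. Below is a minimal base R sketch. `bind_if_aligned` is a hypothetical helper, not part of any package. It only catches misalignment that shows up in task/limb/trial. Rows that are shuffled within the same group would still pass.

```r
# Hypothetical helper (not from any package): column-bind data frames only when
# their identifier columns match row for row; otherwise stop with an error.
bind_if_aligned <- function(base, others, keys = c("task", "limb", "trial")) {
  for (x in others) {
    if (nrow(x) != nrow(base)) stop("row counts differ")
    # compare each identifier column position by position
    if (!all(mapply(identical, base[keys], x[keys]))) {
      stop("identifier columns are not in the same row order")
    }
  }
  # from the other data frames keep only the value columns, not the identifiers
  extra <- lapply(others, function(x) x[setdiff(names(x), keys)])
  do.call(cbind, c(list(base), extra))
}

ids <- data.frame(task = rep(c("hop", "vj"), each = 3),
                  limb = rep(c("L", "R"), each = 3),
                  trial = rep(c("trial1", "trial2"), each = 3))
df1 <- cbind(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), ids)
df2 <- cbind(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), ids)
df3 <- cbind(Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
             Time = rep("2020-06-05 13:26:39", 6), ids)

out <- bind_if_aligned(df1, list(df2, df3))
out <- out[c("task", "limb", "trial", "FzL", "FzR", "Frame", "Time")]

# Reversing df2's rows breaks the alignment, so the helper refuses to bind
res <- tryCatch(bind_if_aligned(df1, list(df2[6:1, ])),
                error = function(e) conditionMessage(e))
res   # "identifier columns are not in the same row order"
```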
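Update after the comment that df3 has a different number of rows:
Another option is to still merge. If all the data frames are in the same order, you can add a row number to each one so the join knows which rows correspond. This is the easier route here because one data frame has fewer rows. A full join keeps the rows that have no partner and fills their missing columns with NA.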
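library(dplyr)
# Number the rows of each data frame so the id acts as an extra join key
df1 <- df1 %>%
mutate(id = row_number())
df2 <- df2 %>%
mutate(id = row_number())
df3 <- df3 %>%
mutate(id = row_number())
# Join on the shared identifiers plus the row id; full_join keeps unmatched rows (NA-filled)
df <- df1 %>%
full_join(df2, by = c("task", "limb", "trial", "id")) %>%
full_join(df3, by = c("task", "limb", "trial", "id"))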
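A single row_number() over the whole data frame assumes that the shorter data frame is only missing rows at its very end. If a row is missing in the middle, every id after the gap shifts. One variant numbers the rows within each task/limb/trial group instead, similar to the run.seq helper in the question. Below is a base R sketch of that variant. `add_group_id` is a hypothetical helper, and the missing second "hop" row in df3 is an illustrative assumption. This is still position-based. It keeps a missing row from shifting rows in other groups, but it cannot tell which row inside a group is the missing one.

```r
# Hypothetical helper: add a within-group counter (1, 2, 3, ... per key combination)
add_group_id <- function(df, keys = c("task", "limb", "trial")) {
  df$id <- ave(seq_len(nrow(df)), df[keys], FUN = seq_along)
  df
}

ids <- data.frame(task = rep(c("hop", "vj"), each = 3),
                  limb = rep(c("L", "R"), each = 3),
                  trial = rep(c("trial1", "trial2"), each = 3))
df1 <- cbind(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), ids)
df2 <- cbind(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), ids)
# Illustrative assumption: df3 lacks its second "hop" row
df3 <- cbind(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), ids)[-2, ]

keys <- c("task", "limb", "trial", "id")
out <- Reduce(function(a, b) merge(a, b, by = keys, all = TRUE),
              lapply(list(df1, df2, df3), add_group_id))
out   # 6 rows; the "vj" rows keep all three Frame values, one "hop" Frame is NA
```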
Answer 1 (score: 0)
Here is a slightly longer solution in which each value of FzL and FzR stays tied to its row number, with no duplicated values. It uses the dplyr package.
library(dplyr)
df1 <- data.frame(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), task = c("hop", "hop", "hop", "vj", "vj", "vj"),
limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df2 <- data.frame(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), task = c("hop", "hop", "hop", "vj", "vj", "vj"),
limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df3 <- data.frame(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), Time = c("2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39",
"2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39"),
task = c("hop", "hop", "hop", "vj", "vj", "vj"), limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
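# The join values never match (no FzL equals any FzR); this only carries df1's rows through
df4 <- df1 %>%
left_join(df2, by = c("FzL" = "FzR"))
df4 <- df4[,-c(5:7)]   # drop task.y, limb.y, trial.y (unmatched, so NA)
# Attach FzR by position (assumes df1 and df2 rows are in the same order)
df4 <- df4 %>%
mutate(FzR = df2[ ,1])
df5 <- df4 %>%
left_join(df3, by = c("FzL" = "Frame"))
df5 <- df5[,-c(6:9)]   # drop Time, task, limb, trial from the unmatched df3 join
# Attach Frame and Time by position (same row-order assumption)
df5 <- df5 %>%
mutate(Frame = df3[ ,1],
Time = df3[ ,2])
df5 <- df5 %>%
rename(task = task.x, limb = limb.x, trial = trial.x) %>%
select(task, limb, trial, FzL, FzR, Frame, Time)
df5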
The output is as follows:
task limb trial FzL FzR Frame Time
1 hop L trial1 594.4014 594.2836 219388 2020-06-05 13:26:39
2 hop L trial1 594.4147 619.1613 219389 2020-06-05 13:26:39
3 hop L trial1 594.4148 618.8364 219390 2020-06-05 13:26:39
4 vj R trial2 594.4194 594.4196 211387 2020-06-05 13:26:39
5 vj R trial2 594.3877 694.3853 211388 2020-06-05 13:26:39
6 vj R trial2 618.8600 640.2640 211389 2020-06-05 13:26:39