Merging multiple data frames on 3 common columns in R

Time: 2020-06-25 04:30:11

Tags: r join merge duplicates

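I have 3 data frames that I want to merge. I have already tried the solutions from these two questions: "Merge multiple data.frames in R with varying row length" and "Merge data.frames with duplicates". However, the output table is not what I want.

Here is sample code for my data frames: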

df1 <- data.frame(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), task = c("hop", "hop", "hop", "vj", "vj", "vj"), 
                    limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df2 <- data.frame(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), task = c("hop", "hop", "hop", "vj", "vj", "vj"), 
                    limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df3 <- data.frame(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), Time = c("2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39",
       "2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39"),
       task = c("hop", "hop", "hop", "vj", "vj", "vj"), limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))

When I try to merge them with this code:

 JOIN <- merge(df3, merge(df1, df2, by = c("task", "limb", "trial"), all = TRUE), by = c("task", "limb", "trial"), all = TRUE)

I get a table in which the rows are repeated many times. I have also tried the following code:

run.seq <- function(x) as.numeric(ave(paste(x), x, FUN = seq_along))

L <- list(df1, df2, df3)
L2 <- lapply(L, function(x) cbind(x, run.seq = run.seq(x$limb)))

out <- Reduce(function(...) merge(..., all = TRUE), L2)

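However, it only gives me the first three rows and does not work through the whole dataset.

My final table should contain 7 columns: task, limb, trial, FzL, FzR, Frame, Time.

Any help would be greatly appreciated! Thanks.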

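2 answers:

Answer 0: (score: 2)

In a merge, the function does not know which FzL value corresponds to which FzR value, because the key columns (task, limb, trial) are identical for several rows. As a result, it creates every possible combination of the rows that share a key.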
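To make the duplication concrete, here is a minimal sketch that counts the rows produced by the nested merge from the question. The data frames are rebuilt in a compact form (the helper object `keys` is introduced here only for brevity, and df3 is reduced to its Frame column); the values are the ones from the question.

```r
# Key columns shared by all three data frames: two groups of three rows each
keys <- data.frame(task  = rep(c("hop", "vj"), each = 3),
                   limb  = rep(c("L", "R"), each = 3),
                   trial = rep(c("trial1", "trial2"), each = 3))

df1 <- cbind(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), keys)
df2 <- cbind(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), keys)
df3 <- cbind(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), keys)

by_cols <- c("task", "limb", "trial")

# Every df1 row is paired with every df2 row that has the same key
step1 <- merge(df1, df2, by = by_cols, all = TRUE)
# ...and every one of those pairs is paired again with every matching df3 row
step2 <- merge(df3, step1, by = by_cols, all = TRUE)

nrow(step1)  # 2 groups * 3 * 3 = 18
nrow(step2)  # 2 groups * 3 * 3 * 3 = 54
```

Six input rows become 54 because the three key columns do not identify a single row; some extra column that distinguishes rows within a group is needed before a merge can pair them one to one.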

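If the rows are in exactly the same order in all three data frames (i.e. the first row of df1, with an FzL of 594.4014, corresponds to the first row of df2, with an FzR of 594.2836), you can bind the columns together instead. Do this only if you are sure that each row corresponds to the same row in the other data frames.

In this case, given that every data frame in this example has the same number of rows and the same identifiers, a column bind is probably what you are looking for.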

# Base R
df <- cbind(df1,
            subset(df2, select = c("FzR")),
            subset(df3, select = c("Frame", "Time")))

# Tidyverse
library(dplyr)
df <- df1 %>% 
  bind_cols(df2 %>% select(FzR)) %>% 
  bind_cols(df3 %>% select(Frame, Time))
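Because a column bind silently pairs rows by position, it may be worth checking the alignment first. The following is only a sketch under the assumption that the key columns are character vectors (the default in R 4.0 and later); `keys_match` and `keys` are hypothetical names introduced for this example, not part of any package.

```r
# Data from the question, rebuilt compactly
keys <- data.frame(task  = rep(c("hop", "vj"), each = 3),
                   limb  = rep(c("L", "R"), each = 3),
                   trial = rep(c("trial1", "trial2"), each = 3))
df1 <- cbind(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), keys)
df2 <- cbind(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), keys)
df3 <- cbind(Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
             Time = rep("2020-06-05 13:26:39", 6), keys)

key_cols <- c("task", "limb", "trial")

# Hypothetical helper: TRUE when both data frames have the same number of
# rows and identical key values in every row position
keys_match <- function(a, b, cols = key_cols) {
  nrow(a) == nrow(b) && all(a[cols] == b[cols])
}

# Stop early if the rows do not line up, then bind by position
stopifnot(keys_match(df1, df2), keys_match(df1, df3))
df <- cbind(df1, df2["FzR"], df3[c("Frame", "Time")])
```

The check only confirms that the keys agree row by row; it cannot prove that the measurements themselves were recorded in the same order.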

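Update after a comment that df3 has a different number of rows:

Another option is to still merge, but, provided all the data frames are in the same order, use the row number to indicate which rows correspond to each other. This is the easier route when one data frame has fewer rows.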

library(dplyr)

df1 <- df1 %>% 
  mutate(id = row_number())
df2 <- df2 %>% 
  mutate(id = row_number())
df3 <- df3 %>% 
  mutate(id = row_number())

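# Join on the keys plus the row id so each row matches exactly one row
df <- df1 %>% 
  full_join(df2, by = c("task", "limb", "trial", "id")) %>% 
  full_join(df3, by = c("task", "limb", "trial", "id"))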
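A global row number stops lining up as soon as a row is missing from the middle of one data frame. A variation, sketched here in base R, numbers the rows within each task/limb/trial group instead. It assumes rows are in the same order within each group and that any missing rows are at the end of a group. `add_group_id` and `df3_short` are hypothetical names; `df3_short` simulates a df3 with one row fewer.

```r
# Data from the question, rebuilt compactly
keys <- data.frame(task  = rep(c("hop", "vj"), each = 3),
                   limb  = rep(c("L", "R"), each = 3),
                   trial = rep(c("trial1", "trial2"), each = 3))
df1 <- cbind(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), keys)
df2 <- cbind(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), keys)
df3 <- cbind(Frame = c(219388, 219389, 219390, 211387, 211388, 211389),
             Time = rep("2020-06-05 13:26:39", 6), keys)

# Simulated shorter data frame: the last "vj" row is missing
df3_short <- df3[-6, ]

key_cols <- c("task", "limb", "trial")

# Hypothetical helper: add a counter that restarts at 1 in every key group
add_group_id <- function(d) {
  d$id <- ave(seq_len(nrow(d)), d$task, d$limb, d$trial, FUN = seq_along)
  d
}

frames <- lapply(list(df1, df2, df3_short), add_group_id)

# Full outer merge on the keys plus the within-group counter
out <- Reduce(function(a, b) merge(a, b, by = c(key_cols, "id"), all = TRUE),
              frames)
```

With this layout the result keeps six rows, and the row that has no partner in `df3_short` gets NA in Frame and Time rather than being duplicated or dropped.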

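Answer 1: (score: 0)

Here is a slightly longer solution in which every value of the FzL and FzR variables corresponds to a given row number, with no duplicated values. It can be done with the dplyr package.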

library(dplyr)
df1 <- data.frame(FzL = c(594.4014, 594.4147, 594.4148, 594.4194, 594.3877, 618.8600), task = c("hop", "hop", "hop", "vj", "vj", "vj"), 
                  limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df2 <- data.frame(FzR = c(594.2836, 619.1613, 618.8364, 594.4196, 694.3853, 640.2640), task = c("hop", "hop", "hop", "vj", "vj", "vj"), 
                  limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))
df3 <- data.frame(Frame = c(219388, 219389, 219390, 211387, 211388, 211389), Time = c("2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39",
                                                                                      "2020-06-05 13:26:39", "2020-06-05 13:26:39", "2020-06-05 13:26:39"),
                  task = c("hop", "hop", "hop", "vj", "vj", "vj"), limb = c("L", "L", "L", "R", "R", "R"), trial = c("trial1", "trial1", "trial1", "trial2", "trial2", "trial2"))

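# The join keys never match (FzL vs. FzR), so this left_join only appends
# NA-filled task.y, limb.y and trial.y columns, which are dropped right after.
df4 <- df1 %>% 
    left_join(df2, by = c("FzL" = "FzR"))
df4 <- df4[,-c(5:7)]
# Attach FzR by row position; this assumes df1 and df2 share the same row order.
df4 <- df4 %>% 
    mutate(FzR = df2[ ,1])

# Same pattern for df3: the join adds only NA columns (6 to 9), which are dropped.
df5 <- df4 %>% 
    left_join(df3, by = c("FzL" = "Frame"))
df5 <- df5[,-c(6:9)]
# Attach Frame and Time by row position.
df5 <- df5 %>% 
    mutate(Frame = df3[ ,c(1)],
           Time = df3[ ,c(2)])
# Restore the original key names and put the columns in the requested order.
df5 <- df5 %>% 
    rename(task = task.x, limb = limb.x, trial = trial.x) %>% 
    select(task, limb, trial, FzL, FzR, Frame, Time)
df5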

The output is as follows:

  task limb  trial      FzL      FzR  Frame                Time
1  hop    L trial1 594.4014 594.2836 219388 2020-06-05 13:26:39
2  hop    L trial1 594.4147 619.1613 219389 2020-06-05 13:26:39
3  hop    L trial1 594.4148 618.8364 219390 2020-06-05 13:26:39
4   vj    R trial2 594.4194 594.4196 211387 2020-06-05 13:26:39
5   vj    R trial2 594.3877 694.3853 211388 2020-06-05 13:26:39
6   vj    R trial2 618.8600 640.2640 211389 2020-06-05 13:26:39