How to perform a merge or join in R on two data frames of different sizes

Time: 2017-08-17 15:40:38

Tags: r dataframe

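I have two data frames, A and B, of different sizes, and I am trying to left-join or merge them based on specific conditions. Can anyone help me with how to join the two tables in R? I am joining the two data frames on a1, a2 and b1, b2.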

df A
a1 a2   a3         a4
1  1   2017-04-25  2017-05-24
1  1   2017-05-25  2017-06-24
2  3   2017-04-25  2017-05-24
3  4   2017-04-25  2017-05-24
4  5   2017-04-25  2017-05-24
4  5   2017-05-25  2017-06-24
4  7   2017-04-25  2017-05-24
5  8   2017-04-25  2017-05-24
5  8   2017-05-25  2017-06-24


df B
b1  b2  b3         b4         b5
1   1  2017-04-20  2017-05-02  M
2   3  2017-03-27  2017-05-19  A
3   4  2017-04-20  2017-05-22  B
4   5  2017-04-21  2017-05-12  N
4   7  2017-05-02  2017-05-09  L
5   8  2017-05-15  2017-05-04  U

Dimensions of the first data frame:

> dim(A)
   [1] 506335      5

Dimensions of the second data frame:

> dim(B)
[1] 716776      6

I tried the following left join in R:

left_join(A, B, a1=b1, a2 = b2,  a3 > b3 , a4 < b4)

Error:

Error in common_by(by, x, y) : object 'b3' not found

I also tried a merge operation, but got the error below:
merge(A,B,by=c("a1","a2", "a3 > b3" , "a4 < b4"))

Error:

Error in ungroup_grouped_df(x) : 
      object 'dplyr_ungroup_grouped_df' not found

2 answers:

Answer 0 (score: 2)

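From what I gather, you want to:

1 - Merge the DFs on the first two columns

2 - Filter the merged DF to the rows that satisfy the conditions a3 > b3, a4 < b4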

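library(dplyr)
# join on the key columns first (a1 = b1, a2 = b2), then filter on the date conditions
DF <- left_join(A, B, by = c("a1" = "b1", "a2" = "b2")) %>%
  filter(a3 > b3, a4 < b4)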
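
Since the question also tried merge(), the same merge-then-filter idea can be sketched in base R. This is only an illustration under assumptions: A and B are small stand-ins built from the sample rows in the question, the date columns are assumed to be convertible with as.Date, and the names merged, a3_only and both are made up for the example.

```r
# Small stand-ins for A and B, copied from the sample rows in the question
A <- data.frame(
  a1 = c(1, 1, 2, 3, 4, 4, 4, 5, 5),
  a2 = c(1, 1, 3, 4, 5, 5, 7, 8, 8),
  a3 = as.Date(c("2017-04-25", "2017-05-25", "2017-04-25", "2017-04-25",
                 "2017-04-25", "2017-05-25", "2017-04-25", "2017-04-25",
                 "2017-05-25")),
  a4 = as.Date(c("2017-05-24", "2017-06-24", "2017-05-24", "2017-05-24",
                 "2017-05-24", "2017-06-24", "2017-05-24", "2017-05-24",
                 "2017-06-24"))
)
B <- data.frame(
  b1 = c(1, 2, 3, 4, 4, 5),
  b2 = c(1, 3, 4, 5, 7, 8),
  b3 = as.Date(c("2017-04-20", "2017-03-27", "2017-04-20", "2017-04-21",
                 "2017-05-02", "2017-05-15")),
  b4 = as.Date(c("2017-05-02", "2017-05-19", "2017-05-22", "2017-05-12",
                 "2017-05-09", "2017-05-04")),
  b5 = c("M", "A", "B", "N", "L", "U")
)

# Step 1: left join on the key columns; by.x/by.y pair a1 with b1 and a2 with b2,
# and all.x = TRUE keeps every row of A
merged <- merge(A, B, by.x = c("a1", "a2"), by.y = c("b1", "b2"), all.x = TRUE)

# Step 2: apply the date conditions as an ordinary row filter
# (subset() also drops rows where the condition evaluates to NA)
a3_only <- subset(merged, a3 > b3)
both    <- subset(merged, a3 > b3 & a4 < b4)

nrow(merged)
nrow(a3_only)
nrow(both)
```

The by argument of merge() only accepts column names, which is why a string such as "a3 > b3" cannot be passed to it; inequality conditions have to be applied after the join, as in step 2.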

Answer 1 (score: 1)

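As Andrew Gustar commented, you are trying to merge and filter at the same time. Instead, do the merge first and then the filter. It also looks like you are working with dates, so they need to be formatted correctly.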

The code below could all be run in a single chain, but I have broken it up to make it easier to follow.

For example, using the tidyverse packages dplyr and lubridate:

library(dplyr)
library(lubridate)

# load in your data
textA <- "a1 a2 a3 a4
1 1 2017-04-25 2017-05-24
1 1 2017-05-25 2017-06-24
2 3 2017-04-25 2017-05-24
3 4 2017-04-25 2017-05-24
4 5 2017-04-25 2017-05-24
4 5 2017-05-25 2017-06-24
4 7 2017-04-25 2017-05-24
5 8 2017-04-25 2017-05-24
5 8 2017-05-25 2017-06-24"

textB <- "b1 b2 b3 b4 b5
1 1 2017-04-20 2017-05-02 M
2 3 2017-03-27 2017-05-19 A
3 4 2017-04-20 2017-05-22 B
4 5 2017-04-21 2017-05-12 N
4 7 2017-05-02 2017-05-09 L
5 8 2017-05-15 2017-05-04 U"

# make dataframes
dfA <- read.table(text = textA, header = TRUE)
dfB <- read.table(text = textB, header = TRUE)

# now do the merging - when merging on more than one column, combine them using c
dfout <- left_join(x = dfA, y = dfB, by = c("a1" = "b1", "a2" = "b2"))

# now switch your a3, a4, b3, and b4 columns to date format using the ymd function
dfout <- dfout %>% mutate_at(vars(a3:b4), ymd)

# finally the filtering
dfout <- dfout %>% filter(a3 > b3)

Returns:

  a1 a2         a3         a4         b3         b4 b5
1  1  1 2017-04-25 2017-05-24 2017-04-20 2017-05-02  M
2  1  1 2017-05-25 2017-06-24 2017-04-20 2017-05-02  M
3  2  3 2017-04-25 2017-05-24 2017-03-27 2017-05-19  A
4  3  4 2017-04-25 2017-05-24 2017-04-20 2017-05-22  B
5  4  5 2017-04-25 2017-05-24 2017-04-21 2017-05-12  N
6  4  5 2017-05-25 2017-06-24 2017-04-21 2017-05-12  N
7  5  8 2017-05-25 2017-06-24 2017-05-15 2017-05-04  U

Note that filtering again on a4 < b4 returns a data frame with 0 rows.
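
The answer mentions that the steps can be run in one chain and that a second filter on a4 < b4 leaves nothing. A minimal sketch of both points follows, with some assumptions: it uses base as.Date together with across() (which needs dplyr 1.0.0 or later) in place of lubridate's ymd, and the names joined, step1 and step2 are illustrative rather than taken from the answer.

```r
library(dplyr)

# Sample rows from the question, read as plain text
dfA <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "a1 a2 a3 a4
1 1 2017-04-25 2017-05-24
1 1 2017-05-25 2017-06-24
2 3 2017-04-25 2017-05-24
3 4 2017-04-25 2017-05-24
4 5 2017-04-25 2017-05-24
4 5 2017-05-25 2017-06-24
4 7 2017-04-25 2017-05-24
5 8 2017-04-25 2017-05-24
5 8 2017-05-25 2017-06-24")

dfB <- read.table(header = TRUE, stringsAsFactors = FALSE, text = "b1 b2 b3 b4 b5
1 1 2017-04-20 2017-05-02 M
2 3 2017-03-27 2017-05-19 A
3 4 2017-04-20 2017-05-22 B
4 5 2017-04-21 2017-05-12 N
4 7 2017-05-02 2017-05-09 L
5 8 2017-05-15 2017-05-04 U")

# Join on the key columns and convert the four date columns in one chain
joined <- dfA %>%
  left_join(dfB, by = c("a1" = "b1", "a2" = "b2")) %>%
  mutate(across(c(a3, a4, b3, b4), as.Date))

# First condition only, then the second condition on top of it
step1 <- joined %>% filter(a3 > b3)
step2 <- step1 %>% filter(a4 < b4)

nrow(step1)
nrow(step2)
```

On these sample rows the second filter removes everything because every a4 value is later than every b4 value, so an empty result here reflects the data rather than a problem with the join.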