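I have two data frames, A and B, of different sizes, and I am trying to left join (merge) them based on specific conditions. Can anyone help me join these two tables in R? I want to join on a1, a2 (from A) and b1, b2 (from B).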
df A
a1 a2 a3 a4
1 1 2017-04-25 2017-05-24
1 1 2017-05-25 2017-06-24
2 3 2017-04-25 2017-05-24
3 4 2017-04-25 2017-05-24
4 5 2017-04-25 2017-05-24
4 5 2017-05-25 2017-06-24
4 7 2017-04-25 2017-05-24
5 8 2017-04-25 2017-05-24
5 8 2017-05-25 2017-06-24
df B
b1 b2 b3 b4 b5
1 1 2017-04-20 2017-05-02 M
2 3 2017-03-27 2017-05-19 A
3 4 2017-04-20 2017-05-22 B
4 5 2017-04-21 2017-05-12 N
4 7 2017-05-02 2017-05-09 L
5 8 2017-05-15 2017-05-04 U
Dimensions of the first data frame:
> dim(A)
[1] 506335 5
Dimensions of the second data frame:
> dim(B)
[1] 716776 6
I tried the left join below in R:
left_join(A, B, a1=b1, a2 = b2, a3 > b3 , a4 < b4)
Error:
Error in common_by(by, x, y) : object 'b3' not found
I also tried a merge operation, but got the error below:
merge(A,B,by=c("a1","a2", "a3 > b3" , "a4 < b4"))
Error:
Error in ungroup_grouped_df(x) :
object 'dplyr_ungroup_grouped_df' not found
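Answer 0 (score: 2)
From what I gather, you want to:
1- Merge the data frames on the first two columns
2- Filter the merged data frame on the conditions a3 > b3, a4 < b4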
library(dplyr)
# join on the key columns first, then filter on the date conditions
DF <- left_join(A, B, by = c("a1" = "b1", "a2" = "b2")) %>% filter(a3 > b3, a4 < b4)
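Since the question also tried merge(), the same two-step idea can be sketched in base R. This is only an illustration: the small A and B below are stand-ins built from the first rows of the question's sample data, and the date columns are assumed to be stored as Date so that the comparison is chronological.

```r
# Stand-ins for the question's A and B (first rows of the sample only)
A <- data.frame(
  a1 = c(1, 1, 2),
  a2 = c(1, 1, 3),
  a3 = as.Date(c("2017-04-25", "2017-05-25", "2017-04-25")),
  a4 = as.Date(c("2017-05-24", "2017-06-24", "2017-05-24"))
)
B <- data.frame(
  b1 = c(1, 2),
  b2 = c(1, 3),
  b3 = as.Date(c("2017-04-20", "2017-03-27")),
  b4 = as.Date(c("2017-05-02", "2017-05-19")),
  b5 = c("M", "A")
)

# Step 1: left join on the key columns; by.x/by.y pair a1 with b1 and a2 with b2,
# and all.x = TRUE keeps every row of A
merged <- merge(A, B, by.x = c("a1", "a2"), by.y = c("b1", "b2"), all.x = TRUE)

# Step 2: filter on the date condition after the merge
result <- subset(merged, a3 > b3)
```

The by argument of merge() only accepts column names, which is why an expression such as "a3 > b3" cannot be passed to it and has to be applied as a separate filter.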
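Answer 1 (score: 1)
As Andrew Gustar commented, you are trying to merge and filter at the same time. Instead, do the merge first and then the filtering. It also looks like you are working with dates, so they need to be correctly formatted.
The code below could all be run in a single chain, but I have broken it up to make it easier to follow.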
For example, using the tidyverse packages dplyr and lubridate:
library(dplyr)
library(lubridate)
# load in your data
textA <- "a1 a2 a3 a4
1 1 2017-04-25 2017-05-24
1 1 2017-05-25 2017-06-24
2 3 2017-04-25 2017-05-24
3 4 2017-04-25 2017-05-24
4 5 2017-04-25 2017-05-24
4 5 2017-05-25 2017-06-24
4 7 2017-04-25 2017-05-24
5 8 2017-04-25 2017-05-24
5 8 2017-05-25 2017-06-24"
textB <- "b1 b2 b3 b4 b5
1 1 2017-04-20 2017-05-02 M
2 3 2017-03-27 2017-05-19 A
3 4 2017-04-20 2017-05-22 B
4 5 2017-04-21 2017-05-12 N
4 7 2017-05-02 2017-05-09 L
5 8 2017-05-15 2017-05-04 U"
# make dataframes
dfA <- read.table(text = textA, header = TRUE)
dfB <- read.table(text = textB, header = TRUE)
# now do the merging - when merging on more than one column, combine them using c
dfout <- left_join(x = dfA, y = dfB, by = c("a1" = "b1", "a2" = "b2"))
# now convert the a3, a4, b3, and b4 columns to Date using the ymd function
dfout <- dfout %>% mutate_at(vars(a3:b4), ymd)
# finally the filtering
dfout <- dfout %>% filter(a3 > b3)
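This returns the data frame shown below. Note that filtering it again on the second condition, a4 < b4, returns a data frame with 0 rows.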
a1 a2 a3 a4 b3 b4 b5
1 1 1 2017-04-25 2017-05-24 2017-04-20 2017-05-02 M
2 1 1 2017-05-25 2017-06-24 2017-04-20 2017-05-02 M
3 2 3 2017-04-25 2017-05-24 2017-03-27 2017-05-19 A
4 3 4 2017-04-25 2017-05-24 2017-04-20 2017-05-22 B
5 4 5 2017-04-25 2017-05-24 2017-04-21 2017-05-12 N
6 4 5 2017-05-25 2017-06-24 2017-04-21 2017-05-12 N
7 5 8 2017-05-25 2017-06-24 2017-05-15 2017-05-04 U
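One thing to keep in mind is that joining first and filtering afterwards drops every row of A that has no qualifying match in B, so the result is no longer a strict left join. As a hedged sketch, assuming dplyr 1.1.0 or later is installed, join_by() allows the inequality conditions to be part of the join itself. The data is the sample from the question, and base as.Date() is used here instead of ymd() only to keep the example short.

```r
library(dplyr)  # assumes dplyr >= 1.1.0, which provides join_by()

textA <- "a1 a2 a3 a4
1 1 2017-04-25 2017-05-24
1 1 2017-05-25 2017-06-24
2 3 2017-04-25 2017-05-24
3 4 2017-04-25 2017-05-24
4 5 2017-04-25 2017-05-24
4 5 2017-05-25 2017-06-24
4 7 2017-04-25 2017-05-24
5 8 2017-04-25 2017-05-24
5 8 2017-05-25 2017-06-24"
textB <- "b1 b2 b3 b4 b5
1 1 2017-04-20 2017-05-02 M
2 3 2017-03-27 2017-05-19 A
3 4 2017-04-20 2017-05-22 B
4 5 2017-04-21 2017-05-12 N
4 7 2017-05-02 2017-05-09 L
5 8 2017-05-15 2017-05-04 U"
dfA <- read.table(text = textA, header = TRUE)
dfB <- read.table(text = textB, header = TRUE)

# convert the date columns so that > and < compare dates, not strings
dfA[c("a3", "a4")] <- lapply(dfA[c("a3", "a4")], as.Date)
dfB[c("b3", "b4")] <- lapply(dfB[c("b3", "b4")], as.Date)

# equality on the keys plus both inequality conditions, all inside the join
dfleft <- left_join(
  dfA, dfB,
  by = join_by(a1 == b1, a2 == b2, a3 > b3, a4 < b4)
)
```

With this sample no row of B satisfies both date conditions for any row of A, so dfleft keeps all 9 rows of dfA and fills b3, b4 and b5 with NA, whereas the join-then-filter approach returns 0 rows.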