I previously asked the same question for Python: Convert Interval Outer Join SQL in Python Pandas Dataframe. Now I want to do the same thing in R.
I am converting an Oracle SQL outer interval join to R. Here is the Oracle SQL:
WITH df_interval AS
  (SELECT '1' id,
          'AAA' interval,
          1000 begin,
          2000 end
   FROM DUAL
   UNION ALL
   SELECT '1' id,
          'BBB' intrvl,
          2100 begin,
          3000 end
   FROM DUAL
   UNION ALL
   SELECT '2' id,
          'CCC' intrvl,
          3100 begin,
          4000 end
   FROM DUAL
   UNION ALL
   SELECT '2' id,
          'DDD' intrvl,
          4100 begin,
          5000 end
   FROM DUAL),
df_point AS
  (SELECT '1' id, 'X1' point, 1100 mid FROM DUAL
   UNION ALL
   SELECT '1' id, 'X2' point, 2050 mid FROM DUAL
   UNION ALL
   SELECT '1' id, 'X3' point, 3200 mid FROM DUAL
   UNION ALL
   SELECT '2' id, 'X4' point, 4200 mid FROM DUAL
   UNION ALL
   SELECT '2' id, 'X5' point, 5500 mid FROM DUAL)
SELECT pt.id,
       point,
       mid,
       interval
FROM df_interval it
RIGHT OUTER JOIN df_point pt
  ON pt.id = it.id AND pt.mid BETWEEN it.begin AND it.end
I would like a result like this:
   ID point   mid interval
0   1    X1  1100      AAA
1   1    X2  2050      NaN
2   1    X3  3200      NaN
3   2    X4  4200      DDD
4   2    X5  5500      NaN
I would appreciate it if someone could help.
Answer 0 (score: 1)
Here is an option using the data.table package:
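library(data.table)
# Convert both data frames to data.tables by reference
setDT(df_interval)
setDT(df_point)
# Non-equi join: for each row of df_point (i), look up the rows of df_interval (x)
# with the same id and begin <= mid <= end; points without a match are kept with NA
df_interval[df_point, on=.(id, begin<=mid, end>=mid),
    .(ID=id, point=i.point, mid=i.mid, interval=x.interval)]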
Output:
   ID point  mid interval
1:  1    X1 1100      AAA
2:  1    X2 2050     <NA>
3:  1    X3 3200     <NA>
4:  2    X4 4200      DDD
5:  2    X5 5500     <NA>
Data:
df_interval <- data.frame(id=c(1,1,2,2),
    interval=c('AAA','BBB','CCC','DDD'),
    begin=c(1000,2100,3100,4100),
    end=c(2000,3000,4000,5000))
df_point <- data.frame(id=c(1,1,1,2,2),
    point=c('X1','X2','X3','X4','X5'),
    mid=c(1100,2050,3200,4200,5500))
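If you would rather not add a dependency, the same right outer interval join can probably be written in base R as well. The following is only a sketch and not part of the answer above. It assumes the sample data shown here, and the helper names cand, hits and res are made up for illustration. It first pairs every point with every interval of the same id, keeps the pairs where mid lies in [begin, end] (inclusive, like BETWEEN), and then left-joins those matches back onto the points so that unmatched points get NA. Because the first step builds all same-id pairs, it is likely to be slower than the non-equi join on large tables.

```r
# Sample data copied from the Data section above
df_interval <- data.frame(id = c(1, 1, 2, 2),
                          interval = c("AAA", "BBB", "CCC", "DDD"),
                          begin = c(1000, 2100, 3100, 4100),
                          end = c(2000, 3000, 4000, 5000))
df_point <- data.frame(id = c(1, 1, 1, 2, 2),
                       point = c("X1", "X2", "X3", "X4", "X5"),
                       mid = c(1100, 2050, 3200, 4200, 5500))

# Step 1: equi-join on id, pairing each point with every interval of that id
cand <- merge(df_point, df_interval, by = "id")

# Step 2: keep only the pairs where mid falls inside [begin, end]
hits <- cand[cand$mid >= cand$begin & cand$mid <= cand$end,
             c("id", "point", "mid", "interval")]

# Step 3: left-join the matches back onto the points; all.x = TRUE keeps
# points without a matching interval and fills interval with NA
res <- merge(df_point, hits, by = c("id", "point", "mid"), all.x = TRUE)

# Tidy up: stable row order, column order and header as in the question
res <- res[order(res$id, res$point), c("id", "point", "mid", "interval")]
names(res)[1] <- "ID"
rownames(res) <- NULL
print(res)
```

With the sample data, X1 and X4 should come back with AAA and DDD, and the other three points with NA.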