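I have a specific query to run on my database (PostgreSQL v9.4.5), and I do not see any elegant way to do it in pure SQL. (I know I could do it in Python or something similar, but I have several billion rows and the computation time would increase enormously.)
I have two tables: trades and events. Both tables record what happened in an order book during a day (which is why I have several billion rows; my data spans several years), but there are more events than trades.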
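Both tables have the columns time, price and volume, but each also has other columns (say foo and bar, respectively). I want to establish a correspondence between the two tables on the columns time, price and volume, knowing that this correspondence is an injection from trades into events: if trades contains n rows with the same time t, the same price p and the same volume v, then events also contains n rows with time t, price p and volume v.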
trades:
id | time | price | volume | foo
-----+-----------+---------+--------+-------
201 | 32400.524 | 53 | 2085 | xxx
202 | 32400.530 | 53 | 1162 | xxx
203 | 32400.531 | 52.99 | 50 | xxx
204 | 32400.532 | 52.91 | 3119 | xxx
205 | 32400.837 | 52.91 | 3119 | xxx <--
206 | 32400.837 | 52.91 | 3119 | xxx <--
207 | 32400.837 | 52.91 | 3119 | xxx <--
208 | 32400.839 | 52.92 | 3220 | xxx <--
209 | 32400.839 | 52.92 | 3220 | xxx <--
210 | 32400.839 | 52.92 | 3220 | xxx <--
events:
id | time | price | volume | bar
-----+-----------+---------+--------+------
328 | 32400.835 | 52.91 | 3119 | yyy
329 | 32400.837 | 52.91 | 3119 | yyy <--
330 | 32400.837 | 52.91 | 3119 | yyy <--
331 | 32400.837 | 52.91 | 3119 | yyy <--
332 | 32400.838 | 52.91 | 3119 | yyy
333 | 32400.838 | 52.91 | 3119 | yyy
334 | 32400.839 | 52.92 | 3220 | yyy <--
335 | 32400.839 | 52.92 | 3220 | yyy <--
336 | 32400.839 | 52.92 | 3220 | yyy <--
337 | 32400.840 | 52.91 | 2501 | yyy
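The injection assumption described above can be sanity-checked by comparing per-key row counts in the two tables. The following is a minimal Python sketch on the sample rows only, not something intended for billions of rows; `injection_violations` is a hypothetical helper name. Only trades 205-210 are included, because the events excerpt starts at 32400.835 and therefore shows no counterparts for trades 201-204.

```python
from collections import Counter

# (time, price, volume) keys transcribed from the sample tables above.
# Values are kept as strings so that equality is exact.
trade_keys = (
    [("32400.837", "52.91", "3119")] * 3    # trades 205-207
    + [("32400.839", "52.92", "3220")] * 3  # trades 208-210
)
event_keys = (
    [("32400.835", "52.91", "3119")]        # event 328
    + [("32400.837", "52.91", "3119")] * 3  # events 329-331
    + [("32400.838", "52.91", "3119")] * 2  # events 332-333
    + [("32400.839", "52.92", "3220")] * 3  # events 334-336
    + [("32400.840", "52.91", "2501")]      # event 337
)


def injection_violations(trade_keys, event_keys):
    """Return {key: (trade_count, event_count)} for keys that occur
    more often in trades than in events (hypothetical helper)."""
    trade_counts = Counter(trade_keys)
    event_counts = Counter(event_keys)
    return {
        key: (n, event_counts[key])
        for key, n in trade_counts.items()
        if n > event_counts[key]
    }


print(injection_violations(trade_keys, event_keys))
```

An empty result means every group of identical trades has at least as many identical events to pair with.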
What I want is:
time | price | volume | foo | bar
-----------+---------+--------+------+-------
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.837 | 52.91 | 3119 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
32400.839 | 52.92 | 3220 | xxx | yyy
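I cannot use a classic INNER JOIN, because I would get every possible pairing between the matching rows of the two tables (in this case 3x3 pairs for each of the two duplicated keys, so 18 rows instead of 6).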
The difficulty is that each row can match several rows in the other table rather than just one.
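To see how an equality join multiplies duplicated rows, one can multiply the per-key row counts of the two tables and compare that with a one-to-one pairing. This is a small Python sketch over the sample data; the function names are hypothetical, and trades 201-204 are left out because they match no event in the excerpt.

```python
from collections import Counter

# Number of rows per (time, price, volume) key, read off the sample tables.
trades = Counter({
    ("32400.837", "52.91", "3119"): 3,
    ("32400.839", "52.92", "3220"): 3,
})
events = Counter({
    ("32400.835", "52.91", "3119"): 1,
    ("32400.837", "52.91", "3119"): 3,
    ("32400.838", "52.91", "3119"): 2,
    ("32400.839", "52.92", "3220"): 3,
    ("32400.840", "52.91", "2501"): 1,
})


def equality_join_rows(left, right):
    """Rows produced by a plain join on the key: n_left * n_right per key."""
    return sum(n * right[key] for key, n in left.items())


def one_to_one_rows(left, right):
    """Rows produced when each left row is paired with at most one right row."""
    return sum(min(n, right[key]) for key, n in left.items())


print(equality_join_rows(trades, events), one_to_one_rows(trades, events))
```

The gap between the two numbers grows quadratically with the number of duplicates per key, which is why the plain join is not usable here.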
Thanks for your help.
EDIT:
As I said, if I use a classic INNER JOIN, for example
SELECT * FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
I would get something like:
trade_id | event_id | time | price | volume | foo | bar
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
205 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
208 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
208 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
But what I want is:
trade_id | event_id | time | price | volume | foo | bar
---------+----------+-----------+---------+--------+------+-------
205 | 329 | 32400.837 | 52.91 | 3119 | xxx | yyy
206 | 330 | 32400.837 | 52.91 | 3119 | xxx | yyy
207 | 331 | 32400.837 | 52.91 | 3119 | xxx | yyy
208 | 334 | 32400.839 | 52.92 | 3220 | xxx | yyy
209 | 335 | 32400.839 | 52.92 | 3220 | xxx | yyy
210 | 336 | 32400.839 | 52.92 | 3220 | xxx | yyy
Answer 0 (score: 0)
Check this query:
SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume
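Answer 1 (score: 0)
Try this and let me know whether it works. We could also use a row_number() over (partition by ...) clause, but I am not sure whether it works in PostgreSQL. In any case, try this.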
SELECT
min(t.id) as trade_id,min(e.id) as event_id,
min(t.time) as time,min(t.price) as price,
min(t.volume) as volume, min(e.bar) as bar,
min(t.foo) as foo
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
group by t.id
Answer 2 (score: 0)
Just looking at the sample data you provided, one option would be:
SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo)
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume
Answer 3 (score: 0)
Here is my row_number example.
Also, an SQL Fiddle: SO 33608351
with
trades AS
(
select 201 as id, 32400.524 as time, 53 as price, 2085 as volume, 'xxx' as foo union all
select 202, 32400.530, 53, 1162, 'xxx' union all
select 203, 32400.531, 52.99, 50, 'xxx' union all
select 204, 32400.532, 52.91, 3119, 'xxx' union all
select 205, 32400.837, 52.91, 3119, 'xxx' union all
select 206, 32400.837, 52.91, 3119, 'xxx' union all
select 207, 32400.837, 52.91, 3119, 'xxx' union all
select 208, 32400.839, 52.92, 3220, 'xxx' union all
select 209, 32400.839, 52.92, 3220, 'xxx' union all
select 210, 32400.839, 52.92, 3220, 'xxx'
),
events as
(
select 328 as id, 32400.835 as time , 52.91 as price , 3119 as volume , 'yyy' as bar union all
select 329 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 330 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 331 , 32400.837 , 52.91 , 3119 , 'yyy' union all
select 332 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 333 , 32400.838 , 52.91 , 3119 , 'yyy' union all
select 334 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 335 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 336 , 32400.839 , 52.92 , 3220 , 'yyy' union all
select 337 , 32400.840 , 52.91 , 2501 , 'yyy'
),
tradesWithRowNumber AS
(
select *
-- ordering by id (rather than by the partition columns) makes the numbering deterministic
,ROW_NUMBER() over (PARTITION by time, price, volume order by id) as RowNum
from trades
),
eventsWithRowNumber AS
(
select *
-- ordering by id (rather than by the partition columns) makes the numbering deterministic
,ROW_NUMBER() over (PARTITION by time, price, volume order by id) as RowNum
from events
)
select t.time,
t.price,
t.volume,
t.foo,
e.bar
FROM tradesWithRowNumber t
inner JOIN
eventsWithRowNumber e on e.time = t.time
AND e.price = t.price
AND e.volume = t.volume
and e.RowNum = t.RowNum
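Answer 4 (score: 0)
If I understand correctly, you just want to list the foo and bar columns without creating a Cartesian product. To do that, you can use row_number() to introduce a new column and join on it as well: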
SELECT *
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM events e
) e INNER JOIN
(SELECT t.*,
ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
FROM trades t
) t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
t.seqnum = e.seqnum;
It is not clear from your question whether you need an inner join, a left outer join or a full outer join.
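To make the difference between join types concrete, here is a small Python sketch that runs the row-number join on the sample data with both an inner and a left outer join. It uses an in-memory SQLite database as a stand-in for PostgreSQL, which is an assumption made purely so the example is self-contained; it requires SQLite 3.25 or later for window functions, and `pair_rows` and the aliases `ev` and `tr` are hypothetical names.

```python
import sqlite3

# Sample rows from the question: (id, time, price, volume, foo/bar).
TRADES = [
    (201, 32400.524, 53.0, 2085, "xxx"), (202, 32400.530, 53.0, 1162, "xxx"),
    (203, 32400.531, 52.99, 50, "xxx"), (204, 32400.532, 52.91, 3119, "xxx"),
    (205, 32400.837, 52.91, 3119, "xxx"), (206, 32400.837, 52.91, 3119, "xxx"),
    (207, 32400.837, 52.91, 3119, "xxx"), (208, 32400.839, 52.92, 3220, "xxx"),
    (209, 32400.839, 52.92, 3220, "xxx"), (210, 32400.839, 52.92, 3220, "xxx"),
]
EVENTS = [
    (328, 32400.835, 52.91, 3119, "yyy"), (329, 32400.837, 52.91, 3119, "yyy"),
    (330, 32400.837, 52.91, 3119, "yyy"), (331, 32400.837, 52.91, 3119, "yyy"),
    (332, 32400.838, 52.91, 3119, "yyy"), (333, 32400.838, 52.91, 3119, "yyy"),
    (334, 32400.839, 52.92, 3220, "yyy"), (335, 32400.839, 52.92, 3220, "yyy"),
    (336, 32400.839, 52.92, 3220, "yyy"), (337, 32400.840, 52.91, 2501, "yyy"),
]

# The k-th event of a (time, price, volume) group is paired with the
# k-th trade of the same group, so each row is used at most once.
QUERY = """
SELECT e.id AS event_id, t.id AS trade_id
FROM (SELECT ev.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
      FROM events ev) e
{join_kind} JOIN
     (SELECT tr.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
      FROM trades tr) t
  ON t.time = e.time AND t.price = e.price
 AND t.volume = e.volume AND t.seqnum = e.seqnum
ORDER BY e.id
"""


def pair_rows(join_kind):
    """Run the row-number join with the given join kind ('INNER' or 'LEFT')."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE trades (id INTEGER, time REAL, price REAL, volume INTEGER, foo TEXT)")
    conn.execute("CREATE TABLE events (id INTEGER, time REAL, price REAL, volume INTEGER, bar TEXT)")
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", TRADES)
    conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", EVENTS)
    rows = conn.execute(QUERY.format(join_kind=join_kind)).fetchall()
    conn.close()
    return rows


for event_id, trade_id in pair_rows("LEFT"):
    print(event_id, trade_id)
```

With the inner join only paired rows are returned, while the left outer join from events also keeps events that have no trade, with a NULL (`None` in Python) trade id. Which one is appropriate depends on whether unmatched events should appear in the result.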