SQL inner join with one-to-one selection

Time: 2015-11-09 10:48:39

Tags: sql inner-join

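I have a specific request to run against my database (PostgreSQL v9.4.5), and I don't see any elegant way to solve it in pure SQL (I know I could do it with Python or something else, but I have several billion rows of data, and the computation time would increase considerably).

I have two tables: trades and events. Both tables represent trades occurring in an order book during a day (which is why I have several billion rows; my data spans several years), but there are many more events than trades.

Both tables have the columns time, price and volume, and each has other columns as well (say foo and bar, respectively). I want to establish a correspondence between the two tables on the columns time, price and volume, knowing that this correspondence is an injection from trades to events (if trades contains n rows with the same time t, the same price p and the same volume v, I know that events also contains n rows with time t, price p and volume v).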

trades:

  id |   time    |  price  | volume |   foo
-----+-----------+---------+--------+-------
 201 | 32400.524 |      53 |   2085 |   xxx
 202 | 32400.530 |      53 |   1162 |   xxx
 203 | 32400.531 |   52.99 |     50 |   xxx
 204 | 32400.532 |   52.91 |   3119 |   xxx
 205 | 32400.837 |   52.91 |   3119 |   xxx <--
 206 | 32400.837 |   52.91 |   3119 |   xxx <--
 207 | 32400.837 |   52.91 |   3119 |   xxx <--
 208 | 32400.839 |   52.92 |   3220 |   xxx <--
 209 | 32400.839 |   52.92 |   3220 |   xxx <--
 210 | 32400.839 |   52.92 |   3220 |   xxx <--

events:

  id |   time    |  price  | volume |  bar 
-----+-----------+---------+--------+------
 328 | 32400.835 |   52.91 |   3119 |  yyy
 329 | 32400.837 |   52.91 |   3119 |  yyy <--
 330 | 32400.837 |   52.91 |   3119 |  yyy <--
 331 | 32400.837 |   52.91 |   3119 |  yyy <--
 332 | 32400.838 |   52.91 |   3119 |  yyy
 333 | 32400.838 |   52.91 |   3119 |  yyy
 334 | 32400.839 |   52.92 |   3220 |  yyy <--
 335 | 32400.839 |   52.92 |   3220 |  yyy <--
 336 | 32400.839 |   52.92 |   3220 |  yyy <--
 337 | 32400.840 |   52.91 |   2501 |  yyy

What I want is:

   time    |  price  | volume |  foo |   bar 
-----------+---------+--------+------+-------
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.837 |   52.91 |   3119 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy
 32400.839 |   52.92 |   3220 |  xxx |   yyy

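I cannot use a classic INNER JOIN, because I would get every possible combination of matching rows between the two tables (in this example, each group of 3 identical trades would be joined to all 3 identical events, giving 3x3 = 9 rows per group and 18 rows in total instead of 6).

The difficulty is that there is not just one matching row: several rows can match.

Thanks for your help.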

EDIT:

As I said, if I use a classic INNER JOIN, for example

SELECT * FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume

I would get something like:

trade_id | event_id |   time    |  price  | volume |  foo |   bar 
---------+----------+-----------+---------+--------+------+-------
  205    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  205    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  205    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  208    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  208    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  208    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy

But what I want is:

trade_id | event_id |   time    |  price  | volume |  foo |   bar 
---------+----------+-----------+---------+--------+------+-------
  205    |   329    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  206    |   330    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  207    |   331    | 32400.837 |   52.91 |   3119 |  xxx |   yyy
  208    |   334    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  209    |   335    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
  210    |   336    | 32400.839 |   52.92 |   3220 |  xxx |   yyy
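
To make the row blow-up concrete, here is a minimal sketch that loads the sample rows above into an in-memory SQLite database through Python's sqlite3 module and counts what the plain INNER JOIN returns. SQLite is only a stand-in assumed here for convenience (the real setup is PostgreSQL 9.4.5), and the column types are guesses based on the sample data.

```python
import sqlite3

# Sample rows from the question: (id, time, price, volume, foo) and (..., bar).
TRADES = [
    (201, 32400.524, 53.00, 2085, "xxx"), (202, 32400.530, 53.00, 1162, "xxx"),
    (203, 32400.531, 52.99, 50, "xxx"), (204, 32400.532, 52.91, 3119, "xxx"),
    (205, 32400.837, 52.91, 3119, "xxx"), (206, 32400.837, 52.91, 3119, "xxx"),
    (207, 32400.837, 52.91, 3119, "xxx"), (208, 32400.839, 52.92, 3220, "xxx"),
    (209, 32400.839, 52.92, 3220, "xxx"), (210, 32400.839, 52.92, 3220, "xxx"),
]
EVENTS = [
    (328, 32400.835, 52.91, 3119, "yyy"), (329, 32400.837, 52.91, 3119, "yyy"),
    (330, 32400.837, 52.91, 3119, "yyy"), (331, 32400.837, 52.91, 3119, "yyy"),
    (332, 32400.838, 52.91, 3119, "yyy"), (333, 32400.838, 52.91, 3119, "yyy"),
    (334, 32400.839, 52.92, 3220, "yyy"), (335, 32400.839, 52.92, 3220, "yyy"),
    (336, 32400.839, 52.92, 3220, "yyy"), (337, 32400.840, 52.91, 2501, "yyy"),
]

# Assumed schema: types are inferred from the sample output, not from the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, time REAL, price REAL, volume INTEGER, foo TEXT)")
conn.execute("CREATE TABLE events (id INTEGER, time REAL, price REAL, volume INTEGER, bar TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", TRADES)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", EVENTS)

# Plain inner join on the three shared columns: every duplicate trade
# matches every duplicate event in the same (time, price, volume) group.
plain_join = conn.execute("""
    SELECT t.id, e.id
    FROM events e
    INNER JOIN trades t
      ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
    ORDER BY t.id, e.id
""").fetchall()

print(len(plain_join))  # 18 rows: 3x3 for each of the two duplicate groups
```

Only 6 of those 18 rows are wanted: one event per trade within each group.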

5 answers:

Answer 0 (score: 0)

Check this query:

SELECT Events.*,Trades.*
FROM Events
INNER JOIN Trades
ON Trades.time = Events.time
AND Trades.price = Events.price
AND Trades.volume = Events.volume

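Answer 1 (score: 0)

Try this and let me know whether it works. We could also use a row_number() over (partition by ...) clause, but I am not sure whether it works in PostgreSQL. Try this anyway.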

SELECT 
  min(t.id) as trade_id,min(e.id) as event_id,
  min(t.time) as time,min(t.price) as price,
  min(t.volume) as volume,  min(e.bar) as bar,
  min(t.foo) as foo 
FROM events e
  INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
group by t.id

Answer 2 (score: 0)

Just looking at the sample data you provided, one option would be:

SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo)
FROM events e
INNER JOIN trades t
ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
GROUP BY e.id, e.time, e.price, e.volume
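
As a rough check of how this GROUP BY approach behaves on the question's sample data, the sketch below runs the query in an in-memory SQLite database through Python's sqlite3 module. SQLite is a stand-in assumed here for convenience rather than the PostgreSQL setup from the question, the column types are guessed from the sample data, and an ORDER BY is added only to make the output order predictable.

```python
import sqlite3

# Sample rows from the question: (id, time, price, volume, foo) and (..., bar).
TRADES = [
    (201, 32400.524, 53.00, 2085, "xxx"), (202, 32400.530, 53.00, 1162, "xxx"),
    (203, 32400.531, 52.99, 50, "xxx"), (204, 32400.532, 52.91, 3119, "xxx"),
    (205, 32400.837, 52.91, 3119, "xxx"), (206, 32400.837, 52.91, 3119, "xxx"),
    (207, 32400.837, 52.91, 3119, "xxx"), (208, 32400.839, 52.92, 3220, "xxx"),
    (209, 32400.839, 52.92, 3220, "xxx"), (210, 32400.839, 52.92, 3220, "xxx"),
]
EVENTS = [
    (328, 32400.835, 52.91, 3119, "yyy"), (329, 32400.837, 52.91, 3119, "yyy"),
    (330, 32400.837, 52.91, 3119, "yyy"), (331, 32400.837, 52.91, 3119, "yyy"),
    (332, 32400.838, 52.91, 3119, "yyy"), (333, 32400.838, 52.91, 3119, "yyy"),
    (334, 32400.839, 52.92, 3220, "yyy"), (335, 32400.839, 52.92, 3220, "yyy"),
    (336, 32400.839, 52.92, 3220, "yyy"), (337, 32400.840, 52.91, 2501, "yyy"),
]

# Assumed schema: types are inferred from the sample output, not from the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, time REAL, price REAL, volume INTEGER, foo TEXT)")
conn.execute("CREATE TABLE events (id INTEGER, time REAL, price REAL, volume INTEGER, bar TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", TRADES)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", EVENTS)

# The GROUP BY query from this answer, one output row per matching event.
grouped = conn.execute("""
    SELECT e.id, min(t.id), e.time, e.price, e.volume, min(e.bar), min(t.foo)
    FROM events e
    INNER JOIN trades t
      ON t.time = e.time AND t.price = e.price AND t.volume = e.volume
    GROUP BY e.id, e.time, e.price, e.volume
    ORDER BY e.id
""").fetchall()

# Keep only (event_id, trade_id) to inspect the pairing.
event_trade_pairs = [(row[0], row[1]) for row in grouped]
print(event_trade_pairs)
```

On this data the query returns six rows, but min(t.id) picks trade 205 for each of events 329 to 331 and trade 208 for each of events 334 to 336, so the row count matches the desired output while the pairing is not one-to-one.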

Answer 3 (score: 0)

Here is my row_number example.

Also, SQL Fiddle: SO 33608351

with 
trades AS
(
    select 201 as id, 32400.524 as time, 53 as price,       2085 as volume, 'xxx' as foo union all
    select 202, 32400.530, 53,      1162,   'xxx' union all
    select 203, 32400.531, 52.99,       50,     'xxx' union all
    select 204, 32400.532, 52.91,       3119,   'xxx' union all
    select 205, 32400.837, 52.91,       3119,   'xxx' union all
    select 206, 32400.837, 52.91,       3119,   'xxx' union all
    select 207, 32400.837, 52.91,       3119,   'xxx' union all
    select 208, 32400.839, 52.92,       3220,   'xxx' union all
    select 209, 32400.839, 52.92,       3220,   'xxx' union all
    select 210, 32400.839, 52.92,       3220,   'xxx'
),
events as
(
    select 328 as id, 32400.835 as time ,   52.91 as price ,   3119 as volume ,  'yyy' as bar union all
    select 329 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 330 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 331 , 32400.837 ,   52.91 ,   3119 ,  'yyy' union all
    select 332 , 32400.838 ,   52.91 ,   3119 ,  'yyy' union all
    select 333 , 32400.838 ,   52.91 ,   3119 ,  'yyy' union all
    select 334 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 335 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 336 , 32400.839 ,   52.92 ,   3220 ,  'yyy' union all
    select 337 , 32400.840 ,   52.91 ,   2501 ,  'yyy'
),
tradesWithRowNumber AS
(
    select   *
            ,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
    from trades
),
eventsWithRowNumber AS
(
    select   *
            ,ROW_NUMBER() over (PARTITION by time, price, volume order by time, price, volume) as RowNum
    from events
)
select  t.time,
        t.price,
        t.volume,
        t.foo,
        e.bar
FROM    tradesWithRowNumber t
        inner JOIN
        eventsWithRowNumber e   on  e.time = t.time
                                AND e.price = t.price
                                AND e.volume = t.volume
                                and e.RowNum = t.RowNum
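
To see which rows end up paired by the row-number technique, here is a small sketch that runs the same idea on the sample data in an in-memory SQLite database via Python's sqlite3 module. Several things are assumptions made for this sketch: SQLite (3.25 or later, for window functions) stands in for PostgreSQL, the column types are guessed, the ids are selected so the pairing is visible, and each partition is ordered by id so that the numbering is deterministic.

```python
import sqlite3

# Sample rows from the question: (id, time, price, volume, foo) and (..., bar).
TRADES = [
    (201, 32400.524, 53.00, 2085, "xxx"), (202, 32400.530, 53.00, 1162, "xxx"),
    (203, 32400.531, 52.99, 50, "xxx"), (204, 32400.532, 52.91, 3119, "xxx"),
    (205, 32400.837, 52.91, 3119, "xxx"), (206, 32400.837, 52.91, 3119, "xxx"),
    (207, 32400.837, 52.91, 3119, "xxx"), (208, 32400.839, 52.92, 3220, "xxx"),
    (209, 32400.839, 52.92, 3220, "xxx"), (210, 32400.839, 52.92, 3220, "xxx"),
]
EVENTS = [
    (328, 32400.835, 52.91, 3119, "yyy"), (329, 32400.837, 52.91, 3119, "yyy"),
    (330, 32400.837, 52.91, 3119, "yyy"), (331, 32400.837, 52.91, 3119, "yyy"),
    (332, 32400.838, 52.91, 3119, "yyy"), (333, 32400.838, 52.91, 3119, "yyy"),
    (334, 32400.839, 52.92, 3220, "yyy"), (335, 32400.839, 52.92, 3220, "yyy"),
    (336, 32400.839, 52.92, 3220, "yyy"), (337, 32400.840, 52.91, 2501, "yyy"),
]

# Assumed schema: types are inferred from the sample output, not from the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, time REAL, price REAL, volume INTEGER, foo TEXT)")
conn.execute("CREATE TABLE events (id INTEGER, time REAL, price REAL, volume INTEGER, bar TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", TRADES)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", EVENTS)

# Number the duplicates inside each (time, price, volume) group on both sides,
# then join on the three columns plus that number: k-th trade with k-th event.
pairs = conn.execute("""
    WITH t AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS rn
        FROM trades
    ),
    e AS (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS rn
        FROM events
    )
    SELECT t.id, e.id, t.time, t.price, t.volume, t.foo, e.bar
    FROM t
    INNER JOIN e
      ON e.time = t.time AND e.price = t.price
     AND e.volume = t.volume AND e.rn = t.rn
    ORDER BY t.id
""").fetchall()

for row in pairs:
    print(row)
```

On the sample data this yields six rows, pairing trades 205 to 207 with events 329 to 331 and trades 208 to 210 with events 334 to 336.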

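Answer 4 (score: 0)

If I understand correctly, you just want to list the foo and bar columns without creating a Cartesian product. To do that, you can introduce a new column with row_number() and join on it: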

SELECT *
FROM (SELECT e.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
      FROM events e
     ) e INNER JOIN
     (SELECT t.*,
             ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) as seqnum
      FROM trades t
     ) t
     ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
        t.seqnum = e.seqnum;

Your question does not make clear whether you need an inner join, a left outer join, or a full outer join.
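
The choice of join type matters on the sample data, because trades 201 to 204 have no matching event there. The sketch below illustrates the LEFT JOIN variant of the row-number join, run in an in-memory SQLite database via Python's sqlite3 module. SQLite (3.25 or later) is a stand-in assumed here for PostgreSQL, and the column types are guessed from the sample data.

```python
import sqlite3

# Sample rows from the question: (id, time, price, volume, foo) and (..., bar).
TRADES = [
    (201, 32400.524, 53.00, 2085, "xxx"), (202, 32400.530, 53.00, 1162, "xxx"),
    (203, 32400.531, 52.99, 50, "xxx"), (204, 32400.532, 52.91, 3119, "xxx"),
    (205, 32400.837, 52.91, 3119, "xxx"), (206, 32400.837, 52.91, 3119, "xxx"),
    (207, 32400.837, 52.91, 3119, "xxx"), (208, 32400.839, 52.92, 3220, "xxx"),
    (209, 32400.839, 52.92, 3220, "xxx"), (210, 32400.839, 52.92, 3220, "xxx"),
]
EVENTS = [
    (328, 32400.835, 52.91, 3119, "yyy"), (329, 32400.837, 52.91, 3119, "yyy"),
    (330, 32400.837, 52.91, 3119, "yyy"), (331, 32400.837, 52.91, 3119, "yyy"),
    (332, 32400.838, 52.91, 3119, "yyy"), (333, 32400.838, 52.91, 3119, "yyy"),
    (334, 32400.839, 52.92, 3220, "yyy"), (335, 32400.839, 52.92, 3220, "yyy"),
    (336, 32400.839, 52.92, 3220, "yyy"), (337, 32400.840, 52.91, 2501, "yyy"),
]

# Assumed schema: types are inferred from the sample output, not from the real tables.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (id INTEGER, time REAL, price REAL, volume INTEGER, foo TEXT)")
conn.execute("CREATE TABLE events (id INTEGER, time REAL, price REAL, volume INTEGER, bar TEXT)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?, ?, ?)", TRADES)
conn.executemany("INSERT INTO events VALUES (?, ?, ?, ?, ?)", EVENTS)

# Same row-number join, but LEFT JOIN keeps every trade; trades without a
# matching event come back with NULL (None in Python) for the event columns.
left_rows = conn.execute("""
    SELECT t.id, e.id
    FROM (SELECT t.*,
                 ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
          FROM trades t
         ) t LEFT JOIN
         (SELECT e.*,
                 ROW_NUMBER() OVER (PARTITION BY time, price, volume ORDER BY id) AS seqnum
          FROM events e
         ) e
         ON t.time = e.time AND t.price = e.price AND t.volume = e.volume AND
            t.seqnum = e.seqnum
    ORDER BY t.id
""").fetchall()

unmatched_trades = [trade_id for trade_id, event_id in left_rows if event_id is None]
print(len(left_rows), unmatched_trades)
```

With INNER JOIN only the six paired rows would come back; with LEFT JOIN all ten trades are returned and the four unmatched ones can be detected by their NULL event id.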