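Grouping consecutive dates in PostgreSQL

Time: 2015-09-16 08:27:32

Tags: postgresql date

I have two tables that I need to combine, because some dates appear in table A but not in table B, and vice versa. The result I want is for overlapping and consecutive days to be merged into single date ranges.

I am using PostgreSQL.

Table A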

id  startdate   enddate
--------------------------
101 12/28/2013  12/31/2013

Table B

id  startdate   enddate
--------------------------
101 12/15/2013  12/15/2013
101 12/16/2013  12/16/2013
101 12/28/2013  12/28/2013
101 12/29/2013  12/31/2013

Desired result

id  startdate   enddate
-------------------------
101 12/15/2013  12/16/2013
101 12/28/2013  12/31/2013

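2 answers:

Answer 0: (score: 2)

Right. I have a query that I think works. It certainly works for the sample records you provided. It uses a recursive CTE.

First, you need to union the two tables. Next, use a recursive CTE to build the sequences of overlapping or adjacent date ranges. Finally, take the start date and end date of each sequence and join back to the unioned table to get the id.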
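
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the answerer's original query: the CTE names merged and chain are hypothetical, the columns are assumed to be of type date, the sample rows are inlined as VALUES lists named a and b so the statement runs on its own, and the id is carried through the recursion instead of being joined back at the end.

```sql
-- Sample rows from the question, inlined so the statement is self-contained.
-- With real tables named a and b, drop these two CTEs.
WITH RECURSIVE a (id, startdate, enddate) AS (
    VALUES (101, date '2013-12-28', date '2013-12-31')
),
b (id, startdate, enddate) AS (
    VALUES (101, date '2013-12-15', date '2013-12-15'),
           (101, date '2013-12-16', date '2013-12-16'),
           (101, date '2013-12-28', date '2013-12-28'),
           (101, date '2013-12-29', date '2013-12-31')
),
-- Step 1: union the two tables (UNION also removes exact duplicates).
merged AS (
    SELECT id, startdate, enddate FROM a
    UNION
    SELECT id, startdate, enddate FROM b
),
-- Step 2: recursive CTE that grows each range to the right.
chain AS (
    -- Anchor rows: ranges that no earlier-starting range overlaps or touches.
    SELECT m.id, m.startdate, m.enddate
    FROM merged m
    WHERE NOT EXISTS (
        SELECT 1
        FROM merged p
        WHERE p.id = m.id
          AND p.startdate < m.startdate
          AND p.enddate >= m.startdate - 1
    )
    UNION
    -- Recursive step: extend with any range that starts no later than the
    -- day after the current end and finishes later than the current end.
    SELECT c.id, c.startdate, m.enddate
    FROM chain c
    JOIN merged m
      ON m.id = c.id
     AND m.startdate <= c.enddate + 1
     AND m.enddate > c.enddate
)
-- Step 3: keep the furthest end date reached from each starting point.
SELECT id, startdate, max(enddate) AS enddate
FROM chain
GROUP BY id, startdate
ORDER BY id, startdate;
```

On the sample rows this should give the two ranges from the desired result, 12/15/2013 to 12/16/2013 and 12/28/2013 to 12/31/2013. The recursion stops because each step strictly increases enddate.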

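Answer 1: (score: 2)

The following snippet does what you intend (though it may be very slow). The problem is that detecting (non-)overlapping date ranges with the standard range operators is not possible, because a range can be split into two parts. So my code does the following:

  • split the date ranges from table_a into atomic records, one date per record
  • [same for table_b]
  • full-outer-join the two sets (we are only interested in A_not_in_B and B_not_in_A), remembering which L/R outer-join wing each record came from
  • re-aggregate the resulting records into date ranges
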
-- EXPLAIN ANALYZE
-- 
WITH  RECURSIVE ranges AS (
            -- Chop up the a-table into atomic date units
    WITH ar AS (
            SELECT generate_series(a.startdate,a.enddate , '1day'::interval)::date  AS thedate
            ,  'A'::text AS which
            , a.id
            FROM a
            )
            -- Same for the b-table
    , br AS (
            SELECT generate_series(b.startdate,b.enddate, '1day'::interval)::date  AS thedate
            ,  'B'::text AS which
            , b.id
            FROM b
            )
            -- combine the two sets, retaining a_not_in_b plus b_not_in_a
    , moments AS (
            SELECT COALESCE(ar.id,br.id) AS id
            , COALESCE(ar.which, br.which) AS which
            , COALESCE(ar.thedate, br.thedate) AS thedate
            FROM ar
            FULL JOIN br ON br.id = ar.id AND br.thedate =  ar.thedate
            WHERE ar.id IS NULL OR br.id IS NULL
            )
            -- use a recursive CTE to re-aggregate the atomic moments into ranges
    SELECT m0.id, m0.which
            , m0.thedate AS startdate
            , m0.thedate AS enddate
    FROM moments m0
    WHERE NOT EXISTS ( SELECT * FROM moments nx WHERE nx.id = m0.id  AND nx.which = m0.which
            AND nx.thedate = m0.thedate -1
            )
    UNION ALL
    SELECT rr.id, rr.which
            , rr.startdate AS startdate
            , m1.thedate AS enddate
    FROM ranges rr
    JOIN moments m1 ON m1.id = rr.id AND m1.which = rr.which AND m1.thedate = rr.enddate +1
    )
SELECT * FROM ranges ra
WHERE NOT EXISTS (SELECT * FROM ranges nx
    -- suppress partial subassemblies
    WHERE nx.id = ra.id AND nx.which = ra.which
    AND nx.startdate = ra.startdate
    AND nx.enddate > ra.enddate
    )
 ;
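
For comparison, the split-then-re-aggregate idea can also be sketched without recursion. Note that this swaps in a different technique from the query above: it keeps every day found in either table (a plain union, not the A_not_in_B / B_not_in_A filter) and groups consecutive days with the row_number() window function. The CTE names days and grouped are hypothetical, and the sample rows are inlined as VALUES lists named a and b so the statement runs on its own.

```sql
-- Sample rows from the question; with real tables a and b, drop these two CTEs.
WITH a (id, startdate, enddate) AS (
    VALUES (101, date '2013-12-28', date '2013-12-31')
),
b (id, startdate, enddate) AS (
    VALUES (101, date '2013-12-15', date '2013-12-15'),
           (101, date '2013-12-16', date '2013-12-16'),
           (101, date '2013-12-28', date '2013-12-28'),
           (101, date '2013-12-29', date '2013-12-31')
),
-- One row per id and calendar day covered by either table; UNION removes
-- days that occur in both.
days AS (
    SELECT id, generate_series(startdate, enddate, interval '1 day')::date AS thedate
    FROM a
    UNION
    SELECT id, generate_series(startdate, enddate, interval '1 day')::date AS thedate
    FROM b
),
-- Consecutive days share the same value of (date minus row number),
-- so that difference serves as a group key.
grouped AS (
    SELECT id, thedate,
           thedate - (row_number() OVER (PARTITION BY id ORDER BY thedate))::int AS grp
    FROM days
)
SELECT id, min(thedate) AS startdate, max(thedate) AS enddate
FROM grouped
GROUP BY id, grp
ORDER BY id, startdate;
```

On the sample rows this should return 12/15/2013 to 12/16/2013 and 12/28/2013 to 12/31/2013. Like the query above, it expands every range into individual days, so it may be slow for long ranges.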