How do I return rows for IDs that meet a criterion of 3 consecutive occurrences?

Time: 2017-11-17 14:12:03

Tags: sql postgresql

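I have a requirement that is throwing me for a loop. I have to return the location IDs that have exactly 3 consecutive rows of monthly invoice amounts > $2,000. In other words, I do not want to return the IDs of mature locations (which may have hundreds of monthly invoice rows).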

Rextester sample data: http://rextester.com/CNJC15871

Info:

  • The report runs on the first day of each month
  • The monthly invoice date is the 15th of the month

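Desired output, based on the invoice table below:

  • For a report run date of 10/1, loc_ids 2223 and 3344 would be returned, because 9/15 is the third consecutive month with an invoice amount > $2,000
  • For a report run date of 11/1, loc_id 6678 would be returned by the same logic.
  • For subsequent report months, 2223, 3344 and 6678 should NOT be returned, because by then they have more than 3 consecutive months > $2,000.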

    | loc_id | invoice_date | invoice_amt | Notes                     |
    |--------|--------------|-------------|---------------------------|
    | 1234   | 5/15/2002    | 7000        |                           |
    | 1234   | 6/15/2002    | 8000        |                           |
    | ..     | …            | …           |                           |
    | 1234   | 11/15/2017   | 58000       |                           |
    |        |              |             |                           |
    | 9987   | 11/15/2006   | 7500        |                           |
    | 9987   | 12/15/2006   | 8500        |                           |
    | …      | …            |             |                           |
    | 9987   | 11/15/2017   | 63000       |                           |
    |        |              |             |                           |
    | 5544   | 3/15/2015    | 9200        |                           |
    | 5544   | 4/15/2015    | 10000       |                           |
    | …      | …            |             |                           |
    | 5544   | 11/15/2017   | 70000       |                           |
    |        |              |             |                           |
    | 2223   | 5/15/2017    | 2500        | Count| >2000              |
    | 2223   | 6/15/2017    | 1375        | Do not count| <2000       |
    | 2223   | 7/15/2017    | 8000        | Restart count| >2000 (1)  |
    | 2223   | 8/15/2017    | 9000        | Continue count| >2000 (2) |
    | 2223   | 9/15/2017    | 9800        | Continue count| >2000 (3) |
    | 2223   | 10/15/2017   | 10500       | Stop count| >3 in a row   |
    | 2223   | 11/15/2017   | 11200       | Stop count| >3 in a row   |
    |        |              |             |                           |
    | 3344   | 7/15/2017    | 3500        | Count| >2000 (1)          |
    | 3344   | 8/15/2017    | 4500        | Continue count| >2000 (2) |
    | 3344   | 9/15/2017    | 6000        | Continue count| >2000 (3) |
    | 3344   | 10/15/2017   | 7000        | Stop count| >3 in a row   |
    | 3344   | 11/15/2017   | 8000        | Stop count| >3 in a row   |
    |        |              |             |                           |
    | 6678   | 8/15/2017    | 3000        | Count| >2000 (1)          |
    | 6678   | 9/15/2017    | 4000        | Continue count| >2000 (2) |
    | 6678   | 10/15/2017   | 5000        | Continue count| >2000 (3) |
    

I also have a loc_id dimension table that contains each location's open date.

| loc_id | loc_open_dt |
|--------|-------------|
| 1234   | 2002-05-01  |
| 9987   | 2006-10-22  |
| 5544   | 2015-03-04  |
| 2223   | 2017-05-05  |
| 3344   | 2017-07-05  |
| 6678   | 2017-08-01  | 

2 Answers:

Answer 0: (score: 1)

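In PostgreSQL you can use window functions. You build a window by partitioning the data by loc_id and ordering it by invoice_date, then check whether invoice_amt is greater than the target value on 3 consecutive rows. The trick is done with the lag() function, which is applied over the window and fetches data from previous rows. The code is much simpler than the explanation: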

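SELECT DISTINCT loc_id FROM (
  SELECT *,
         -- a, b, c: is the amount above the target on this row and on the two rows before it?
         -- (the target here is 10000; the threshold in the question is 2000)
         invoice_amt > 10000 AS a,
         lag(invoice_amt, 1) OVER w > 10000 AS b,
         lag(invoice_amt, 2) OVER w > 10000 AS c,
         -- m1: month of this row; m2, m3: months of the two previous rows shifted forward by 1 and 2 months
         extract('month' from invoice_date::date) AS m1,
         extract('month' from (lag(invoice_date, 1) OVER w)::date + '1 month'::interval) AS m2,
         extract('month' from (lag(invoice_date, 2) OVER w)::date + '2 month'::interval) AS m3
    FROM invoices
  WINDOW w AS (PARTITION BY loc_id ORDER BY invoice_date)
) X
 -- all three amounts qualify and the three rows fall in consecutive months
 WHERE a AND b AND c AND m2 = m1 AND m3 = m1;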

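Also note the check for consecutive months. We simply add 1 or 2 months to the lag()ged dates and then check that the month is the same on all three consecutive rows (as mentioned in the comments).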

If you want a better understanding of how it works, just run the inner SELECT and look at the result.
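To see what the lag() comparison produces row by row, here is a minimal Python sketch of the same check, not the answer's actual query. It assumes the question's threshold of 2000 rather than the 10000 used in the query, takes the loc_id 2223 rows from the question's table, and uses the hypothetical helpers `lag`, `shifted_month` and `flag_rows` to stand in for the window expressions.

```python
from datetime import date

THRESHOLD = 2000  # assumption: the question's threshold, not the 10000 in the query

# loc_id 2223 rows from the question's invoice table, ordered by invoice_date
rows = [
    (date(2017, 5, 15), 2500),
    (date(2017, 6, 15), 1375),
    (date(2017, 7, 15), 8000),
    (date(2017, 8, 15), 9000),
    (date(2017, 9, 15), 9800),
    (date(2017, 10, 15), 10500),
    (date(2017, 11, 15), 11200),
]

def lag(values, i, offset):
    """Value `offset` rows before row i, or None if there is no such row (like SQL lag())."""
    return values[i - offset] if i - offset >= 0 else None

def shifted_month(d, months):
    """Month number (1..12) of d after adding `months` months."""
    return (d.month - 1 + months) % 12 + 1

def flag_rows(rows):
    """Pair each invoice date with the result of the a AND b AND c AND month checks."""
    dates = [d for d, _ in rows]
    amts = [a for _, a in rows]
    out = []
    for i, (d, amt) in enumerate(rows):
        prev1, prev2 = lag(amts, i, 1), lag(amts, i, 2)
        d1, d2 = lag(dates, i, 1), lag(dates, i, 2)
        a = amt > THRESHOLD
        b = prev1 is not None and prev1 > THRESHOLD
        c = prev2 is not None and prev2 > THRESHOLD
        consecutive = (
            d1 is not None and d2 is not None
            and shifted_month(d1, 1) == d.month
            and shifted_month(d2, 2) == d.month
        )
        out.append((d, a and b and c and consecutive))
    return out

matches = [d for d, ok in flag_rows(rows) if ok]
print(matches)  # the 9/15, 10/15 and 11/15 rows
```

In this sketch the check holds on every row from the third qualifying month onward (9/15, 10/15 and 11/15 for loc_id 2223), so on its own it identifies locations that have had at least three qualifying months in a row rather than exactly three.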

Answer 1: (score: 1)

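Here is a query that checks that:

  • the months with amounts >= 2000 are consecutive,
  • the last of those months is the month before the list date,
  • if there is an amount for the month before those three months, it is below 2000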

Query:

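select distinct loc_id
from   (
        select loc_id,
               -- amount of the earliest invoice inside the 120-day window
               first_value(invoice_amt) over win                            first_amt,
               -- months covered from that earliest invoice to list_date, counted in 30-day steps
               floor((list_date - first_value(invoice_date) over win)/30)+1 month_count,
               -- true when this row's invoice is less than 30 days before list_date
               -- (with the default window frame, last_value() is the current row)
               list_date - last_value(invoice_date) over win < 30           has_last_month,
               -- running count of invoices with an amount >= 2000
               count(case when invoice_amt >= 2000 then 1 end) over win     large_amt_count
        from   invoices,
               (select date '2017-10-01' /* current_date */ list_date) ref
        where  invoice_date between (list_date - 120) and list_date
        window win as (partition by loc_id order by invoice_date)
       ) base
where  month_count = 3 + (first_amt < 2000)::int
   and large_amt_count = 3
   and has_last_month;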

See it on rextester.

Change the date literal in the middle of the query to the actual report date (or to current_date).
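The rule those three checks encode can also be sketched outside SQL as a cross-check. The Python below is an illustration under stated assumptions, not a translation of the query: the function names `month_index`, `trailing_streak` and `report` are hypothetical, the comparison is `>= 2000` as in the query above, and the data is the set of rows for loc_ids 2223, 3344 and 6678 from the question's table.

```python
from datetime import date

def month_index(d):
    """Months since year 0, so consecutive calendar months differ by exactly 1."""
    return d.year * 12 + d.month

def trailing_streak(rows, report_date, threshold=2000):
    """Length of the run of consecutive monthly invoices >= threshold that ends
    in the month just before report_date (0 if that month has no such invoice)."""
    expected = month_index(report_date) - 1
    streak = 0
    for inv_date, amt in sorted(rows, reverse=True):
        if inv_date >= report_date:
            continue  # invoices dated on or after the report date are ignored
        if month_index(inv_date) != expected or amt < threshold:
            break  # a gap in the months or a small amount ends the run
        streak += 1
        expected -= 1
    return streak

def report(invoices, report_date, wanted=3):
    """loc_ids whose trailing streak is exactly `wanted` on report_date."""
    return sorted(
        loc for loc, rows in invoices.items()
        if trailing_streak(rows, report_date) == wanted
    )

# Rows copied from the question's invoice table
invoices = {
    2223: [(date(2017, 5, 15), 2500), (date(2017, 6, 15), 1375),
           (date(2017, 7, 15), 8000), (date(2017, 8, 15), 9000),
           (date(2017, 9, 15), 9800), (date(2017, 10, 15), 10500),
           (date(2017, 11, 15), 11200)],
    3344: [(date(2017, 7, 15), 3500), (date(2017, 8, 15), 4500),
           (date(2017, 9, 15), 6000), (date(2017, 10, 15), 7000),
           (date(2017, 11, 15), 8000)],
    6678: [(date(2017, 8, 15), 3000), (date(2017, 9, 15), 4000),
           (date(2017, 10, 15), 5000)],
}

print(report(invoices, date(2017, 10, 1)))  # [2223, 3344]
print(report(invoices, date(2017, 11, 1)))  # [6678]
print(report(invoices, date(2017, 12, 1)))  # []
```

The three report dates reproduce the output the question asks for: 2223 and 3344 on 10/1, 6678 on 11/1, and nothing afterwards for these three locations.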