Daily bulk summary

Time: 2017-10-17 01:34:38

Tags: sql google-bigquery data-warehouse

I have a table in BigQuery (data warehouse):

[image: source table]

I want to get the following result:

[image: expected result table]

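Here is an explanation of how each value is calculated:

  1. 2017-10-01 = $100 is obvious, because there is only one row of data.
  2. 2017-10-02 = $400 is the sum of rows 1 and 3. Why? Because rows 2 and 3 belong to the same invoice, so we only use the latest update.
  3. 2017-10-04 = $800 is the sum of rows 1, 3 and 4. Why? Because for each day we count only one row per invoice, the latest one: row 1 (T001), row 3 (T002), row 4 (T003).
  4. 2017-10-05 = $1100 is the sum of rows 1, 5 and 6. Why? Because for each day we count only one row per invoice, the latest one: row 1 (T001), row 5 (T002), row 6 (T003).

Honestly, I am completely lost on how to do this. I have tried GROUP BY and so on many times, but none of my attempts work as expected. This is my latest attempt so far: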

    SELECT 
      amount,
      updatedDateOnly,
      invNo
    FROM 
    (
      SELECT 
        invNo,
        UpdatedDate,
        amount,
        DATE(updatedDate) as updatedDateOnly,
        row_number() OVER (PARTITION BY  invNo ORDER BY UpdatedDate DESC) AS rownum
      FROM [project:dataset.test] 
    )
    WHERE
      rownum = 1
    

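This returns only the row from the last update of each invoice. I don't know how to turn it into a per-day query.

Thanks to any expert who is willing to help with the query.

Update: the data as JSON, in case you want to try it in BigQuery or another SQL server: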

    {"UpdatedDate":"2017-10-01 01:00:00","InvNo":"T001","amount":100}
    {"UpdatedDate":"2017-10-02 01:00:00","InvNo":"T002","amount":200}
    {"UpdatedDate":"2017-10-02 02:00:00","InvNo":"T002","amount":300}
    {"UpdatedDate":"2017-10-04 01:00:00","InvNo":"T003","amount":400}
    {"UpdatedDate":"2017-10-05 01:00:00","InvNo":"T002","amount":500}
    {"UpdatedDate":"2017-10-05 02:00:00","InvNo":"T003","amount":500}
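
The arithmetic in the numbered list above can be checked mechanically. The following is only a minimal Python sketch of the rule as described (for each day, sum every invoice's most recent amount as of that day), not a BigQuery solution; the name `daily_totals` is hypothetical, and it assumes the timestamps are ISO-formatted strings so that sorting them as text puts them in chronological order.

```python
# Sample rows from the question: (UpdatedDate, InvNo, amount).
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]


def daily_totals(rows):
    """Return {day: sum of each invoice's latest amount as of that day}."""
    latest = {}  # InvNo -> most recent amount seen so far
    totals = {}  # day (YYYY-MM-DD) -> total after that day's last update
    for updated, inv_no, amount in sorted(rows):  # chronological order
        day = updated.split(" ")[0]
        latest[inv_no] = amount             # a newer update replaces the older one
        totals[day] = sum(latest.values())  # overwritten until the day's last row
    return totals


for day, total in daily_totals(ROWS).items():
    print(day, total)
```

Days without any update (2017-10-03 here) do not appear in the output, which matches the expected result described above.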
    

2 answers:

Answer 0: (score: 4)

The following is for BigQuery Standard SQL:

   
#standardSQL
WITH dates AS (
  SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
  FROM `project.dataset.test`
),
qualified AS (
  SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM `project.dataset.test`
  GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
  SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM dates d
  JOIN qualified q
  ON q.UpdatedDay <= d.UpdatedDay
  GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
-- ORDER BY UpdatedDay

You can test / play with it using the dummy data from the question:

#standardSQL
WITH `project.dataset.test` AS (
  SELECT TIMESTAMP '2017-10-01 01:00:00' UpdatedDate, 'T001' InvNo, 100 amount UNION ALL
  SELECT TIMESTAMP '2017-10-02 01:00:00', 'T002', 200 UNION ALL
  SELECT TIMESTAMP '2017-10-02 02:00:00', 'T002', 300 UNION ALL
  SELECT TIMESTAMP '2017-10-04 01:00:00', 'T003', 400 UNION ALL
  SELECT TIMESTAMP '2017-10-05 01:00:00', 'T002', 500 UNION ALL
  SELECT TIMESTAMP '2017-10-05 02:00:00', 'T003', 500 
),
dates AS (
  SELECT DISTINCT DATE(UpdatedDate) UpdatedDay
  FROM `project.dataset.test`
),
qualified AS (
  SELECT DATE(UpdatedDate) UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY UpdatedDate DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM `project.dataset.test`
  GROUP BY UpdatedDay, InvNo
)
SELECT UpdatedDay, SUM(amount) amount
FROM (
  SELECT d.UpdatedDay UpdatedDay, InvNo, ARRAY_AGG(amount ORDER BY q.UpdatedDay DESC LIMIT 1)[SAFE_OFFSET(0)] amount
  FROM dates d
  JOIN qualified q
  ON q.UpdatedDay <= d.UpdatedDay
  GROUP BY UpdatedDay, InvNo
)
GROUP BY UpdatedDay
ORDER BY UpdatedDay

The result is as expected:

UpdatedDay  amount   
2017-10-01   100     
2017-10-02   400     
2017-10-04   800     
2017-10-05  1100     
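
The query works in two stages: `qualified` keeps the latest amount per invoice per day, and the outer join then pairs every date with all `qualified` rows on or before it and keeps the newest one per invoice before summing. As a rough illustration of those two stages outside BigQuery, here is a small Python sketch; the function names are hypothetical, and it assumes ISO-formatted timestamp strings so that plain string comparison orders them correctly.

```python
# Sample rows from the question: (UpdatedDate, InvNo, amount).
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]


def qualified(rows):
    """Stage 1: latest amount per (day, InvNo), like the `qualified` CTE."""
    latest = {}  # (day, InvNo) -> (UpdatedDate, amount)
    for updated, inv_no, amount in rows:
        key = (updated[:10], inv_no)
        if key not in latest or updated > latest[key][0]:
            latest[key] = (updated, amount)
    return {key: amount for key, (_, amount) in latest.items()}


def daily_totals(rows):
    """Stage 2: per day, keep each invoice's newest qualified row, then sum."""
    q = qualified(rows)
    days = sorted({day for day, _ in q})  # like the `dates` CTE
    totals = {}
    for d in days:
        newest = {}  # InvNo -> (qualified day, amount)
        for (q_day, inv_no), amount in q.items():
            # Mirrors the join condition q.UpdatedDay <= d.UpdatedDay.
            if q_day <= d and (inv_no not in newest or q_day > newest[inv_no][0]):
                newest[inv_no] = (q_day, amount)
        totals[d] = sum(amount for _, amount in newest.values())
    return totals


print(qualified(ROWS))
print(daily_totals(ROWS))
```

Stage 1 collapses the two T002 rows of 2017-10-02 into a single entry with amount 300, which is why that day totals 400 rather than 600.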

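Answer 1: (score: 1)

For each date, you need the latest amount of each invoice. That is tricky. One solution is to start from a join of dates and records, pairing each date with every record on or before it. A window function can then be used to pick out the latest amount: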

select dte,
       sum(case when seqnum = 1 then amount else 0 end) as amount
from (select d.dte, t.*,
             -- number each invoice's rows within each date, newest first
             row_number() over (partition by d.dte, t.InvNo order by t.UpdatedDate desc) as seqnum
      from (select distinct date(UpdatedDate) as dte
            from `project.dataset.test` t
           ) d join
           `project.dataset.test` t
           on date(t.UpdatedDate) <= d.dte
     ) td
group by dte;
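
For the window-function approach to produce one "latest" row per invoice on every date, the row numbering has to restart for each (date, invoice) pair, so the partition needs to include the date as well as the invoice. The sketch below tries this on the question's sample data with Python's built-in `sqlite3` module instead of BigQuery. It is an illustration under assumptions: the table is simply named `test`, timestamps are stored as ISO text, SQLite's `date()` stands in for BigQuery's `DATE()`, and the bundled SQLite must be 3.25 or newer for window functions.

```python
import sqlite3

# Sample rows from the question: (UpdatedDate, InvNo, amount).
ROWS = [
    ("2017-10-01 01:00:00", "T001", 100),
    ("2017-10-02 01:00:00", "T002", 200),
    ("2017-10-02 02:00:00", "T002", 300),
    ("2017-10-04 01:00:00", "T003", 400),
    ("2017-10-05 01:00:00", "T002", 500),
    ("2017-10-05 02:00:00", "T003", 500),
]

# Same shape as the query above, in SQLite's dialect.
QUERY = """
SELECT dte,
       SUM(CASE WHEN seqnum = 1 THEN amount ELSE 0 END) AS amount
FROM (SELECT d.dte, t.amount,
             ROW_NUMBER() OVER (PARTITION BY d.dte, t.InvNo
                                ORDER BY t.UpdatedDate DESC) AS seqnum
      FROM (SELECT DISTINCT date(UpdatedDate) AS dte FROM test) d
      JOIN test t ON date(t.UpdatedDate) <= d.dte
     ) td
GROUP BY dte
ORDER BY dte
"""


def daily_totals_sql(rows):
    """Load the rows into an in-memory SQLite table and run QUERY."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE test (UpdatedDate TEXT, InvNo TEXT, amount INTEGER)"
        )
        conn.executemany("INSERT INTO test VALUES (?, ?, ?)", rows)
        return conn.execute(QUERY).fetchall()
    finally:
        conn.close()


for dte, amount in daily_totals_sql(ROWS):
    print(dte, amount)
```

If the date were left out of the partition, each invoice would get a single `seqnum = 1` row across all dates, so most dates would lose that invoice's amount.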