Division between data in rows - SQL

Time: 2017-12-27 19:20:52

Tags: sql amazon-redshift

The data in my table looks like this:

date, app, country, sales
2017-01-01,XYZ,US,10000
2017-01-01,XYZ,GB,2000
2017-01-02,XYZ,US,30000
2017-01-02,XYZ,GB,1000

For each app and each day, I need the ratio of US sales to GB sales. Ideally the result would look like this:

date, app, ratio
2017-01-01,XYZ,10000/2000 = 5
2017-01-02,XYZ,30000/1000 = 30

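I am currently dumping everything to CSV and doing the calculation offline in Python, but I would like to move all of it to the SQL side. One option is to put each country in its own subquery, join the subqueries, and then divide, for example: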

select d1_us.date, d1_us.app, d1_us.sales / d1_gb.sales as ratio from
(select date, app, sales from your_table where date between '2017-01-01' and '2017-01-10' and country = 'US') as d1_us
join
(select date, app, sales from your_table where date between '2017-01-01' and '2017-01-10' and country = 'GB') as d1_gb
on d1_us.app = d1_gb.app and d1_us.date = d1_gb.date

Is there a less messy way to do this?
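To try the join-based query without a Redshift cluster, here is a minimal sketch that runs it against the sample rows, using Python's built-in sqlite3 module as a local stand-in. SQLite is an assumption made only for illustration; its dialect and division semantics are not identical to Redshift's, and the table name your_table is a placeholder. Note that the inner join produces a row only for dates on which both a US row and a GB row exist.

```python
import sqlite3

# In-memory SQLite database standing in for the Redshift table (illustration only).
conn = sqlite3.connect(":memory:")
# sales is REAL so that the division below is not integer division.
conn.execute("CREATE TABLE your_table (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
    ],
)

# One derived table per country, joined on date and app, then divided.
JOIN_QUERY = """
SELECT d1_us.date, d1_us.app, d1_us.sales / d1_gb.sales AS ratio
FROM (SELECT date, app, sales FROM your_table
      WHERE date BETWEEN '2017-01-01' AND '2017-01-10' AND country = 'US') AS d1_us
JOIN (SELECT date, app, sales FROM your_table
      WHERE date BETWEEN '2017-01-01' AND '2017-01-10' AND country = 'GB') AS d1_gb
  ON d1_us.app = d1_gb.app AND d1_us.date = d1_gb.date
ORDER BY d1_us.date
"""

rows = conn.execute(JOIN_QUERY).fetchall()
for row in rows:
    print(row)
```

This prints ('2017-01-01', 'XYZ', 5.0) and ('2017-01-02', 'XYZ', 30.0), matching the desired output.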

3 Answers:

Answer 0 (score: 5)

You can do this without subqueries by dividing one SUM(CASE WHEN ...) by another and using GROUP BY:

SELECT DATE, 
       APP,
       SUM(CASE WHEN COUNTRY = 'US' THEN SALES ELSE 0 END) /
       SUM(CASE WHEN COUNTRY = 'GB' THEN SALES END) AS RATIO    
FROM TABLE1
GROUP BY DATE, APP;

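If GB sales can be zero, you need to guard against a divide-by-zero error. Changing the ELSE branch of the GB expression (for example to ELSE 1) can avoid the error, but it adds 1 to the denominator for every non-GB row and so skews the ratio; wrapping the denominator in NULLIF(..., 0) returns NULL instead. It really depends on how you want to handle the exceptional cases.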
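A minimal sketch of the single-query version with the zero guard done through NULLIF, again using Python's sqlite3 as a local stand-in for Redshift. The 2017-01-03 rows, with zero GB sales, are hypothetical data added only to exercise the guard. Two caveats: SQLite already returns NULL for division by zero whereas Redshift raises an error, so this shows the NULLIF result rather than the error it prevents; and if sales is an integer column in Redshift, integer division truncates (as in PostgreSQL), so casting one side to DECIMAL or FLOAT may be necessary.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
        # Hypothetical day with zero GB sales, to exercise the NULLIF guard.
        ("2017-01-03", "XYZ", "US", 500),
        ("2017-01-03", "XYZ", "GB", 0),
    ],
)

# Conditional aggregation: one pass over the table, one row per (date, app).
# NULLIF turns a zero GB total into NULL, so the ratio becomes NULL instead of failing.
RATIO_QUERY = """
SELECT date,
       app,
       SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END) /
       NULLIF(SUM(CASE WHEN country = 'GB' THEN sales END), 0) AS ratio
FROM table1
GROUP BY date, app
ORDER BY date, app
"""

rows = conn.execute(RATIO_QUERY).fetchall()
for row in rows:
    print(row)
```

The zero-GB day comes back with a ratio of None (SQL NULL), while the other two days give 5.0 and 30.0.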

Answer 1 (score: 0)

You can use a single query that groups the rows and states each condition only once:

SELECT date, app,
       SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
       SUM(CASE WHEN country = 'GB' THEN SALES END) AS ratio
FROM your_table
WHERE date between '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
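The edge cases described next can be checked with a short sketch, again using sqlite3 as a local stand-in for Redshift. The query places WHERE after FROM, as SQL requires. The 2017-01-04 (US only) and 2017-01-05 (GB only) rows are hypothetical data added to show what happens when one country is missing.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
        ("2017-01-04", "XYZ", "US", 700),  # hypothetical: no GB row this day
        ("2017-01-05", "XYZ", "GB", 400),  # hypothetical: no US row this day
    ],
)

QUERY = """
SELECT date, app,
       SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END) /
       SUM(CASE WHEN country = 'GB' THEN sales END) AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app
ORDER BY date, app
"""

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
# No GB row: the GB SUM has no ELSE, so it is NULL and the ratio is NULL (None).
# No US row: the US SUM falls back to ELSE 0, so the ratio is 0.
```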

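However, this gives zero if there are no US records and NULL if there are no GB records. If you need to return different values in these cases, you can wrap the division in another CASE WHEN. For example, to return -1 and -2 respectively, you can use: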

SELECT date, app,
       -- COUNT skips only NULLs, so the inner CASE must have no ELSE branch
       CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
            WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
            ELSE SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
                 SUM(CASE WHEN country = 'GB' THEN SALES END)
            END AS ratio
FROM your_table
WHERE date between '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
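A sketch for checking the sentinel values locally, with sqlite3 standing in for Redshift and hypothetical US-only (2017-01-04) and GB-only (2017-01-05) days added. It also shows why the inner CASE inside COUNT must not have ELSE 0: COUNT ignores only NULLs, so COUNT(CASE WHEN country = 'US' THEN 1 ELSE 0 END) counts every row in the group, US or not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
        ("2017-01-04", "XYZ", "US", 700),  # hypothetical: US only
        ("2017-01-05", "XYZ", "GB", 400),  # hypothetical: GB only
    ],
)

# On the GB-only day, ELSE 0 still counts the GB row; without ELSE only US rows count.
COUNTS = """
SELECT COUNT(CASE WHEN country = 'US' THEN 1 ELSE 0 END),
       COUNT(CASE WHEN country = 'US' THEN 1 END)
FROM your_table
WHERE date = '2017-01-05'
"""
with_else, without_else = conn.execute(COUNTS).fetchone()
print(with_else, without_else)

# Sentinel version: -1 when there are no US rows, -2 when there are no GB rows.
SENTINEL = """
SELECT date, app,
       CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
            WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
            ELSE SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END) /
                 SUM(CASE WHEN country = 'GB' THEN sales END)
       END AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app
ORDER BY date, app
"""
rows = conn.execute(SENTINEL).fetchall()
for row in rows:
    print(row)
```

The first print shows 1 0: the ELSE 0 variant would wrongly report a US row on the GB-only day.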

Answer 2 (score: 0)

DROP TABLE IF EXISTS t;
CREATE TABLE t (
  date DATE,
  app VARCHAR(5),
  country VARCHAR(5),
  sales DECIMAL(10,2)
);

INSERT INTO t VALUES
  ('2017-01-01','XYZ','US',10000),
  ('2017-01-01','XYZ','GB',2000),
  ('2017-01-02','XYZ','US',30000),
  ('2017-01-02','XYZ','GB',1000);


WITH q AS (
    SELECT
      date,
      app,
      country,
      SUM(sales) AS sales
    FROM t
    GROUP BY date, app, country
) SELECT
    q1.date,
    q1.app,
    q1.country || ' vs ' || NVL(q2.country,'-') AS ratio_between,
    CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0 ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
  FROM q AS q1
    LEFT JOIN q AS q2 ON q2.date = q1.date AND
                    q2.app = q1.app AND
                    q2.country != q1.country
  -- WHERE q1.country = 'US'
  ORDER BY q1.date;

Results for every country against every other country (with WHERE q1.country = 'US' commented out):

date,app,ratio_between,ratio
2017-01-01,XYZ,GB vs US,0.20
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,GB vs US,0.03
2017-01-02,XYZ,US vs GB,30.00

Results for the US against every other country (with WHERE q1.country = 'US' uncommented):

date,app,ratio_between,ratio
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,US vs GB,30.00
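A sketch that reproduces both result sets with sqlite3 as a local stand-in for Redshift. SQLite has no NVL, so COALESCE (which Redshift also supports) is used instead; the optional US filter is passed as a query parameter, and rows are also ordered by q1.country so the order within a date is deterministic. The function name pair_ratios is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
    ],
)

# Aggregate per (date, app, country), then pair each country with every other
# country on the same date and app.
SELF_JOIN = """
WITH q AS (
    SELECT date, app, country, SUM(sales) AS sales
    FROM t
    GROUP BY date, app, country
)
SELECT q1.date,
       q1.app,
       q1.country || ' vs ' || COALESCE(q2.country, '-') AS ratio_between,
       CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0
            ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
FROM q AS q1
LEFT JOIN q AS q2 ON q2.date = q1.date
                 AND q2.app = q1.app
                 AND q2.country != q1.country
WHERE ? IS NULL OR q1.country = ?
ORDER BY q1.date, q1.country
"""

def pair_ratios(base_country=None):
    """Return (date, app, ratio_between, ratio) rows; None means every country."""
    return conn.execute(SELF_JOIN, (base_country, base_country)).fetchall()

for row in pair_ratios():
    print(row)
for row in pair_ratios("US"):
    print(row)
```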

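The trick is in the JOIN clause. The result of the subquery q, which aggregates the data by date, app, and country, is joined to itself, but only on date and app.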

This way, each date, app, and country gets "matched" with every country for the same date and app. Adding q1.country != q2.country excludes the same-country pairs, highlighted with * below:

date,app,country,sales,date,app,country,sales
*2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,GB,2000.00*
2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,US,10000.00
2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,GB,2000.00
*2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,US,10000.00*
2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,US,30000.00
*2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,GB,1000.00*
*2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,US,30000.00*
2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,GB,1000.00
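The effect of the country inequality can be checked with a small sketch (sqlite3 as a local stand-in for Redshift) that runs the same self-join with and without the extra condition and collects the pairs that disappear. The helper name join_pairs is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
    ],
)

PAIRS = """
WITH q AS (
    SELECT date, app, country, SUM(sales) AS sales
    FROM t
    GROUP BY date, app, country
)
SELECT q1.date, q1.country, q2.country
FROM q AS q1
LEFT JOIN q AS q2 ON q2.date = q1.date AND q2.app = q1.app {extra}
ORDER BY q1.date, q1.country, q2.country
"""

def join_pairs(exclude_same_country):
    """Return (date, q1.country, q2.country) for every joined pair."""
    extra = "AND q2.country != q1.country" if exclude_same_country else ""
    return conn.execute(PAIRS.format(extra=extra)).fetchall()

all_pairs = join_pairs(False)
kept = join_pairs(True)
# Pairs dropped by the extra condition: exactly the same-country ones.
removed = [pair for pair in all_pairs if pair not in kept]
print(len(all_pairs), len(kept))
for pair in removed:
    print(pair)
```

Without the condition each date yields 2 x 2 = 4 pairs (8 in total); with it only the 4 cross-country pairs remain.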