The data in my table looks like this:
date, app, country, sales
2017-01-01,XYZ,US,10000
2017-01-01,XYZ,GB,2000
2017-01-02,XYZ,US,30000
2017-01-02,XYZ,GB,1000
For each app on each day, I need the ratio of US sales to GB sales. Ideally the result would look like this:
date, app, ratio
2017-01-01,XYZ,10000/2000 = 5
2017-01-02,XYZ,30000/1000 = 30
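I am currently dumping everything to CSV and doing the calculation offline in Python, but I would like to move everything to the SQL side. One option is to put each country in its own subquery, join the two, and then divide, for example: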
select d1_us.date, d1_us.app, d1_us.sales / d1_gb.sales from
(select date, app, sales from table where date between '2017-01-01' and '2017-01-10' and country = 'US') as d1_us
join
(select date, app, sales from table where date between '2017-01-01' and '2017-01-10' and country = 'GB') as d1_gb
on d1_us.app = d1_gb.app and d1_us.date = d1_gb.date
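Is there a less messy way to do this?
Answer 0 (score: 5)
You can do this in a single query, without subqueries, by dividing one SUM(CASE WHEN) by another and using GROUP BY.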
SELECT DATE,
APP,
SUM(CASE WHEN COUNTRY = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN COUNTRY = 'GB' THEN SALES END) AS RATIO
FROM TABLE1
GROUP BY DATE, APP;
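Depending on whether GB sales can be zero, you may want to adjust the ELSE branch for GB, perhaps to ELSE 1, to avoid a divide-by-zero error. It really depends on how you want to handle that exceptional case.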
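As a hedged illustration of the zero-denominator case, here is a minimal sketch that wraps the GB sum in NULLIF so that a zero total yields NULL instead of an error. This is a different technique from the ELSE 1 suggestion above, and it is only one option. The thread does not say which database is in use, so the sketch assumes SQLite (via Python's sqlite3 module) purely to be runnable. The 2017-01-03 rows are hypothetical and were added to trigger the zero case.

```python
import sqlite3

# Assumption: SQLite in memory; the original thread does not name a DBMS.
conn = sqlite3.connect(":memory:")
# sales is declared REAL so that the division below is not integer division.
conn.execute("CREATE TABLE table1 (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
        ("2017-01-03", "XYZ", "US", 500),  # hypothetical: a day where GB sold nothing
        ("2017-01-03", "XYZ", "GB", 0),
    ],
)

query = """
SELECT date, app,
       SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END) /
       NULLIF(SUM(CASE WHEN country = 'GB' THEN sales END), 0) AS ratio
FROM table1
GROUP BY date, app
ORDER BY date, app
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

With this data the first two days give 5.0 and 30.0, and the hypothetical third day gives NULL (None in Python). SQLite already returns NULL when dividing by zero, so the NULLIF matters most on engines that raise an error instead.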
Answer 1 (score: 0)
You can use a single query that groups the rows and states the date condition only once:
SELECT date, app,
SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN country = 'GB' THEN SALES END) AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
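However, this returns zero if there are no US records, and NULL if there are no GB records. If you need different values for these cases, you can wrap the division in another CASE WHEN. For example, to return -1 and -2 respectively, you could use: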
SELECT date, app,
CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
ELSE SUM(CASE WHEN country = 'US' THEN SALES ELSE 0 END) /
SUM(CASE WHEN country = 'GB' THEN SALES END)
END AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app;
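To check how the sentinel values behave, here is a small sketch that runs a version of this query against SQLite through Python's sqlite3 module. The choice of SQLite is an assumption made only so the example runs; the 2017-01-02 and 2017-01-03 rows are hypothetical data with one country missing. Note that the inner CASE expressions passed to COUNT have no ELSE branch here: COUNT counts every non-NULL value, 0 included, so an ELSE 0 would make the count equal to the number of rows in the group and the -1 and -2 branches would never fire.

```python
import sqlite3

# Assumption: SQLite in memory; the original thread does not name a DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO your_table VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),  # hypothetical: no GB row for this day
        ("2017-01-03", "XYZ", "GB", 1000),   # hypothetical: no US row for this day
    ],
)

query = """
SELECT date, app,
       CASE WHEN COUNT(CASE WHEN country = 'US' THEN 1 END) = 0 THEN -1
            WHEN COUNT(CASE WHEN country = 'GB' THEN 1 END) = 0 THEN -2
            ELSE SUM(CASE WHEN country = 'US' THEN sales ELSE 0 END) /
                 SUM(CASE WHEN country = 'GB' THEN sales END)
       END AS ratio
FROM your_table
WHERE date BETWEEN '2017-01-01' AND '2017-01-10'
GROUP BY date, app
ORDER BY date, app
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

On this data the three days come back as 5.0, -2 (no GB records), and -1 (no US records).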
Answer 2 (score: 0)
DROP TABLE IF EXISTS t;
CREATE TABLE t (
date DATE,
app VARCHAR(5),
country VARCHAR(5),
sales DECIMAL(10,2)
);
INSERT INTO t VALUES
('2017-01-01','XYZ','US',10000),
('2017-01-01','XYZ','GB',2000),
('2017-01-02','XYZ','US',30000),
('2017-01-02','XYZ','GB',1000);
WITH q AS (
SELECT
date,
app,
country,
SUM(sales) AS sales
FROM t
GROUP BY date, app, country
) SELECT
q1.date,
q1.app,
q1.country || ' vs ' || NVL(q2.country,'-') AS ratio_between,
CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0 ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
FROM q AS q1
LEFT JOIN q AS q2 ON q2.date = q1.date AND
q2.app = q1.app AND
q2.country != q1.country
-- WHERE q1.country = 'US'
ORDER BY q1.date;
Result for any country vs. any other country (WHERE q1.country = 'US' commented out):
date,app,ratio_between,ratio
2017-01-01,XYZ,GB vs US,0.20
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,GB vs US,0.03
2017-01-02,XYZ,US vs GB,30.00
Result for US vs. any other country (WHERE q1.country = 'US' uncommented):
date,app,ratio_between,ratio
2017-01-01,XYZ,US vs GB,5.00
2017-01-02,XYZ,US vs GB,30.00
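The trick is in the JOIN clause. The result of subquery q, which aggregates the data by date, app, and country, is joined to itself, but only on date and app.
This way, for each date, app, and country we get a "match" with every country for the same date and app. By adding q1.country != q2.country we exclude the same-country pairs, which are highlighted with * below:
date,app,country,sales,date,app,country,sales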
*2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,GB,2000.00*
2017-01-01,XYZ,GB,2000.00,2017-01-01,XYZ,US,10000.00
2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,GB,2000.00
*2017-01-01,XYZ,US,10000.00,2017-01-01,XYZ,US,10000.00*
2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,US,30000.00
*2017-01-02,XYZ,GB,1000.00,2017-01-02,XYZ,GB,1000.00*
*2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,US,30000.00*
2017-01-02,XYZ,US,30000.00,2017-01-02,XYZ,GB,1000.00
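The self-join can be tried end to end with the sketch below, which runs the same idea against SQLite through Python's sqlite3 module. Several details are assumptions made so that it runs there and are not part of the answer: SQLite itself, COALESCE in place of NVL, a REAL sales column (to avoid integer division), and an extra ORDER BY key so the row order is deterministic.

```python
import sqlite3

# Assumption: SQLite in memory; the answer's SQL targets another dialect (it uses NVL).
conn = sqlite3.connect(":memory:")
# REAL instead of DECIMAL(10,2): SQLite would otherwise store these values as
# integers and q1.sales / q2.sales would be integer division.
conn.execute("CREATE TABLE t (date TEXT, app TEXT, country TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [
        ("2017-01-01", "XYZ", "US", 10000),
        ("2017-01-01", "XYZ", "GB", 2000),
        ("2017-01-02", "XYZ", "US", 30000),
        ("2017-01-02", "XYZ", "GB", 1000),
    ],
)

query = """
WITH q AS (
    SELECT date, app, country, SUM(sales) AS sales
    FROM t
    GROUP BY date, app, country
)
SELECT q1.date,
       q1.app,
       q1.country || ' vs ' || COALESCE(q2.country, '-') AS ratio_between,
       CASE WHEN q2.sales IS NULL OR q2.sales = 0 THEN 0
            ELSE ROUND(q1.sales / q2.sales, 2) END AS ratio
FROM q AS q1
LEFT JOIN q AS q2
       ON q2.date = q1.date
      AND q2.app = q1.app
      AND q2.country != q1.country   -- drops the same-country pairs
ORDER BY q1.date, q1.country
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Adding WHERE q1.country = 'US' before the ORDER BY should narrow the output to the two "US vs GB" rows, as in the second result table above.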