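I have products that were sold at different prices. I want to see how many products were sold within particular price ranges. To do that, I need to look through the data to decide how to divide the ranges, and then get the count of products in each range.
The data looks like this: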
Product Price sold
A 4.5
B 45.7
C 20
D 20.1
E 36.8
F 50
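For example, for the data above, I see that the minimum is 4.5 and the maximum is 50. So I decided to divide the price range into 0-10 $, 11-20 $, 21-30 $, 30-40 $, 40-50 $.
So the result should look something like this: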
Range No. of products sold
0-10 1
11-20 2
21-30 0
30-40 1
40-50 2
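The prices are floats, so the ranges need to handle floating-point values. Is this possible?
Answer 0 (score: 1)
You can use generate_array(). I would phrase it like this: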
-- lb is the lower bound of each bucket; the bucket is half-open: [lb, lb + 10)
select lb, lb + 10 as ub, count(d.product) as num_products
from unnest(generate_array(0, 50, 10)) lb left join
data d
on d.price >= lb and
d.price < lb + 10
group by lb
order by lb;
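You could concatenate the lower and upper bounds into a single label, but keeping them in two columns seems more useful.
Answer 1 (score: 1)
The following is for BigQuery Standard SQL: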
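To see what the half-open comparison in this query does with float prices, here is a minimal Python sketch of the same logic run on the sample data from the question. It is not BigQuery code; the function name `count_by_bucket` and the in-memory price dictionary are illustrative assumptions.

```python
# Sample prices from the question (product -> price sold).
prices = {"A": 4.5, "B": 45.7, "C": 20, "D": 20.1, "E": 36.8, "F": 50}


def count_by_bucket(values, start, stop, step):
    """Count values per half-open bucket [lb, lb + step).

    Lower bounds mirror generate_array(start, stop, step), which includes
    `stop` itself when it falls on a step boundary.
    """
    lower_bounds = list(range(start, stop + 1, step))
    # Start every bucket at 0 so empty buckets survive, like the LEFT JOIN.
    counts = {lb: 0 for lb in lower_bounds}
    for v in values:
        for lb in lower_bounds:
            if lb <= v < lb + step:
                counts[lb] += 1
                break
    return counts


counts = count_by_bucket(prices.values(), 0, 50, 10)
for lb, n in counts.items():
    print(f"{lb}-{lb + 10}: {n}")
```

On the sample data this places 20 and 20.1 in the 20-30 bucket and 50 in a bucket of its own (50-60), so the counts differ from the table in the question, whose ranges are not half-open.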
#standardSQL
WITH price_ranges AS (
SELECT '0-10' price_range UNION ALL
SELECT '11-20' UNION ALL
SELECT '21-30' UNION ALL
SELECT '30-40' UNION ALL
SELECT '40-50'
)
SELECT price_range, COUNT(1) number_sold
FROM `project.dataset.table`
JOIN price_ranges
ON CAST(price_sold AS INT64)
BETWEEN CAST(SPLIT(price_range, '-')[OFFSET(0)] AS INT64)
AND CAST(SPLIT(price_range, '-')[OFFSET(1)] AS INT64)
GROUP BY price_range
-- ORDER BY price_range
Applied to the sample data from the question, the result is:
Row price_range number_sold
1 0-10 1
2 11-20 2
3 30-40 1
4 40-50 2
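The join above rounds each price to an integer and then tests it against inclusive bounds parsed from the range label. Below is a minimal Python sketch of that logic, assuming BigQuery's documented behavior that casting a float to INT64 rounds halfway cases away from zero. `cast_to_int64` and `count_by_label` are hypothetical helper names, not BigQuery functions.

```python
import math

prices = [4.5, 45.7, 20, 20.1, 36.8, 50]
price_ranges = ["0-10", "11-20", "21-30", "30-40", "40-50"]


def cast_to_int64(x):
    """Round to the nearest integer, halfway cases away from zero (assumed CAST behavior)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def count_by_label(values, labels):
    """Inner-join semantics: labels with no matching value are absent from the result."""
    counts = {}
    for label in labels:
        low, high = (int(part) for part in label.split("-"))
        for v in values:
            if low <= cast_to_int64(v) <= high:  # BETWEEN is inclusive on both ends
                counts[label] = counts.get(label, 0) + 1
    return counts


result = count_by_label(prices, price_ranges)
print(result)
```

The 21-30 range is missing from the result because the join is an inner join and no price rounds into it. Because 30-40 and 40-50 share the endpoint 40, a price that rounds to 40 would be counted in both ranges; the sample data contains no such price.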
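Answer 2 (score: 1)
None of the current answers seems to address the question "how do I generate the ranges?" (both answers assume a range of 0-50).
What you seem to want is a histogram, and you can find an answer for that here:
Now, if you want to build the buckets step by step: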
WITH data AS (
SELECT * FROM `fh-bigquery.public_dump.gdp_capita`
), min_and_max AS (
SELECT MIN(gdp_capita) min, MAX(gdp_capita) max
FROM data
), generate_buckets AS (
SELECT x bucket_min
, IFNULL(LEAD(x) OVER(ORDER BY x), 1+(SELECT max FROM min_and_max)) bucket_max
FROM UNNEST(generate_array(
(SELECT 0 FROM min_and_max) # min or 0, depending on your start
, (SELECT max FROM min_and_max)
, (SELECT POW(10, fhoffa.x.int(LOG10(max-min)))/10 FROM min_and_max) # log10 for round order of 10 steps
)) x
)
SELECT *
FROM generate_buckets
With these buckets, you can now get the histogram:
SELECT bucket_min, bucket_max, COUNT(*) c
FROM generate_buckets
JOIN data
ON data.gdp_capita >= bucket_min AND data.gdp_capita < bucket_max
GROUP BY 1,2
ORDER BY 1
If you also need the buckets with 0 elements:
SELECT * REPLACE(IFNULL(c,0) AS c)
FROM (
SELECT bucket_min, bucket_max, COUNT(*) c
FROM generate_buckets
JOIN data
ON data.gdp_capita >= bucket_min AND data.gdp_capita < bucket_max
GROUP BY 1,2
)
# the RIGHT JOIN brings back buckets that matched no rows; their c is NULL until IFNULL sets it to 0
RIGHT JOIN generate_buckets USING(bucket_min, bucket_max)
ORDER BY bucket_min
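To check what the bucket-generation rule produces for the prices in the question, the logic can be sketched in Python as follows. This is an illustration rather than a translation of the SQL: it assumes `fhoffa.x.int` truncates its argument to an integer, and `make_buckets` and `histogram` are hypothetical names.

```python
import math

prices = [4.5, 45.7, 20, 20.1, 36.8, 50]


def make_buckets(values, start=0):
    """Build (bucket_min, bucket_max) pairs with a power-of-ten step."""
    lo, hi = min(values), max(values)
    # The step is one order of magnitude below the spread of the data.
    # Assumption: fhoffa.x.int truncates, modeled here with int().
    step = 10 ** int(math.log10(hi - lo)) / 10
    mins = []
    x = start
    while x <= hi:
        mins.append(x)
        x += step
    # Each bucket ends where the next begins; the last one ends at max + 1.
    maxs = mins[1:] + [hi + 1]
    return list(zip(mins, maxs))


def histogram(values, buckets):
    """Count values per half-open bucket, keeping empty buckets at 0."""
    counts = {b: 0 for b in buckets}
    for v in values:
        for b_min, b_max in buckets:
            if b_min <= v < b_max:
                counts[(b_min, b_max)] += 1
                break
    return counts


buckets = make_buckets(prices)
hist = histogram(prices, buckets)
non_empty = {b: c for b, c in hist.items() if c}
print(non_empty)
```

For the sample prices the spread is 45.5, so the step comes out as 1 and 51 one-unit buckets are generated, most of them empty. Wider buckets such as the 10-unit ranges in the question would need a different step rule.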