我试图比较BigQuery中的零售商品价格数据(〜2-3B行,具体取决于时间段和零售商);旨在识别有意义的价格差异。例如1.99美元兑2.00美元没有意义,但1.99美元兑2.50美元是有意义的。有意义的量化为价格之间2%的差异。
一个项目的示例数据集如下所示:
ITEM Price($) Meaningful (This is the column I'm trying to flag)
Apple $1.99 Y (lowest price would always be flagged)
Apple $2.00 N ($1.99 v $2.00)
Apple $2.01 N ($1.99 v $2.01) Still using $1.99 for comparison
Apple $2.50 Y ($1.99 v $2.50) Still using $1.99 for comparison
Apple $2.56 Y ($2.50 v $2.56) Now using $2.50 as new comp. price
Apple $2.62 Y ($2.55 v $2.62) Now using $2.56 as new comp. price
我希望只使用SQL窗口函数(超前,滞后,分区等)来解决问题,将当前行的价格与下一行的价格进行比较。但是,当我得到一个没有意义的价格时,这并没有正常工作,因为我总是希望将下一个价值与最近有意义的价格进行比较(参见上面的2.50美元行示例,与2.00美元相比)前一行不是2.01美元)
答案 0 :(得分:1)
以下是BigQuery Standard SQL
#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
var result = [];
var last = 0;
var flag = '';
for (i = 0; i < prices.length; i++){
if (i == 0) {
last = prices[i];
flag = 'Y'
} else {
if ((prices[i] - last)/last > 0.02) {
last = prices[i];
flag = 'Y'
} else {flag = 'N'}
}
var rec = [];
rec.price = prices[i];
rec.flag = flag;
result.push(rec);
}
return result;
""";
SELECT item, rec.*
FROM (
SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
FROM `yourTable`
GROUP BY item
), UNNEST(x(prices) ) AS rec
-- ORDER BY item, price
您可以使用您问题中的以下虚拟数据进行/测试
#standardSQL
CREATE TEMPORARY FUNCTION x(prices ARRAY<FLOAT64>)
RETURNS ARRAY<STRUCT<price FLOAT64, flag STRING>>
LANGUAGE js AS """
var result = [];
var last = 0;
var flag = '';
for (i = 0; i < prices.length; i++){
if (i == 0) {
last = prices[i];
flag = 'Y'
} else {
if ((prices[i] - last)/last > 0.02) {
last = prices[i];
flag = 'Y'
} else {flag = 'N'}
}
var rec = [];
rec.price = prices[i];
rec.flag = flag;
result.push(rec);
}
return result;
""";
WITH `yourTable` AS (
SELECT 'Apple' AS item, 1.99 AS price UNION ALL
SELECT 'Apple', 2.00 UNION ALL
SELECT 'Apple', 2.01 UNION ALL
SELECT 'Apple', 2.50 UNION ALL
SELECT 'Apple', 2.56 UNION ALL
SELECT 'Apple', 2.62
)
SELECT item, rec.*
FROM (
SELECT item, ARRAY_AGG(price ORDER BY price) AS prices
FROM `yourTable`
GROUP BY item
), UNNEST(x(prices) ) AS rec
ORDER BY item, price
结果如下
item price flag
---- ----- ----
Apple 1.99 Y
Apple 2.0 N
Apple 2.01 N
Apple 2.5 Y
Apple 2.56 Y
Apple 2.62 Y