I am currently trying to do something similar to this: MySQL - Subtracting value from previous row, group by
My current query is:
SELECT a.x, a.y, a.z, COALESCE(a.z - b.z,0) AS diff
FROM [bla] AS a
LEFT JOIN EACH
[bla] AS b
ON b.x=a.x
AND b.y = (SELECT MAX(y) FROM [bla] WHERE x = a.x AND y < a.y)
However, I end up with the following error:
Error: An internal error occurred and the request could not be completed.
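The error is not very helpful, and I cannot tell what is wrong. The problem seems to be the final ON clause, which uses a SELECT subquery.
Answer 0: (score: 3)
Using the data from the link above (MySQL - Subtracting value from previous row, group by), the BigQuery solution is as simple as the statement below: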
SELECT SN, Date, COALESCE(ROUND(Value - NextValue, 2), 0) as consumption
FROM (
SELECT *, LAG(Value, 1) OVER (PARTITION BY SN ORDER BY Date) as NextValue
FROM temp.EnergyLog)
ORDER BY SN, Date
Now, below is an attempt to write it for your [bla] table:
SELECT x, y, z, COALESCE(ROUND(z - Nextz, 2), 0) as diff
FROM (
SELECT *, LAG(z, 1) OVER (PARTITION BY x ORDER BY y) as Nextz
FROM temp.bla)
ORDER BY x, y
I think the above has a good chance of working, but you may need to make some extra adjustments.
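To make clear what the LAG query computes, here is a minimal plain-JavaScript sketch of the same logic: partition by x, order by y, then subtract the previous row's z. The function name `diffFromPrevious` and the sample rows are made up for illustration, and y is assumed to be numeric; this is not how BigQuery executes the query.

```javascript
// Plain-JavaScript sketch of LAG(z, 1) OVER (PARTITION BY x ORDER BY y).
// diffFromPrevious and the sample rows are hypothetical, for illustration only.
function diffFromPrevious(rows) {
  // PARTITION BY x: collect rows into one group per x value.
  const partitions = new Map();
  for (const row of rows) {
    if (!partitions.has(row.x)) partitions.set(row.x, []);
    partitions.get(row.x).push(row);
  }
  const result = [];
  for (const group of partitions.values()) {
    // ORDER BY y within the partition (assumes numeric y).
    const sorted = group.slice().sort((a, b) => a.y - b.y);
    sorted.forEach((row, i) => {
      // LAG yields NULL on the first row; COALESCE(..., 0) turns the diff into 0.
      const prev = i === 0 ? null : sorted[i - 1].z;
      const diff = prev === null ? 0 : Math.round((row.z - prev) * 100) / 100;
      result.push({ x: row.x, y: row.y, z: row.z, diff: diff });
    });
  }
  // Final ORDER BY x, y.
  return result.sort((a, b) => a.x - b.x || a.y - b.y);
}

const sample = [
  { x: 1, y: 2, z: 15 },
  { x: 1, y: 1, z: 10 },
  { x: 2, y: 1, z: 7 },
  { x: 1, y: 3, z: 12.5 },
];
console.log(diffFromPrevious(sample));
```

The first row of each x partition gets a diff of 0, and every later row gets its z minus the z of the row just before it in y order.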
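Answer 1: (score: 1)
Another solution is based on the recently introduced JS UDF. It looks heavier than the one I suggested above, but I like it too, because it gives good control over the analytic logic.
I doubt this will be your actual choice, but conceptually it could be useful.
So, for example, based on the MySQL - Subtracting value from previous row, group by solution: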
SELECT SN, Date, ROUND(consumption,2) as consumption FROM
js( // input table
(SELECT SN, NEST(STRING(Date) + ',' + STRING(Value)) as Metric
FROM temp.EnergyLog GROUP BY SN) ,
// input columns
SN, Metric,
// output schema
"[{name: 'SN', type: 'integer'},
{name: 'Date', type: 'string'},
{name: 'consumption', type: 'float'}]",
// function
"function(r, emit){
pair = r.Metric.sort(function (a,b) {return a < b ? -1 : (a > b ? 1 : 0);});
val = pair[0].split(','); Date = val[0];
emit({SN: r.SN, Date: Date, consumption: 0});
for (var i=0; i<pair.length -1; i +=1){
val = pair[i].split(','); Value1 = val[1];
val = pair[i+1].split(','); Date = val[0]; Value2 = val[1];
emit({SN: r.SN, Date: Date, consumption: Value2 - Value1});
}
}"
) ORDER BY SN, Date
You can see the UDF documentation here: https://cloud.google.com/bigquery/user-defined-functions
The output is exactly the same as with the previously suggested LAG-based solution.
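One way to check that claim before running anything in BigQuery is to exercise the function body locally with a stub emit. The sketch below is a standalone rewrite, not BigQuery's runtime: the name `consumptionUdf`, the sample readings, and the date format (assumed to sort lexicographically, such as YYYY-MM-DD) are all assumptions. It attributes each difference to the later date, which is what the LAG query does.

```javascript
// Standalone sketch of the UDF logic with a stub emit; not BigQuery's runtime.
// Assumption: each Metric entry is a "Date,Value" string, as built by
// NEST(STRING(Date) + ',' + STRING(Value)), and dates sort lexicographically.
function consumptionUdf(r, emit) {
  // Default sort() compares strings lexicographically; slice() avoids mutating the input.
  var pairs = r.Metric.slice().sort();
  // The earliest reading has no predecessor, so its consumption is 0.
  emit({SN: r.SN, Date: pairs[0].split(',')[0], consumption: 0});
  for (var i = 0; i < pairs.length - 1; i += 1) {
    var prev = pairs[i].split(',');
    var curr = pairs[i + 1].split(',');
    // Attribute the difference to the later date, as LAG does.
    emit({SN: r.SN, Date: curr[0], consumption: Number(curr[1]) - Number(prev[1])});
  }
}

// Stub emit that collects rows into an array (sample readings are made up).
var emitted = [];
consumptionUdf(
  {SN: 1, Metric: ['2015-01-03,18', '2015-01-01,10', '2015-01-02,14.5']},
  function (row) { emitted.push(row); }
);
console.log(emitted);
```

Each SN yields one row per reading: the earliest date with consumption 0, then each later date with its value minus the preceding one.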
Hopefully you will be able to "translate" the code to your case with the [bla] table.
Answer 2: (score: 0)
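I don't know the specific cause of your internal error, but note that join conditions in BigQuery must be equality joins (for example, a.x = b.x AND a.y = b.y). You cannot put constants, inequalities, or subqueries in a join condition.
Also, I would discourage using self-joins in BigQuery, since they often cause performance problems. It looks like you are trying to find something like the maximum y for any given x? If so, you could use an analytic function instead (for example, MAX(y) OVER(PARTITION BY x)).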