Can someone tell me how to calculate a moving average in BigQuery?
This is what I need, in MySQL style:
SELECT T1.id, T1.value_column1, avg(T2.value_column1)
FROM table1 T1
INNER JOIN table1 T2 ON T2.Id BETWEEN T1.Id-19 AND T1.Id
GROUP BY T1.id, T1.value_column1
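Answer 0 (score: 4)
You can do the same thing, but since BigQuery only allows joins on equality, it takes a bit more work. Here is an example that computes a 6-month moving average of birth weights from the public natality sample.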
SELECT
  -- Convert the zero-based months-since-year-0 index back to year and
  -- month (1-12).
  INTEGER(month / 12) AS year,
  month % 12 + 1 AS month,
  avg
FROM (
  SELECT month,
    -- Note that this is the average over all of the data in the last
    -- 6 months, not an average of the monthly averages for the last 6 months.
    -- The latter is easy to compute if that is what you want: compute the
    -- average in the inner select, and take the average of those here.
    SUM(total_weight_per_month) / SUM(records_per_month) AS avg
  FROM (
    SELECT
      -- Note we use t2.month here, since each t2 month is compared against
      -- 6 different t1 months.
      t2.month AS month,
      t1.records_per_month AS records_per_month,
      t1.total_weight_per_month AS total_weight_per_month
    FROM (
      SELECT month,
        COUNT(weight_pounds) AS records_per_month,
        SUM(weight_pounds) AS total_weight_per_month,
        -- This active field is the key that lets us join all of the
        -- values against the values in the date subselect.
        1 AS active
      FROM (
        SELECT
          -- Convert the year and month fields to a single value holding
          -- the number of months since year 0, so we can do math on the
          -- dates. Subtracting 1 makes the index zero-based, so December
          -- converts back to month 12 of the same year instead of month 0
          -- of the next one.
          year * 12 + month - 1 AS month,
          weight_pounds
        FROM [publicdata:samples.natality]
        WHERE weight_pounds > 0)
      GROUP BY month) AS t1
    JOIN
      -- We join the weights per month against a subselect that contains
      -- all months.
      (SELECT month, 1 AS active
       FROM
         (SELECT
            year * 12 + month - 1 AS month
          FROM [publicdata:samples.natality])
       GROUP BY month) AS t2
    ON t1.active = t2.active
    -- Here is where we get the moving average: each t1 month is applied to
    -- the 6 t2 months starting at it, so every t2 month averages itself and
    -- the 5 months before it.
    WHERE t1.month <= t2.month AND t1.month + 6 > t2.month)
  GROUP BY month
  ORDER BY month DESC)
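Answer 1 (score: 2)
For an updated and more efficient answer, see https://stackoverflow.com/a/24943950/132438.
Check out the new LAG() and LEAD() window functions. They let you traverse the result set without a self-join.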
https://developers.google.com/bigquery/docs/query-reference#windowfunctions
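As a rough sketch of the window-function route, the 20-row moving average from the question might look like this. It assumes BigQuery standard SQL rather than the legacy dialect used elsewhere in this thread, and the inline table1 data is made up purely for illustration. Note that ROWS counts rows, so it matches the BETWEEN T1.Id-19 AND T1.Id join only when the ids are consecutive; RANGE BETWEEN 19 PRECEDING AND CURRENT ROW would compare id values instead.

```sql
-- Assumption: BigQuery standard SQL. The inline data is hypothetical and
-- only stands in for table1 (id, value_column1) from the question.
WITH table1 AS (
  SELECT id, id * 10 AS value_column1
  FROM UNNEST(GENERATE_ARRAY(1, 30)) AS id
)
SELECT
  id,
  value_column1,
  -- Average over the current row and up to 19 rows before it, ordered by id.
  AVG(value_column1) OVER (
    ORDER BY id
    ROWS BETWEEN 19 PRECEDING AND CURRENT ROW
  ) AS moving_avg,
  -- LAG() reads the previous row without a self-join (NULL on the first row).
  LAG(value_column1) OVER (ORDER BY id) AS previous_value
FROM table1
ORDER BY id;
```

For the first 19 rows the frame simply contains fewer rows, so the average is taken over whatever history exists so far.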
Another option uses JOIN EACH (this can be too slow, because the intermediate steps may generate an extremely large amount of data):
SELECT a.SensorId SensorId, a.Timestamp, AVG(b.Data) AS avg_prev_hour_load
FROM (
  SELECT * FROM [io_sensor_data.moscone_io13]
  WHERE SensorId = 'XBee_40670EB0/mic') a
JOIN EACH [io_sensor_data.moscone_io13] b
ON a.SensorId = b.SensorId
WHERE b.Timestamp BETWEEN (a.Timestamp - 3600000) AND a.Timestamp
GROUP BY SensorId, a.Timestamp;
(Based on Joe Celko's SQL puzzles)
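The same previous-hour average can probably be expressed without the self-join by using a RANGE window frame. The sketch below assumes BigQuery standard SQL and assumes the timestamps are epoch milliseconds (which is what the 3600000 offset above suggests). The readings table and the ts_ms column are hypothetical stand-ins for the sensor table and its Timestamp column.

```sql
-- Assumption: BigQuery standard SQL. The rows below are made up;
-- ts_ms stands in for Timestamp and is assumed to be epoch milliseconds.
WITH readings AS (
  SELECT 'XBee_40670EB0/mic' AS SensorId, 1000000 AS ts_ms, 10.0 AS Data UNION ALL
  SELECT 'XBee_40670EB0/mic', 2800000, 20.0 UNION ALL
  SELECT 'XBee_40670EB0/mic', 4600000, 30.0 UNION ALL
  SELECT 'XBee_40670EB0/mic', 9000000, 40.0
)
SELECT
  SensorId,
  ts_ms,
  -- RANGE compares ts_ms values, so the frame holds every reading from the
  -- same sensor taken in the 3,600,000 ms up to and including this row.
  AVG(Data) OVER (
    PARTITION BY SensorId
    ORDER BY ts_ms
    RANGE BETWEEN 3600000 PRECEDING AND CURRENT ROW
  ) AS avg_prev_hour_load
FROM readings
ORDER BY SensorId, ts_ms;
```

Because the frame is defined on values rather than row counts, gaps in the data shrink the window instead of pulling in older readings.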
Answer 2 (score: 0)
I created the following "Times" table:
Table Details: Dim_Periods
Schema
Date TIMESTAMP
Year INTEGER
Month INTEGER
day INTEGER
QUARTER INTEGER
DAYOFWEEK INTEGER
MonthStart TIMESTAMP
MonthEnd TIMESTAMP
WeekStart TIMESTAMP
WeekEnd TIMESTAMP
Back30Days TIMESTAMP -- the date 30 days before "Date"
Back7Days TIMESTAMP -- the date 7 days before "Date"
I use a query like this to handle a "running sum":
SELECT Date, COUNT(*) AS MovingCNT
FROM
  (SELECT Date,
          Back7Days
   FROM DWH.Dim_Periods
   WHERE Date < TIMESTAMP(CURRENT_DATE()) AND
         Date >= (DATE_ADD(CURRENT_TIMESTAMP(), -5, 'month'))
  ) P
CROSS JOIN EACH
  (SELECT repository_url, repository_created_at
   FROM [publicdata:samples.github_timeline]
  ) L
WHERE TIMESTAMP(repository_created_at) >= Back7Days
  AND TIMESTAMP(repository_created_at) <= Date
GROUP EACH BY Date
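Note that the same approach works for "month-to-date", "week-to-date", "last 30 days", and similar aggregations. However, performance is not great, and because of the Cartesian join the query may take a while on larger datasets. Hope this helps.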
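If the Cartesian join becomes too slow, one alternative might be to aggregate to one row per day first and then sum over a trailing window. This is only a sketch under assumptions: it uses BigQuery standard SQL, and the events table and created_date column are hypothetical stand-ins for the repository_created_at values.

```sql
-- Assumption: BigQuery standard SQL. The dates below are made up for illustration.
WITH events AS (
  SELECT DATE '2014-01-01' AS created_date UNION ALL
  SELECT DATE '2014-01-01' UNION ALL
  SELECT DATE '2014-01-03' UNION ALL
  SELECT DATE '2014-01-09'
),
daily AS (
  -- Collapse to one row per day first, so the window only sees daily totals.
  SELECT created_date, COUNT(*) AS daily_cnt
  FROM events
  GROUP BY created_date
)
SELECT
  created_date,
  -- UNIX_DATE() turns the date into a day number, so RANGE can count days:
  -- the frame covers this day and the 7 days before it.
  SUM(daily_cnt) OVER (
    ORDER BY UNIX_DATE(created_date)
    RANGE BETWEEN 7 PRECEDING AND CURRENT ROW
  ) AS moving_cnt
FROM daily
ORDER BY created_date;
```

Unlike the Dim_Periods join, this only returns rows for days that actually have events, so a calendar table is still needed if every date must appear in the output.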