How to perform a rolling sum in BigQuery

Time: 2019-01-11 14:51:21

Tags: google-bigquery

My sample data in BigQuery is:

with temp as (
  select DATE("2016-10-02") date_field, 200 as salary union all
  select DATE("2016-10-09"), 500 union all
  select DATE("2016-10-16"), 350 union all
  select DATE("2016-10-23"), 400 union all
  select DATE("2016-10-30"), 190 union all
  select DATE("2016-11-06"), 550 union all
  select DATE("2016-11-13"), 610 union all
  select DATE("2016-11-20"), 480 union all
  select DATE("2016-11-27"), 660 union all
  select DATE("2016-12-04"), 690 union all
  select DATE("2016-12-11"), 810 union all
  select DATE("2016-12-18"), 950 union all
  select DATE("2016-12-25"), 1020 union all
  select DATE("2017-01-01"), 680
),
temp2 as (
  select *, DATE("2017-01-01") as current_date
  from temp
)
select * from temp2

I want to perform a rolling sum on this table. As an example, I have set the current date to 2017-01-01. With that as the current date, I want to go back 30 days and take the sum of the salary field. So, with 2017-01-01 as the current date, the total returned should be for the month of December 2016. How can I do this using BigQuery?

2 answers:

Answer 0: (score: 2)

Below is BigQuery Standard SQL for a rolling last-30-days SUM:

  
#standardSQL
SELECT *,
  SUM(salary) OVER(
    ORDER BY UNIX_DATE(date_field) 
    RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
  ) AS rolling_30_days_sum
FROM `project.dataset.your_table`

You can test and play with the above using the sample data from your question:

#standardSQL
WITH temp AS (
  SELECT DATE("2016-10-02") date_field ,  200 AS salary UNION ALL
  SELECT DATE("2016-10-09"),  500 UNION ALL
  SELECT DATE("2016-10-16"),  350 UNION ALL
  SELECT DATE("2016-10-23"),  400 UNION ALL
  SELECT DATE("2016-10-30"),  190 UNION ALL
  SELECT DATE("2016-11-06"),  550 UNION ALL
  SELECT DATE("2016-11-13"),  610 UNION ALL
  SELECT DATE("2016-11-20"),  480 UNION ALL
  SELECT DATE("2016-11-27"),  660 UNION ALL
  SELECT DATE("2016-12-04"),  690 UNION ALL
  SELECT DATE("2016-12-11"),  810 UNION ALL
  SELECT DATE("2016-12-18"),  950 UNION ALL
  SELECT DATE("2016-12-25"),  1020 UNION ALL
  SELECT DATE("2017-01-01"),  680
) 
SELECT *,
  SUM(salary) OVER(
    ORDER BY UNIX_DATE(date_field) 
    RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING
  ) AS rolling_30_days_sum
FROM temp
-- ORDER BY date_field

with the result:

Row date_field  salary  rolling_30_days_sum  
1   2016-10-02  200     null     
2   2016-10-09  500     200  
3   2016-10-16  350     700  
4   2016-10-23  400     1050     
5   2016-10-30  190     1450     
6   2016-11-06  550     1440     
7   2016-11-13  610     1490     
8   2016-11-20  480     1750     
9   2016-11-27  660     1830     
10  2016-12-04  690     2300     
11  2016-12-11  810     2440     
12  2016-12-18  950     2640     
13  2016-12-25  1020    3110     
14  2017-01-01  680     3470     
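
To see why the frame RANGE BETWEEN 30 PRECEDING AND 1 PRECEDING over UNIX_DATE(date_field) yields these numbers, the same window can be emulated outside BigQuery. The following is a minimal sketch in Python, not part of the original answer; the names ROWS and rolling_sum are hypothetical and exist only for this illustration. For each row it sums the salaries whose date falls between 30 days and 1 day before that row's date, and returns None when the frame is empty, mirroring the null in the first row.

```python
from datetime import date, timedelta

# Sample data from the question (date_field, salary).
ROWS = [
    (date(2016, 10, 2), 200), (date(2016, 10, 9), 500),
    (date(2016, 10, 16), 350), (date(2016, 10, 23), 400),
    (date(2016, 10, 30), 190), (date(2016, 11, 6), 550),
    (date(2016, 11, 13), 610), (date(2016, 11, 20), 480),
    (date(2016, 11, 27), 660), (date(2016, 12, 4), 690),
    (date(2016, 12, 11), 810), (date(2016, 12, 18), 950),
    (date(2016, 12, 25), 1020), (date(2017, 1, 1), 680),
]


def rolling_sum(rows, days=30):
    """Emulate SUM(...) OVER (ORDER BY day RANGE BETWEEN days PRECEDING AND 1 PRECEDING)."""
    result = []
    for current, _ in rows:
        lower = current - timedelta(days=days)  # "30 PRECEDING"
        upper = current - timedelta(days=1)     # "1 PRECEDING": excludes the current row
        frame = [salary for day, salary in rows if lower <= day <= upper]
        # SUM over an empty frame is NULL in SQL, so use None here.
        result.append(sum(frame) if frame else None)
    return result


if __name__ == "__main__":
    for (day, salary), total in zip(ROWS, rolling_sum(ROWS)):
        print(day, salary, total)
```

The row for 2016-11-06 shows the effect of the 30-day bound: its frame starts at 2016-10-07, so the 200 from 2016-10-02 drops out and the sum falls from 1450 to 1440.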

Answer 1: (score: 0)

This is not exactly a "rolling sum", but it is an exact answer to "I want to go back 30 days and take the sum of the salary field. So, with 2017-01-01 as the current date, the total returned should be for the month of December".

with temp as (
  select DATE("2016-10-02") date_field, 200 as salary union all
  select DATE("2016-10-09"), 500 union all
  select DATE("2016-10-16"), 350 union all
  select DATE("2016-10-23"), 400 union all
  select DATE("2016-10-30"), 190 union all
  select DATE("2016-11-06"), 550 union all
  select DATE("2016-11-13"), 610 union all
  select DATE("2016-11-20"), 480 union all
  select DATE("2016-11-27"), 660 union all
  select DATE("2016-12-04"), 690 union all
  select DATE("2016-12-11"), 810 union all
  select DATE("2016-12-18"), 950 union all
  select DATE("2016-12-25"), 1020 union all
  select DATE("2017-01-01"), 680
),
temp2 as (
  select *, DATE("2017-01-01") as current_date_x
  from temp
)
select SUM(salary)
from temp2
WHERE date_field BETWEEN DATE_SUB(current_date_x, INTERVAL 30 DAY) AND DATE_SUB(current_date_x, INTERVAL 1 DAY)

3470

Note that I could not use current_date as a variable name, because it was replaced by the actual current date.
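
The fixed-date filter above (DATE_SUB by 30 days through DATE_SUB by 1 day) can be checked by hand with a short script. This is a sketch in Python rather than BigQuery SQL, and the names SALARIES and sum_previous_days are hypothetical, introduced only for this illustration. It assumes the same inclusive bounds as SQL's BETWEEN.

```python
from datetime import date, timedelta

# Sample data from the question, keyed by date_field.
SALARIES = {
    date(2016, 10, 2): 200, date(2016, 10, 9): 500,
    date(2016, 10, 16): 350, date(2016, 10, 23): 400,
    date(2016, 10, 30): 190, date(2016, 11, 6): 550,
    date(2016, 11, 13): 610, date(2016, 11, 20): 480,
    date(2016, 11, 27): 660, date(2016, 12, 4): 690,
    date(2016, 12, 11): 810, date(2016, 12, 18): 950,
    date(2016, 12, 25): 1020, date(2017, 1, 1): 680,
}


def sum_previous_days(salaries, as_of, days=30):
    """Sum salaries dated from `days` days before `as_of` through the day before it."""
    start = as_of - timedelta(days=days)  # DATE_SUB(as_of, INTERVAL 30 DAY)
    end = as_of - timedelta(days=1)       # DATE_SUB(as_of, INTERVAL 1 DAY)
    # Both bounds are inclusive, like SQL BETWEEN.
    return sum(v for d, v in salaries.items() if start <= d <= end)


if __name__ == "__main__":
    print(sum_previous_days(SALARIES, date(2017, 1, 1)))
```

For 2017-01-01 the range is 2016-12-02 through 2016-12-31, which covers the four December rows (690 + 810 + 950 + 1020).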