Scaling concurrent exports of BigQuery tables to Google Cloud Storage

Date: 2018-03-01 19:23:47

Tags: google-bigquery google-cloud-storage

I'm trying to run queries in BigQuery and store the results in Cloud Storage. With the BigQuery API this is quite straightforward.

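The problem appears when I try to run several queries at the same time: "extracting" the result tables to Cloud Storage becomes significantly slower the more tables I try to extract at once. Below is a summary of an experiment I ran with 20 concurrent jobs. Times are in seconds.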

job 013 done. Query: 012.0930221081. Extract: 009.8582818508. Signed URL: 000.3398022652
job 000 done. Query: 012.1677722931. Extract: 010.7060177326. Signed URL: 000.3358650208
job 002 done. Query: 009.5634860992. Extract: 014.2841088772. Signed URL: 000.3027939796
job 004 done. Query: 011.7068181038. Extract: 012.5938670635. Signed URL: 000.2734949589
job 020 done. Query: 009.8888399601. Extract: 015.4054799080. Signed URL: 000.3903510571
job 022 done. Query: 012.9012901783. Extract: 013.9143507481. Signed URL: 000.3490731716
job 014 done. Query: 012.8500978947. Extract: 015.0055649281. Signed URL: 000.2981300354
job 006 done. Query: 011.6835210323. Extract: 016.2601530552. Signed URL: 000.2789318562
job 001 done. Query: 013.4435272217. Extract: 015.2819819450. Signed URL: 000.2984759808
job 005 done. Query: 012.0956349373. Extract: 018.9619371891. Signed URL: 000.3134548664
job 018 done. Query: 013.6754779816. Extract: 020.0537509918. Signed URL: 000.3496448994
job 011 done. Query: 011.9627509117. Extract: 025.1803772449. Signed URL: 000.3009829521
job 008 done. Query: 015.7373569012. Extract: 136.8249070644. Signed URL: 000.3158171177
job 023 done. Query: 013.7817242146. Extract: 148.2014479637. Signed URL: 000.4145238400
job 012 done. Query: 014.5390141010. Extract: 151.3171939850. Signed URL: 000.3226230145
job 007 done. Query: 014.1386809349. Extract: 160.1254091263. Signed URL: 000.2966897488
job 021 done. Query: 013.6751790047. Extract: 162.8383400440. Signed URL: 000.3162341118
job 019 done. Query: 013.5642910004. Extract: 163.2161693573. Signed URL: 000.2765989304
job 003 done. Query: 013.8807480335. Extract: 165.1014308929. Signed URL: 000.3309218884
job 024 done. Query: 013.5861997604. Extract: 182.0707099438. Signed URL: 000.3331830502
job 009 done. Query: 013.5025639534. Extract: 199.4397711754. Signed URL: 000.4156360626
job 015 done. Query: 013.7611100674. Extract: 230.2218120098. Signed URL: 000.2913899422
job 016 done. Query: 013.4659759998. Extract: 285.7284781933. Signed URL: 000.3109869957
job 017 done. Query: 019.2001299858. Extract: 322.5298812389. Signed URL: 000.2890429497
job 010 done. Query: 014.7132742405. Extract: 363.8596160412. Signed URL: 000.6748869419
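A quick way to see the split in these numbers is to parse the log lines and cut the sorted extract times at their largest gap. This is a minimal sketch that assumes exactly the log format shown above; the largest-gap rule is only a heuristic for this data, not something BigQuery reports.

```python
import re

# Assumes the exact log format above, e.g.
# "job 013 done. Query: 012.0930221081. Extract: 009.8582818508. Signed URL: 000.3398022652"
LINE_RE = re.compile(
    r"job (?P<job>\d+) done\. "
    r"Query: (?P<query>\d+\.\d+)\. "
    r"Extract: (?P<extract>\d+\.\d+)\. "
    r"Signed URL: (?P<signed_url>\d+\.\d+)"
)


def parse_log(text):
    """Return {job_id: (query_s, extract_s, signed_url_s)} for every matching line."""
    return {
        m["job"]: (float(m["query"]), float(m["extract"]), float(m["signed_url"]))
        for m in LINE_RE.finditer(text)
    }


def split_at_largest_gap(values):
    """Sort the values and split them where consecutive values are furthest apart.

    Heuristic only; needs at least two values.
    """
    ordered = sorted(values)
    gaps = [(b - a, i) for i, (a, b) in enumerate(zip(ordered, ordered[1:]))]
    _, cut = max(gaps)
    return ordered[: cut + 1], ordered[cut + 1 :]
```

Applied to the 25 lines above, splitting the extract times this way gives 12 fast extracts (9.9 to 25.2 s) and 13 slow ones (136.8 to 363.9 s).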

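Each job does three things:

  1. Submits a query to BigQuery
  2. Extracts the result table to Cloud Storage
  3. Generates a signed URL for the blob in Cloud Storage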
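The results show that the first group of extracts takes 9 to 25 seconds, after which extracts start taking much longer (roughly 136 to 364 seconds).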

Any ideas why this happens? Could it be related to https://cloud.google.com/storage/docs/request-rate? Is there a way to work around it?
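The per-job pipeline (query, extract, sign) and the concurrent experiment can be sketched as follows. The three step functions are hypothetical hooks, not the poster's code; with the official Python clients they would presumably wrap `client.query(sql).result()`, `client.extract_table(table, destination_uri).result()` and `blob.generate_signed_url(...)`. The harness only times each step and prints a line in the same format as the log above.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def timed(fn, *args):
    """Run fn(*args) and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def run_job(job_id, run_query, extract_to_gcs, sign_url):
    """Query -> extract -> signed URL, timing each step.

    run_query, extract_to_gcs and sign_url are hypothetical callables; in real
    code they would wrap the BigQuery and Cloud Storage client calls.
    """
    table, query_s = timed(run_query, job_id)
    blob, extract_s = timed(extract_to_gcs, table)
    url, sign_s = timed(sign_url, blob)
    # Same layout as the log lines above; lines appear in completion order.
    print(f"job {job_id:03d} done. Query: {query_s:014.10f}. "
          f"Extract: {extract_s:014.10f}. Signed URL: {sign_s:014.10f}")
    return url


def run_concurrently(n_jobs, run_query, extract_to_gcs, sign_url):
    """Start all n_jobs pipelines at once (one thread each); return URLs in job order."""
    with ThreadPoolExecutor(max_workers=n_jobs) as pool:
        futures = [pool.submit(run_job, i, run_query, extract_to_gcs, sign_url)
                   for i in range(n_jobs)]
        return [f.result() for f in futures]
```

With stub functions in place of the cloud calls, `run_concurrently(3, ...)` prints three log lines and returns the three URLs in job order.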

Edit: here is some additional information I found.

    | job | Local extract time (s) | Google extract time (s) | Google extract start | Google extract end | Local extract start | Local extract end |
    | --- | ---------------------- | ----------------------- | -------------------- | ------------------ | ------------------- | ----------------- |
    | 026 | 009.26328           | 008.84300            | 13:39:00.441000          | 13:39:09.284000        | 07:39:00.235970     | 07:39:09.498784     |
    | 009 | 011.52299           | 008.04000            | 13:39:00.441000          | 13:39:08.481000        | 07:39:00.234297     | 07:39:11.756788     |
    | 004 | 010.35730           | 008.66700            | 13:39:03.436000          | 13:39:12.103000        | 07:39:03.240466     | 07:39:13.597328     |
    | 011 | 011.86404           | 009.29900            | 13:39:03.055000          | 13:39:12.354000        | 07:39:02.893600     | 07:39:14.756887     |
    | 006 | 012.50416           | 011.75400            | 13:39:02.854000          | 13:39:14.608000        | 07:39:02.623032     | 07:39:15.126790     |
    | 000 | 013.30535           | 008.77000            | 13:39:02.056000          | 13:39:10.826000        | 07:39:01.863548     | 07:39:15.168434     |
    | 002 | 011.47199           | 008.53700            | 13:39:04.443000          | 13:39:12.980000        | 07:39:04.236455     | 07:39:15.708005     |
    | 032 | 015.68229           | 009.69200            | 13:39:02.915000          | 13:39:12.607000        | 07:39:02.768185     | 07:39:18.450160     |
    | 001 | 017.46480           | 009.35800            | 13:39:01.313000          | 13:39:10.671000        | 07:39:01.071540     | 07:39:18.535896     |
    | 012 | 019.02242           | 008.65700            | 13:39:00.903000          | 13:39:09.560000        | 07:39:00.727101     | 07:39:19.749070     |
    | 018 | 016.95632           | 009.75800            | 13:39:03.259000          | 13:39:13.017000        | 07:39:03.080580     | 07:39:20.036199     |
    | 019 | 017.24428           | 008.51100            | 13:39:03.773000          | 13:39:12.284000        | 07:39:03.575118     | 07:39:20.819042     |
    | 008 | 019.55018           | 009.83600            | 13:39:02.110000          | 13:39:11.946000        | 07:39:01.905548     | 07:39:21.455273     |
    | 023 | 016.64131           | 008.94500            | 13:39:05.282000          | 13:39:14.227000        | 07:39:05.041235     | 07:39:21.682086     |
    | 017 | 019.39104           | 007.12700            | 13:39:03.118000          | 13:39:10.245000        | 07:39:02.896256     | 07:39:22.286485     |
    | 020 | 019.96283           | 010.05000            | 13:39:03.115000          | 13:39:13.165000        | 07:39:02.942562     | 07:39:22.904864     |
    | 036 | 022.05831           | 010.51200            | 13:39:02.626000          | 13:39:13.138000        | 07:39:02.461061     | 07:39:24.518903     |
    | 024 | 028.39538           | 008.79600            | 13:39:05.151000          | 13:39:13.947000        | 07:39:04.916194     | 07:39:33.311248     |
    | 007 | 107.36010           | 010.68900            | 13:40:31.555000          | 13:40:42.244000        | 07:39:03.050049     | 07:40:50.409359     |
    | 028 | 120.63134           | 009.52400            | 13:40:49.915000          | 13:40:59.439000        | 07:39:02.941202     | 07:41:03.572094     |
    | 033 | 120.78268           | 009.54200            | 13:40:27.147000          | 13:40:36.689000        | 07:39:04.152378     | 07:41:04.934602     |
    | 037 | 122.64949           | 008.80400            | 13:40:33.298000          | 13:40:42.102000        | 07:39:06.500587     | 07:41:09.149629     |
    | 035 | 125.35254           | 009.13200            | 13:40:27.600000          | 13:40:36.732000        | 07:39:04.295941     | 07:41:09.647836     |
    | 015 | 139.13287           | 011.17800            | 13:40:27.116000          | 13:40:38.294000        | 07:39:03.406321     | 07:41:22.538701     |
    | 029 | 141.21037           | 008.23700            | 13:40:24.271000          | 13:40:32.508000        | 07:39:03.816588     | 07:41:25.026438     |
    | 013 | 145.94239           | 009.19400            | 13:40:33.809000          | 13:40:43.003000        | 07:39:03.375451     | 07:41:29.317454     |
    | 039 | 149.92807           | 009.72300            | 13:40:33.090000          | 13:40:42.813000        | 07:39:03.635156     | 07:41:33.562607     |
    | 016 | 166.26505           | 010.12000            | 13:40:39.999000          | 13:40:50.119000        | 07:39:03.383215     | 07:41:49.647907     |
    | 010 | 210.61908           | 011.37900            | 13:42:20.287000          | 13:42:31.666000        | 07:39:03.702486     | 07:42:34.321079     |
    | 027 | 227.83011           | 010.00900            | 13:42:25.845000          | 13:42:35.854000        | 07:39:02.953435     | 07:42:50.783106     |
    | 025 | 228.48326           | 009.71000            | 13:42:20.845000          | 13:42:30.555000        | 07:39:03.673122     | 07:42:52.155934     |
    | 022 | 244.57685           | 010.06900            | 13:42:53.712000          | 13:43:03.781000        | 07:39:03.963936     | 07:43:08.540307     |
    | 021 | 263.74717           | 009.81400            | 13:42:40.211000          | 13:42:50.025000        | 07:39:04.505016     | 07:43:28.251864     |
    | 031 | 273.96990           | 008.55100            | 13:43:18.645000          | 13:43:27.196000        | 07:39:03.618419     | 07:43:37.587862     |
    | 034 | 280.96174           | 010.53300            | 13:42:58.364000          | 13:43:08.897000        | 07:39:04.313498     | 07:43:45.274962     |
    | 030 | 281.76029           | 008.27100            | 13:42:49.448000          | 13:42:57.719000        | 07:39:03.832644     | 07:43:45.592592     |
    | 005 | 288.15577           | 009.85300            | 13:43:04.825000          | 13:43:14.678000        | 07:39:04.006553     | 07:43:52.161888     |
    | 003 | 296.52279           | 009.65300            | 13:43:24.041000          | 13:43:33.694000        | 07:39:03.831264     | 07:44:00.353715     |
    | 038 | 380.01783           | 008.45000            | 13:44:57.326000          | 13:45:05.776000        | 07:39:03.055733     | 07:45:23.073209     |
    | 014 | 397.05841           | 008.99800            | 13:44:48.577000          | 13:44:57.575000        | 07:39:03.132323     | 07:45:40.190302     |
    

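The table shows how long I had to wait locally for each job, alongside how long Google took to complete it. The timestamps suggest that Google does not take long to perform any single extract (about 7 to 12 seconds), but it does not run all the jobs concurrently, so some extracts are forced to wait several minutes before they even start.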
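The gap between the local and Google columns can be quantified with a short script: the queue wait is Google's start time minus the local start time, and a sweep over Google's start/end intervals gives the largest number of extracts that were ever running at the same moment. This sketch rests on an assumption: Google's timestamps appear to be 6 hours ahead of the local ones (presumably UTC versus a UTC-6 local clock), so `LOCAL_OFFSET` is set to 6 hours; adjust it if that guess is wrong. Timestamps are passed as the `HH:MM:SS.ffffff` strings from the table.

```python
from datetime import datetime, timedelta

# Assumption: Google's clock reads 6 hours ahead of the local clock in the table.
LOCAL_OFFSET = timedelta(hours=6)
FMT = "%H:%M:%S.%f"


def parse(ts):
    """Parse an 'HH:MM:SS.ffffff' string from the table."""
    return datetime.strptime(ts, FMT)


def queue_wait(local_start, google_start):
    """Seconds between submitting the extract locally and Google starting it."""
    return (parse(google_start) - LOCAL_OFFSET - parse(local_start)).total_seconds()


def peak_concurrency(intervals):
    """Max number of (start, end) intervals overlapping at any instant (sweep line)."""
    events = []
    for start, end in intervals:
        events.append((parse(start), 1))
        events.append((parse(end), -1))
    running = peak = 0
    # At equal timestamps -1 sorts before +1, so touching intervals don't overlap.
    for _, delta in sorted(events):
        running += delta
        peak = max(peak, running)
    return peak
```

For job 014, `queue_wait("07:39:03.132323", "13:44:48.577000")` is about 345 s of waiting before an extract that itself ran for about 9 s. Over all 40 rows, `peak_concurrency` returns 18: the first 18 extracts all overlap, and none of the remaining 22 starts until more than a minute after that first group ends.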

1 answer

Answer (score: 2)

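You are correct: there is currently an internal limit on how fast export jobs are processed. It was originally put in place to protect the system from too many long, expensive exports running in parallel. However, as you point out, in your case the limit does not help; it keeps you from finishing all of your export jobs within a minute.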

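We have an open (internal) bug to address this so that small exports like yours are handled better in this situation. In the meantime, if you think you are blocked by this, file a bug or let me know your project ID and we can help raise the limit for your project.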