BigQuery: "Out of memory"

Date: 2014-08-05 19:32:54

Tags: google-bigquery

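BigQuery started giving me an "out of memory" error when I ran this query this morning. The two tables involved contain no more than 5 GB of data. I am also using table decorators; 1407249067530 corresponds to 10:30 this morning (20140805). I would like to know what the problem is.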

Job ID: red-road-574:job_x8flLfo4QwA1gQ_FCrNWbKY-bZM
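As a quick sanity check on the table decorator value used in the query, the sketch below converts the millisecond timestamp to a readable time. A decorator of the form `@<time>-` selects table data added from that time (milliseconds since the Unix epoch) up to now. The function name `decorator_ms_to_utc` and the UTC-4 offset are assumptions made for illustration; the post does not state the asker's time zone.

```python
from datetime import datetime, timedelta, timezone

# Value taken from the table decorators in the question.
DECORATOR_MS = 1407249067530

def decorator_ms_to_utc(ms):
    """Convert a decorator timestamp (milliseconds since the Unix epoch) to UTC."""
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    return epoch + timedelta(milliseconds=ms)

utc_time = decorator_ms_to_utc(DECORATOR_MS)

# Assumption: the asker's local time is UTC-4 (for example US Eastern in summer).
assumed_local = utc_time.astimezone(timezone(timedelta(hours=-4)))
```

Under that assumed offset the value falls at about 10:31 local time on 2014-08-05, which matches the "10:30 this morning" in the question.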

  select * from 
                (                           
                select  t_connection.row_id AS debug_row_id,                        
                    t_connection.hardware_id AS hardware_id,                        
                    t_connection.debug_data AS debug_data,                      
                    t_connection.connection_status AS connection_status,                        
                    t_connection.date_time AS debug_date_time,                      
                    t_gps.hardware_id AS hardware_id2,                      
                    t_gps.latitude AS latitude,                     
                    t_gps.longitude AS longitude,                       
                    t_gps.date_time AS gps_date_time,                       
                     t_gps.zip_code AS zip_code,
                    ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,                       
                    from(                           
                          select    *,                      
                                ABS(t_gps.date_time-t_connection.date_time) AS time_diff                    
                              from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id,                      
                                    gg.hardware_id as hardware_id,              
                                    gg.latitude as latitude,                
                                    gg.longitude as longitude,              
                                    gg.date_time as date_time,              
                                     gg.zip_code as zip_code                        
                                     from   [my data set.table1_20140805@1407249067530-] gg                         

                                   ) AS t_gps                           

                                    INNER JOIN EACH                     

                                  ( select  CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,                      
                                        dd.hardware_id as hardware_id,                  
                                        dd.date_time as date_time,                      
                                        dd.debug_data as debug_data,                    
                                case                    
                                    when dd.debug_reason = 1 then 'Successful_Connection'               
                                    when dd.debug_reason = 2 then 'Dropped_Connection'              
                                    when dd.debug_reason = 3 then 'Failed_Connection'               
                                end AS connection_status                                                
                                    from    [my data set.table2_20140805@1407249067530-] dd         
                                    where   dd.debug_reason in (50013, 50017, 50018)    

                                ) as t_connection                           

                                 ON t_connection.hardware_id = t_gps.hardware_id                    
                )                           
               )  WHERE row_num=1

3 Answers:

Answer 0 (score: 2)

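You are hitting an odd corner case. When you use allowLargeResults and your results are nested or repeated, but you do not set flattenResults=false, the query goes into a special mode. (When you use timestamps, you are really using a nested data structure; that design decision has produced 1000 bugs and will be changed soon.) This special query mode has some limitations, and those are what you are hitting.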

In general, we want this to be seamless, which is why it is not documented. But since you have run into a problem here, I will explain how to avoid it.

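You have a few options to work around this:

  1. If you are using nested or repeated results (it looks like you are not, which is good):

    • Rename the result fields so that their names contain no dots.
    • Set the flattenResults field in the query to "false". This means nested and repeated fields will actually be nested and repeated in the results.
  2. If you are using timestamps in your results:

    • Convert the timestamps to strings or numeric values. Sorry.
  3. If you do not really need large results:

    • Unset the allowLargeResults flag.

I realize that all of these options are quite unsatisfying. This is an area we are actively working to improve.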
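The flags mentioned in the options above are fields of the query job configuration rather than part of the SQL text. The following is a minimal sketch of how such a request body might be assembled, assuming the `configuration.query` layout of the BigQuery REST API. The helper names `build_query_job` and `dotted_names`, and the dataset and table names in the usage note, are hypothetical and not from the original post.

```python
def build_query_job(query, project_id, dataset_id, table_id,
                    allow_large_results=True, flatten_results=False):
    """Build a jobs.insert-style request body for a query job (sketch only)."""
    # flattenResults=false is only accepted together with allowLargeResults=true.
    if not flatten_results and not allow_large_results:
        raise ValueError("flattenResults=false requires allowLargeResults=true")

    query_config = {
        "query": query,
        "allowLargeResults": allow_large_results,
        "flattenResults": flatten_results,
    }
    # allowLargeResults needs a destination table to write the results to.
    if allow_large_results:
        query_config["destinationTable"] = {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
    return {"configuration": {"query": query_config}}


def dotted_names(field_names):
    """Return the result field names that still contain a dot (option 1)."""
    return [name for name in field_names if "." in name]
```

For example, `build_query_job(sql, "red-road-574", "my_dataset", "my_results")` yields a body with allowLargeResults=true and flattenResults=false, while passing `allow_large_results=False, flatten_results=True` corresponds to option 3. Option 2 is handled in the SQL itself, for instance with `TIMESTAMP_TO_MSEC(...)`.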

Answer 1 (score: 0)

I am now using allowLargeResults = true and flattenResults = false, and converting the timestamps to numeric values in the first step:

  select * from 
                (                           
                select  row_id AS debug_row_id,                     
                    hardware_id AS hardware_id,                     
                    debug_data AS debug_data,                       
                    connection_status AS connection_status,                     
                    date_time AS debug_date_time,                       
                    hardware_id2 AS hardware_id2,                       
                    latitude AS latitude,                       
                    longitude AS longitude,                     
                    date_time2 AS gps_date_time,                        
                    zip_code AS zip_code,
                    ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,                       
                    from(                           
                          select    *,                      
                                ABS(t_gps.date_time2-t_connection.date_time) AS time_diff                   
                              from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id_gps,                      
                                    gg.hardware_id as hardware_id2,                 
                                    gg.latitude as latitude,                
                                    gg.longitude as longitude,              
                                    TIMESTAMP_TO_MSEC(gg.date_time) as date_time2,              
                                     gg.zip_code as zip_code                        
                                     from   [test.gps32_20140805@1407249067530-] gg                         

                                   ) AS t_gps                           

                                    INNER JOIN EACH                     

                                  ( select  CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,                      
                                        dd.hardware_id as hardware_id,                  
                                        TIMESTAMP_TO_MSEC(dd.date_time) as date_time,                       
                                        dd.debug_data as debug_data,                    
                                case                    
                                    when dd.debug_reason = 1 then 'Successful_Connection'               
                                    when dd.debug_reason = 2 then 'Dropped_Connection'              
                                    when dd.debug_reason = 3 then 'Failed_Connection'               
                                end AS connection_status                                                
                                    from    [test.debug_data_developer_20140805@1407249067530-] dd      
                                    where   dd.debug_reason in (50013, 50017, 50018)

                                ) as t_connection                           

                                 ON t_connection.hardware_id = t_gps.hardware_id2                   
                )                           
               )  WHERE row_num=1                               

It gives me:

Query Failed
Error: Resources exceeded during query execution.
Job ID: red-road-574:job_ikWQvffmPEUP6DtTvJaYpXHFJ2M

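Answer 2 (score: 0)

This is the SQL that works, with allowLargeResults = true and flattenResults = true. I do not know what I did to make it work; maybe just adding a HAVING clause? In the JOIN, however, I changed one side to the whole table instead of the decorator used above, so the amount of data involved actually increased. I am not sure whether it will keep succeeding or whether this was just temporary luck.

[Image: screenshot of the working query]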