Question

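BigQuery started giving me an out-of-memory error when I ran this query this morning. The two tables involved contain no more than 5 GB of data. I also use table decorators, and 1407249067530 corresponds to 10:30 AM today (20140805). What is the problem?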
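In legacy SQL, `table@<time>-` is a range decorator that selects the data added to the table from `<time>` until now, where `<time>` is in milliseconds since the Unix epoch. The minimal Python sketch below converts the value used in the question. It comes out as 14:31 UTC on 2014-08-05, which roughly matches the asker's "10:30" if their clock was at UTC−4. That offset (for example, US Eastern Daylight Time) is an assumption, not something the question states.

```python
from datetime import datetime, timedelta, timezone

# Value taken from the decorator in the question: [...table1_20140805@1407249067530-]
DECORATOR_MS = 1407249067530


def decorator_ms_to_utc(ms: int) -> datetime:
    """Convert a table-decorator time (milliseconds since the Unix epoch) to an aware UTC datetime."""
    return datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(milliseconds=ms)


utc = decorator_ms_to_utc(DECORATOR_MS)
print(utc.isoformat())  # 2014-08-05T14:31:07.530000+00:00

# Assumed local zone: UTC-4 (e.g. US Eastern Daylight Time)
print(utc.astimezone(timezone(timedelta(hours=-4))).isoformat())  # 2014-08-05T10:31:07.530000-04:00
```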

Job ID: red-road-574:job_x8flLfo4QwA1gQ_FCrNWbKY-bZM

  select * from 
                (                           
                select  t_connection.row_id AS debug_row_id,                        
                    t_connection.hardware_id AS hardware_id,                        
                    t_connection.debug_data AS debug_data,                      
                    t_connection.connection_status AS connection_status,                        
                    t_connection.date_time AS debug_date_time,                      
                    t_gps.hardware_id AS hardware_id2,                      
                    t_gps.latitude AS latitude,                     
                    t_gps.longitude AS longitude,                       
                    t_gps.date_time AS gps_date_time,                       
                     t_gps.zip_code AS zip_code,
                    ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,                       
                    from(                           
                          select    *,                      
                                ABS(t_gps.date_time-t_connection.date_time) AS time_diff                    
                              from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id,                      
                                    gg.hardware_id as hardware_id,              
                                    gg.latitude as latitude,                
                                    gg.longitude as longitude,              
                                    gg.date_time as date_time,              
                                     gg.zip_code as zip_code                        
                                     from   [my data set.table1_20140805@1407249067530-] gg                         

                                   ) AS t_gps                           

                                    INNER JOIN EACH                     

                                  ( select  CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,                      
                                        dd.hardware_id as hardware_id,                  
                                        dd.date_time as date_time,                      
                                        dd.debug_data as debug_data,                    
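                                -- Note: the WHERE clause below keeps only debug_reason 50013, 50017 and 50018,
                                -- so none of these WHEN branches can match and connection_status is always NULL.
                                case                    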
                                    when dd.debug_reason = 1 then 'Successful_Connection'               
                                    when dd.debug_reason = 2 then 'Dropped_Connection'              
                                    when dd.debug_reason = 3 then 'Failed_Connection'               
                                end AS connection_status                                                
                                    from    [my data set.table2_20140805@1407249067530-] dd         
                                    where   dd.debug_reason in (50013, 50017, 50018)    

                                ) as t_connection                           

                                 ON t_connection.hardware_id = t_gps.hardware_id                    
                )                           
               )  WHERE row_num=1
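The query joins connection rows to GPS rows on `hardware_id` only, computes the absolute time difference for every joined pair, numbers the pairs of each connection row by that difference, and keeps `row_num=1`, i.e. the GPS fix closest in time to each connection event. The Python sketch below emulates that pattern on in-memory rows. The class and function names are hypothetical, times are assumed to be epoch milliseconds, and ties are broken by input order (`ROW_NUMBER()` picks an arbitrary row among ties). `joined_pair_count` shows how many rows the join emits before the `row_num=1` filter: every connection row is paired with every GPS row of the same device.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class GpsFix(NamedTuple):  # hypothetical stand-in for a row of t_gps
    hardware_id: int
    date_time_ms: int
    latitude: float
    longitude: float


class Connection(NamedTuple):  # hypothetical stand-in for a row of t_connection
    row_id: str
    hardware_id: int
    date_time_ms: int


def nearest_gps(conns: List[Connection], fixes: List[GpsFix]) -> Dict[str, GpsFix]:
    """Emulate JOIN + ROW_NUMBER() ... WHERE row_num=1: for each connection row,
    keep the same-device GPS fix with the smallest time_diff."""
    by_device: Dict[int, List[GpsFix]] = defaultdict(list)
    for fix in fixes:
        by_device[fix.hardware_id].append(fix)
    result: Dict[str, GpsFix] = {}
    for conn in conns:
        candidates = by_device.get(conn.hardware_id)
        if not candidates:
            continue  # an INNER JOIN drops connections whose device has no GPS rows
        # time_diff = ABS(gps time - connection time); min() keeps the first of any ties
        result[conn.row_id] = min(candidates, key=lambda f: abs(f.date_time_ms - conn.date_time_ms))
    return result


def joined_pair_count(conns: List[Connection], fixes: List[GpsFix]) -> int:
    """Rows produced by joining on hardware_id alone, before row_num=1 filters them."""
    fixes_per_device: Dict[int, int] = defaultdict(int)
    for fix in fixes:
        fixes_per_device[fix.hardware_id] += 1
    return sum(fixes_per_device[conn.hardware_id] for conn in conns)


# Made-up sample rows
fixes = [GpsFix(7, 1000, 1.0, 2.0), GpsFix(7, 5000, 1.1, 2.1), GpsFix(8, 1200, 3.0, 4.0)]
conns = [Connection("a", 7, 4200), Connection("b", 8, 900), Connection("c", 9, 100)]
print({k: v.date_time_ms for k, v in nearest_gps(conns, fixes).items()})  # {'a': 5000, 'b': 1200}
print(joined_pair_count(conns, fixes))  # 3
```

Because the join key is only `hardware_id`, the intermediate result grows with the product of GPS rows and connection rows per device, even when the input tables themselves are small.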

Answer 1

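You're hitting an odd corner case. When you use allowLargeResults, your results are nested or repeated, and you don't set flattenResults=false, the query goes into a special mode. (When you use timestamps, you are actually using a nested data structure under the covers. That design decision has caused a thousand bugs and will be changing soon.) This special query mode has some limitations, and that is what you are hitting.

In general we want this to be seamless, which is why it isn't documented. But since you've run into a problem here, I'll explain how to avoid it.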

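You have a few options for working around this:

If you're using nested or repeated results (it looks like you aren't, which is good):
- Rename the result fields so that their names contain no dots.
- Set the flattenResults field of the query to false. This means nested and repeated fields will actually be nested and repeated in the results.
If you're using timestamps in your results:
- Convert the timestamps to strings or numeric values. Sorry.
If you don't actually need large results:
- Unset the allowLargeResults flag.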
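For reference, the flags above are fields of a legacy-SQL query job's configuration (`configuration.query`) in the BigQuery REST API. A minimal sketch that only builds such a request body as a Python dict; the project, dataset, and table names are placeholders, and sending it with `jobs.insert` through some client is left out. Two constraints are encoded as I understand the API's field documentation: `allowLargeResults` requires a destination table, and `flattenResults=false` is only accepted together with `allowLargeResults=true`.

```python
import json


def legacy_query_job_body(sql: str, project: str, dataset: str, table: str,
                          allow_large_results: bool = True,
                          flatten_results: bool = True) -> dict:
    """Build a jobs.insert request body for a legacy-SQL query job (placeholder names)."""
    if not flatten_results and not allow_large_results:
        # flattenResults=false is only accepted together with allowLargeResults=true
        raise ValueError("flattenResults=False requires allowLargeResults=True")
    query = {
        "query": sql,
        "useLegacySql": True,
        "allowLargeResults": allow_large_results,
        "flattenResults": flatten_results,
    }
    if allow_large_results:
        # large results must be written to an explicit destination table
        query["destinationTable"] = {
            "projectId": project,
            "datasetId": dataset,
            "tableId": table,
        }
    return {"configuration": {"query": query}}


# Option "set flattenResults to false": keep nested/repeated fields nested
body = legacy_query_job_body("SELECT 1", "my-project", "my_dataset", "my_result_table",
                             allow_large_results=True, flatten_results=False)
print(json.dumps(body, indent=2))
```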

I realize all of these options are quite unsatisfying. This is an area we are actively working to improve.

Answer 2

Now I'm using allowLargeResults = true and flattenResults = false, and converting the timestamps to numeric values in the first step:

  select * from 
                (                           
                select  row_id AS debug_row_id,                     
                    hardware_id AS hardware_id,                     
                    debug_data AS debug_data,                       
                    connection_status AS connection_status,                     
                    date_time AS debug_date_time,                       
                    hardware_id2 AS hardware_id2,                       
                    latitude AS latitude,                       
                    longitude AS longitude,                     
                    date_time2 AS gps_date_time,                        
                    zip_code AS zip_code,
                    ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,                       
                    from(                           
                          select    *,                      
                                ABS(t_gps.date_time2-t_connection.date_time) AS time_diff                   
                              from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id_gps,                      
                                    gg.hardware_id as hardware_id2,                 
                                    gg.latitude as latitude,                
                                    gg.longitude as longitude,              
                                    TIMESTAMP_TO_MSEC(gg.date_time) as date_time2,              
                                     gg.zip_code as zip_code                        
                                     from   [test.gps32_20140805@1407249067530-] gg                         

                                   ) AS t_gps                           

                                    INNER JOIN EACH                     

                                  ( select  CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,                      
                                        dd.hardware_id as hardware_id,                  
                                        TIMESTAMP_TO_MSEC(dd.date_time) as date_time,                       
                                        dd.debug_data as debug_data,                    
                                case                    
                                    when dd.debug_reason = 1 then 'Successful_Connection'               
                                    when dd.debug_reason = 2 then 'Dropped_Connection'              
                                    when dd.debug_reason = 3 then 'Failed_Connection'               
                                end AS connection_status                                                
                                    from    [test.debug_data_developer_20140805@1407249067530-] dd      
                                    where   dd.debug_reason in (50013, 50017, 50018)

                                ) as t_connection                           

                                 ON t_connection.hardware_id = t_gps.hardware_id2                   
                )                           
               )  WHERE row_num=1

It gives me:

Query Failed
Error: Resources exceeded during query execution.
Job ID: red-road-574:job_ikWQvffmPEUP6DtTvJaYpXHFJ2M
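Answer 2 converts both `date_time` columns with `TIMESTAMP_TO_MSEC` before the join, so `time_diff` is computed on plain integers (milliseconds) and no TIMESTAMP column reaches the output, which is the workaround Answer 1 suggests. A small Python analogue of that conversion and of the `time_diff` expression; `timestamp_to_msec` is a hypothetical helper, not a BigQuery API, and the two sample times are made up.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def timestamp_to_msec(ts: datetime) -> int:
    """Whole milliseconds since the Unix epoch (Python analogue of TIMESTAMP_TO_MSEC)."""
    return (ts - EPOCH) // timedelta(milliseconds=1)


gps_time = datetime(2014, 8, 5, 14, 31, 7, 530000, tzinfo=timezone.utc)
conn_time = datetime(2014, 8, 5, 14, 30, 0, tzinfo=timezone.utc)

# time_diff as in Answer 2: ABS(date_time2 - date_time) on millisecond integers
time_diff = abs(timestamp_to_msec(gps_time) - timestamp_to_msec(conn_time))
print(timestamp_to_msec(gps_time))  # 1407249067530
print(time_diff)  # 67530
```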

Answer 3

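Here is SQL that works, with allowLargeResults = true and flattenResults = true. I don't know what I did to make it work; maybe just adding a HAVING clause? But in the JOIN I changed one side to the whole table instead of a decorator as above, so the amount of data involved actually increased. I'm not sure whether it will keep succeeding or whether it was just temporary luck.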

BigQuery: "Out of memory"
