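BigQuery started giving me an error this morning: this query runs out of memory. The two tables involved contain no more than 5 GB of data. Also, I am using table decorators; 1407249067530 corresponds to 10:30 AM today (20140805). I'd like to know what the problem is.
Job ID: red-road-574:job_x8flLfo4QwA1gQ_FCrNWbKY-bZM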
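As a sanity check on the decorator value, the sketch below converts a table-decorator timestamp (milliseconds since the Unix epoch) to a UTC datetime. The helper name `decorator_ms_to_utc` is made up for illustration. The value decodes to 14:31:07 UTC, which matches "10:30 AM" only under the assumption that the local time zone is UTC-4 (for example US Eastern daylight time). With the trailing dash, `@1407249067530-` is a range decorator covering data added from that instant until now.

```python
from datetime import datetime, timedelta, timezone

# Unix epoch as a timezone-aware datetime, so the result is explicitly UTC.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decorator_ms_to_utc(ms: int) -> datetime:
    """Convert a decorator timestamp (ms since the Unix epoch) to UTC.

    Hypothetical helper; integer milliseconds are added as a timedelta
    to avoid floating-point rounding of the fractional second.
    """
    return EPOCH + timedelta(milliseconds=ms)


snapshot = decorator_ms_to_utc(1407249067530)
print(snapshot.isoformat())  # 2014-08-05T14:31:07.530000+00:00
```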
select * from
(
select t_connection.row_id AS debug_row_id,
t_connection.hardware_id AS hardware_id,
t_connection.debug_data AS debug_data,
t_connection.connection_status AS connection_status,
t_connection.date_time AS debug_date_time,
t_gps.hardware_id AS hardware_id2,
t_gps.latitude AS latitude,
t_gps.longitude AS longitude,
t_gps.date_time AS gps_date_time,
t_gps.zip_code AS zip_code,
ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,
from(
select *,
ABS(t_gps.date_time-t_connection.date_time) AS time_diff
from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id,
gg.hardware_id as hardware_id,
gg.latitude as latitude,
gg.longitude as longitude,
gg.date_time as date_time,
gg.zip_code as zip_code
from [my data set.table1_20140805@1407249067530-] gg
) AS t_gps
INNER JOIN EACH
( select CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,
dd.hardware_id as hardware_id,
dd.date_time as date_time,
dd.debug_data as debug_data,
case
when dd.debug_reason = 1 then 'Successful_Connection'
when dd.debug_reason = 2 then 'Dropped_Connection'
when dd.debug_reason = 3 then 'Failed_Connection'
end AS connection_status
from [my data set.table2_20140805@1407249067530-] dd
where dd.debug_reason in (50013, 50017, 50018)
) as t_connection
ON t_connection.hardware_id = t_gps.hardware_id
)
) WHERE row_num=1
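To make the intent of the query concrete, here is a small Python sketch of what the `ROW_NUMBER() ... row_num=1` pattern computes: for each debug row, the GPS row on the same `hardware_id` with the smallest absolute time difference. The function name and the dict-based row layout are assumptions for illustration, with `date_time` treated as a plain number. Note that the SQL version first materializes every (debug row, GPS row) pair per device before ranking them, whereas this sketch keeps only the best candidate.

```python
def closest_gps_per_connection(connections, gps_rows):
    """Return {row_id: nearest GPS row} for each connection row.

    Hypothetical in-memory equivalent of the join plus ROW_NUMBER filter.
    """
    # Group GPS rows by device, mirroring the join key hardware_id.
    gps_by_hw = {}
    for g in gps_rows:
        gps_by_hw.setdefault(g["hardware_id"], []).append(g)

    result = {}
    for c in connections:
        candidates = gps_by_hw.get(c["hardware_id"], [])
        if not candidates:
            continue  # INNER JOIN: rows without a match are dropped
        # Equivalent of ORDER BY time_diff and keeping row_num = 1.
        result[c["row_id"]] = min(
            candidates, key=lambda g: abs(g["date_time"] - c["date_time"])
        )
    return result


connections = [
    {"row_id": "c1", "hardware_id": 1, "date_time": 100},
    {"row_id": "c2", "hardware_id": 2, "date_time": 50},
    {"row_id": "c3", "hardware_id": 9, "date_time": 5},
]
gps_rows = [
    {"hardware_id": 1, "date_time": 90, "zip_code": "A"},
    {"hardware_id": 1, "date_time": 104, "zip_code": "B"},
    {"hardware_id": 2, "date_time": 10, "zip_code": "C"},
]
matches = closest_gps_per_connection(connections, gps_rows)
print(matches["c1"]["zip_code"])  # B
```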
Answer 0: (score: 2)
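You're hitting an odd corner case. When you use allowLargeResults and the results are nested or repeated but you don't use flattenResults=false, the query goes into a special mode. (When you use timestamps, you're really using a nested data structure; that was a design decision that has produced 1000 bugs and will be changing soon.) This special query mode has some limitations, and that is what you're hitting.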
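In general, we'd like this to be seamless, which is why it isn't documented. However, since you're running into problems here, I'll explain how to avoid it.
You have a few options for working around this: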
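If you use nested or repeated results (it looks like you don't, which is good):
If you use timestamps in your results:
If you don't really need large results: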
I realize all of these options are pretty unsatisfying. This is an area we're actively working to improve.
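For reference, allowLargeResults and flattenResults are fields of the query job configuration rather than part of the SQL text. The sketch below builds such a request body as a plain Python dict, following the shape of the BigQuery v2 REST API jobs resource as I understand it. The function name and the project, dataset, and table IDs are placeholders, and nothing is sent to the service.

```python
import json


def build_query_job_config(sql, project_id, dataset_id, table_id,
                           allow_large_results=True, flatten_results=False):
    """Build a jobs.insert-style request body for a query job.

    Hypothetical helper; all identifiers passed in are placeholders.
    """
    query = {
        "query": sql,
        "allowLargeResults": allow_large_results,
        "flattenResults": flatten_results,
    }
    if allow_large_results:
        # Large results have to be written to a destination table.
        query["destinationTable"] = {
            "projectId": project_id,
            "datasetId": dataset_id,
            "tableId": table_id,
        }
    return {"configuration": {"query": query}}


config = build_query_job_config(
    "SELECT 1",                # placeholder query text
    "my-project",              # placeholder project ID
    "my_dataset",              # placeholder dataset ID
    "closest_gps_results",     # placeholder destination table
)
print(json.dumps(config, indent=2))
```

Passing `allow_large_results=False` drops the destination table from the body, which corresponds to the "don't really need large results" case.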
Answer 1: (score: 0)
I am now using allowLargeResults = true and flattenResults = false, and converting the timestamps to numeric values in the first step:
select * from
(
select row_id AS debug_row_id,
hardware_id AS hardware_id,
debug_data AS debug_data,
connection_status AS connection_status,
date_time AS debug_date_time,
hardware_id2 AS hardware_id2,
latitude AS latitude,
longitude AS longitude,
date_time2 AS gps_date_time,
zip_code AS zip_code,
ROW_NUMBER() OVER (PARTITION BY debug_row_id ORDER BY time_diff) row_num,
from(
select *,
ABS(t_gps.date_time2-t_connection.date_time) AS time_diff
from ( select CONCAT(String(gg.hardware_id),String(gg.date_time)) as row_id_gps,
gg.hardware_id as hardware_id2,
gg.latitude as latitude,
gg.longitude as longitude,
TIMESTAMP_TO_MSEC(gg.date_time) as date_time2,
gg.zip_code as zip_code
from [test.gps32_20140805@1407249067530-] gg
) AS t_gps
INNER JOIN EACH
( select CONCAT(CONCAT(String(dd.debug_reason),String(dd.hardware_id)),String(dd.date_time)) as row_id,
dd.hardware_id as hardware_id,
TIMESTAMP_TO_MSEC(dd.date_time) as date_time,
dd.debug_data as debug_data,
case
when dd.debug_reason = 1 then 'Successful_Connection'
when dd.debug_reason = 2 then 'Dropped_Connection'
when dd.debug_reason = 3 then 'Failed_Connection'
end AS connection_status
from [test.debug_data_developer_20140805@1407249067530-] dd
where dd.debug_reason in (50013, 50017, 50018)
) as t_connection
ON t_connection.hardware_id = t_gps.hardware_id2
)
) WHERE row_num=1
It gives me:
Query Failed
Error: Resources exceeded during query execution.
Job ID: red-road-574:job_ikWQvffmPEUP6DtTvJaYpXHFJ2M
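Answer 2: (score: 0)
Here is the SQL that works, with allowLargeResults = true and flattenResults = true. I don't know what I did to make it work; maybe it was just adding a HAVING clause? In the JOIN, however, I changed one side to the whole table instead of the decorator used above, so the amount of data involved actually increased. I'm not sure whether it will keep succeeding or whether this was just temporary luck.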