BigQuery error: Connection error. Please try again

Time: 2014-10-13 16:12:54

Tags: google-bigquery

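I am trying to run a large query that joins 5 tables. The query needs to run every half hour. Most of the time it finishes in a few seconds, but sometimes it takes hours, and in the worst case it fails at the end with the error "Connection error. Please try again". I am not using "JOIN EACH", only plain "JOIN", for performance reasons, and the underlying data sets do not seem too big at the moment.

The query is: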

select 
    '2014-10-13' as date,
    i.eventId as event_id,
    e.event_name as event_name,
    datediff('2014-10-13', e.event_start_date)+1 as event_day,
    i.impressionCts as total_viewers,
    round(i.top10Cts*100/i.impressionCts) as percent_pos_top10,
    round(i.ct1020*100/i.impressionCts) as percent_pos_1020,
    round(i.ct2030*100/i.impressionCts) as percent_pos_2030,
    round(i.ct3050*100/i.impressionCts) as percent_pos_3050,
    round(i.over50Cts*100/i.impressionCts) as percent_pos_50above,

    p.purchasers as purchasers,
    round(p.purchasers*100/i.impressionCts) as conversion_rate_view_to_purchase,
    p.total_units as total_purchased_units,
    round(p.total_demand,2) as total_purchased_demand,

    ck.total_clickiers as total_clickers,
    ck.total_clicks as total_clicks,
    round(ck.total_clickiers*100/i.impressionCts) as conversion_rate_view_to_click,
    round(ck.clickers_pos_top10*100/i.top10Cts) as conversion_rate_view_to_click_pos_top10,
    round(ck.clickers_pos_1020*100/i.ct1020) as conversion_rate_view_to_click_pos_1020,
    round(ck.clickers_pos_2030*100/i.ct2030) as conversion_rate_view_to_click_pos_2030,
    round(ck.clickers_pos_3050*100/i.ct3050) as conversion_rate_view_to_click_pos_3050,
    round(ck.clickers_pos_above50Cts*100/i.over50Cts) as conversion_rate_view_to_click_pos_50above,

    ca.cartadders as total_cartadders,
    ca.qty as total_qty_addedToCart,
    round(ca.cartadders*100/ck.total_clickiers) as conversion_rate_click_to_cartadd,

    round(p.purchasers*100/ca.cartadders) as conversion_rate_cartadd_to_purchase
from
(select integer(eventId) as eventId,count(distinct customerId) as impressionCts, 
        count(distinct if(position < 10, customerId,null) ) as top10Cts,
        count(distinct if(position between 10 and 19, customerId,null)) as ct1020,  
        count(distinct if(position between 20 and 29, customerId,null)) as ct2030,
        count(distinct if(position between 30 and 49, customerId,null)) as ct3050,
        count(distinct if(position >= 50, customerId,null)) as over50Cts
from clickstream.event_impression_20141013 group each by eventId) i
join 
(select magenta_event_id,event_name,event_start_date,event_end_date from zudw.event where date(event_start_date)<='2014-10-13' and  date(event_end_date)>='2014-10-13') e
on e.magenta_event_id = i.eventId
left join 
(select 
        integer(eventId) as eventId,count(distinct customerId) as total_clickiers, count(customerId) as total_clicks,
        count(distinct if(position < 10, customerId,null) ) as clickers_pos_top10,
        count(distinct if(position between 10 and 19, customerId,null)) as clickers_pos_1020,  
        count(distinct if(position between 20 and 29, customerId,null)) as clickers_pos_2030,
        count(distinct if(position between 30 and 49, customerId,null)) as clickers_pos_3050,
        count(distinct if(position >= 50, customerId,null)) as clickers_pos_above50Cts        
 from clickstream.click_20141013 where type='event' 
 and (refererUrl is null or refererUrl='http://www.zulily.com/' or instr(refererUrl,'.zulily.com/new-today/')>0 or instr(refererUrl,'.zulily.com/newToday')>0 
      or instr(refererUrl,'.zulily.com/endsSoon')>0 or instr(refererUrl,'.zulily.com/toys')>0 or instr(refererUrl,'.zulily.com/ready')>0
      or instr(refererUrl,'.zulily.com/lastDay/')>0 or instr(refererUrl,'.zulily.com/shopByAge/')>0 or instr(refererUrl,'.zulily.com/shopByCategory/')>0 
      or instr(refererUrl,'.zulily.com/girls')>0 or instr(refererUrl,'.zulily.com/boys')>0 or instr(refererUrl,'.zulily.com/women')>0 or instr(refererUrl,'.zulily.com/men')>0 
      or instr(refererUrl,'.zulily.com/home')>0 or instr(refererUrl,'.zulily.com/shoes')>0 or instr(refererUrl,'.zulily.com/health')>0 or instr(refererUrl,'.zulily.com/baby')>0)
 group each by eventId
) ck on i.eventId=ck.eventId
left join  
(select integer(eventId) as eventId, count(distinct customerId) as cartadders, sum(qty) as qty
 from clickstream.cartadd_20141013 group each by eventId
) ca on i.eventId=ca.eventId
left join  
(select integer(eventId) as eventId,
        count(distinct customerId) as purchasers, 
        sum(qty) as total_units,
        sum(price) as total_demand
        from clickstream.purchase_20141013
        group each by eventId
) p on i.eventId = p.eventId
where date(e.event_start_date)<='2014-10-13'

1 Answer:

Answer 0 (score: 0)

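A few things:

  1. If you have an urgent issue, it is usually best to contact support. StackOverflow tends to be staffed by engineers who check in only occasionally, and by other community members who may not be able to help with urgent requests.
  2. If you provide a job ID (along with the project ID), it is much easier to look up what happened to your query.
  3. If your non-shuffled joins (that is, plain JOIN rather than JOIN EACH) are taking a long time, there are a few other things you can do to tune them.
    1. Try using JOIN EACH. I realize there are currently intermittent problems with JOIN EACH, but if your queries are taking on the order of hours, you are probably hitting resource limits that JOIN EACH can help with.
    2. Verify that you do not have a JOIN explosion. That is, make sure each key on the left side matches only one key on the right side. People often assume this is the case, but a null or dummy value can end up matching many rows.
    3. Where possible, push WHERE clauses inside the sub-selects so that filtering happens before the join.
    4. Try to make sure the right-side tables (that is, every table not in the original FROM clause) are the smaller ones.
  4. With more information about your job, I can look in the logs to see what actually happened, and work out why you got the connection error and why the earlier queries took so long.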
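
To make the JOIN explosion check in point 3.2 concrete, here is a minimal sketch. It uses SQLite through Python's standard library purely as a stand-in for BigQuery, and the rows are made-up toy data. The table and column names are borrowed from the question only for readability, and the helper names `duplicate_keys` and `joined_row_count` are hypothetical. The idea is to group the right side by its join key and look for keys that occur more than once, then compare the row count before and after the join.

```python
import sqlite3

# SQLite stands in for BigQuery here; the rows below are made-up toy data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE impressions (eventId INTEGER, impressionCts INTEGER);
    CREATE TABLE event (magenta_event_id INTEGER, event_name TEXT);
    INSERT INTO impressions VALUES (1, 100), (2, 50);
    -- Event 2 appears twice on the right side: a hidden duplicate key.
    INSERT INTO event VALUES (1, 'spring sale'), (2, 'toys'), (2, 'toys (dup)');
""")


def duplicate_keys(conn):
    """Return (key, match_count) for right-side keys that occur more than once."""
    return conn.execute("""
        SELECT magenta_event_id, COUNT(*) AS n
        FROM event
        GROUP BY magenta_event_id
        HAVING COUNT(*) > 1
        ORDER BY n DESC
    """).fetchall()


def joined_row_count(conn):
    """Count the rows produced by joining impressions to event."""
    return conn.execute("""
        SELECT COUNT(*)
        FROM impressions i
        JOIN event e ON e.magenta_event_id = i.eventId
    """).fetchone()[0]


left_rows = conn.execute("SELECT COUNT(*) FROM impressions").fetchone()[0]
dups = duplicate_keys(conn)
joined = joined_row_count(conn)

print("left rows:", left_rows)
print("joined rows:", joined)
print("duplicate keys:", dups)
```

If the joined row count is larger than the left-side row count, some left key is matching several right rows, and the duplicate-key list shows which ones. In the question's query, the `zudw.event` sub-select is the one joined side that is not aggregated by its key, so it would be a natural first place to run this kind of check.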