Google BigQuery cross join

Date: 2016-05-17 12:32:42

Tags: google-bigquery cross-join

I am using a cross join to access data from two tables, but with the cross join I get the error "d.DebugData not found in table 'bigdata:RawDebug.CarrierDetails'". Any help would be greatly appreciated!

SELECT 
HardwareId, DebugReason, DebugData, 
CASE
  WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
  ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
END
as ActualDebugData 
FROM(
 SELECT 
 HardwareId, DebugReason, DebugData
 FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP ('2016-05-15'),TIMESTAMP('2016-05-15'))
 WHERE Reason = 500  
 ) as d
 CROSS JOIN (
   SELECT Network 
   FROM [bigdata:RawDebug.CarrierDetails] 
   WHERE Mcc = substr(d.DebugData,0,3) AND Mnc =   substr(d.DebugData,4,LENGTH(d.Reason - 1)) 
   LIMIT 1 
 ) AS c

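I tried this, but I got this error: "ON clause must be AND of = comparisons of one field name from each table, with all field names prefixed with table name."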

 %%sql --module Test2
 DEFINE QUERY Test2
 SELECT 
 HardwareId, DebugReason, DebugData, 
 CASE
  WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
  ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
 END AS ActualDebugData 
 FROM (
 SELECT 
   HardwareId, DebugReason, DebugData, 
   SUBSTR(DebugData,0,3) AS d1, REGEXP_REPLACE(SUBSTR(DebugData,3,LENGTH(DebugData)-1),'%[^a-zA-Z0-9, ]%',' ') as d2
   FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP('2016-05-15'),TIMESTAMP('2016-05-15'))
   WHERE DebugReason = 500  
   ) AS d
   LEFT JOIN (
   SELECT 
    Network, Mcc, Mnc
    ,ROW_NUMBER() OVER(PARTITION BY Mcc, Mnc) AS pos 
    FROM [bigdata:RawDebug.CarrierDetails] 
    ) AS c
    ON c.Mcc = INTEGER(d.d1) AND c.Mnc = INTEGER(d.d2)  
    WHERE c.pos = 1

I am adding the table structure below:

 RawDebug:
 HardwareId   DebugReason   DebugData   
 550029358    50013            VER%     
 550029359    50013            RO%      
 550029360    50013            34020?   
 550029361    50013            34021?

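When DebugData contains letters, my CASE statement handles it. When it contains digits, I have to take the first 3 characters as a substring and match them against Mcc in CarrierDetails, then match the remaining characters against Mnc in CarrierDetails.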
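The matching rule described above can be sketched outside BigQuery to make the expected result concrete. This Python is only an illustration under assumptions: the `CARRIERS` lookup and its network names are invented, Mcc and Mnc are treated as strings, and a value counts as numeric only when it starts with at least four digits (stricter than the `'\\d+'` pattern in the queries).

```python
import re

# Hypothetical carrier rows keyed by (Mcc, Mnc); the network names are invented.
CARRIERS = {("340", "20"): "Network A", ("340", "21"): "Network B"}

def actual_debug_data(debug_data):
    """Return the network for numeric values, else the value with '?' replaced by a space."""
    match = re.match(r"(\d{3})(\d+)", debug_data)
    if match:
        mcc, mnc = match.group(1), match.group(2)
        # None when no carrier matches, like an unmatched LEFT JOIN row.
        return CARRIERS.get((mcc, mnc))
    return debug_data.replace("?", " ")

# DebugData values taken from the sample rows in the question.
rows = ["VER%", "RO%", "34020?", "34021?"]
results = [actual_debug_data(r) for r in rows]
print(results)
```

Each row is resolved independently here, which is the behaviour the query is meant to have: one ActualDebugData value per row rather than one value reused for all rows.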

The most recent query does not handle all the cases. Instead, it picks one particular number and uses that ActualDebugData value for all rows.

1 answer:

Answer 0: (score: 1)

SELECT 
  HardwareId, DebugReason, DebugData, 
  CASE
    WHEN REGEXP_MATCH(DebugData,'\\d+') THEN c.Network
    ELSE REGEXP_REPLACE(DebugData,'\\?',' ')
  END AS ActualDebugData 
FROM (
  SELECT 
    HardwareId, DebugReason, DebugData, 
    SUBSTR(DebugData,0,3) AS d1, SUBSTR(DebugData,4,LENGTH(Reason - 1)) AS d2 
  FROM TABLE_DATE_RANGE([bigdata:RawDebug.T],TIMESTAMP('2016-05-15'),TIMESTAMP('2016-05-15'))
  WHERE Reason = 500  
) AS d
LEFT JOIN (
  SELECT 
    Network, Mcc, Mnc
    //,ROW_NUMBER() OVER(PARTITION BY Mcc, Mnc) AS pos 
  FROM [bigdata:RawDebug.CarrierDetails] 
) AS c
ON c.Mcc = d.d1 AND c.Mnc = d.d2  
//WHERE c.pos = 1 

If Network is guaranteed to be unique for each entry in d, you can remove the commented-out lines. Otherwise, you should uncomment them.
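The commented-out `ROW_NUMBER() ... pos = 1` filter keeps a single carrier row per (Mcc, Mnc) pair, so the join cannot multiply rows from d. Here is a rough Python sketch of that idea, using invented sample rows. Note that "first" here means input order, whereas in the query, without an ORDER BY in the window, which row receives pos = 1 is not defined.

```python
# Hypothetical CarrierDetails rows as (Network, Mcc, Mnc); values are invented.
carrier_rows = [
    ("Network A", "340", "20"),
    ("Network A (duplicate)", "340", "20"),
    ("Network B", "340", "21"),
]

def first_per_key(rows):
    """Keep the first row seen for each (Mcc, Mnc), like filtering on pos = 1."""
    kept = {}
    for network, mcc, mnc in rows:
        # setdefault leaves an existing entry untouched, so later duplicates are ignored.
        kept.setdefault((mcc, mnc), network)
    return kept

deduped = first_per_key(carrier_rows)
print(deduped)
```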