Group by on 3 tables joined with left joins

Time: 2016-05-26 15:40:08

Tags: sql google-bigquery

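Below is my query, which retrieves fields from 3 tables combined with left joins. My requirement is to get all fields based on the most recent SystemDateTime in table Debug.T. For example, if I try HardwareId = 550803413, it returns 2 records with 2 different SystemDateTime values. I need to filter this so that I get 1 record per HardwareId, based on the most recent SystemDateTime. The data is stored in Google BigQuery.

Any help would be greatly appreciated.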

SELECT HardwareId, e.Carrier, max(d.SystemDateTime) as DateTime,
  CASE
    WHEN lower(DebugData) LIKE 'veri%' THEN 'Verizon'
    WHEN REGEXP_MATCH(lower(DebugData),'\\d+') THEN c.Network
  END AS ActualData
FROM (
  SELECT
    HardwareId, SystemDateTime, max(SystemDateTime) as max_date,
    INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(REGEXP_REPLACE(DebugData,'\\"',' '), '\\?',' ') ,0,3))) AS d1,
    INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(DebugData,'[^a-zA-Z0-9]',' '),4,LENGTH(DebugData)-3))) AS d2
  FROM TABLE_DATE_RANGE([Debug.T],TIMESTAMP('2016-05-16'),TIMESTAMP('2016-05-16'))
  GROUP BY HardwareId, DebugReason, DebugData, SystemDateTime
  HAVING DebugReason = 31
) AS d
LEFT JOIN (
  SELECT Mcc, Mnc as Mnc, Network from [Debug.Carrier]
) As c
ON c.Mcc = d.d1 and c.Mnc = d.d2
INNER JOIN (
  SELECT VehicleId, APNCarrier FROM [Info_20160516]
) As e
ON d.HardwareId = e.VehicleId
GROUP BY HardwareId, ActualData, e.Carrier
HAVING HardwareId = 550803413

Current output:

HardwareId  DebugReason  DebugData  e_APNCarrier  DateTime                    ActualDebugData
550473814   50013        23430"?    Unknown       2016-05-16 08:09:09.534597  Everyth. Ev.wh./T-Mobile
550473814   50013        23410"?    Unknown       2016-05-16 07:50:48.526288  O2 Ltd.
550473814   50013        23415"?    Unknown       2016-05-16 23:54:37.487154  Vodafone

Expected output:

Since the most recent SystemDateTime is 23:54:37.487154, the query should filter the records by the most recent SystemDateTime and return only that row.

HardwareId  DebugReason  DebugData  e_APNCarrier  DateTime                    ActualDebugData
550473814   50013        23415"?    Unknown       2016-05-16 23:54:37.487154  Vodafone

1 answer:

Answer 0: (score: 0)

So you just want the latest record for each HardwareId, based on DateTime? Try this:

SELECT * FROM (
  SELECT HardwareId, e.Carrier, d.SystemDateTime as DateTime,
    CASE
      WHEN lower(DebugData) LIKE 'veri%' THEN 'Verizon'
      WHEN REGEXP_MATCH(lower(DebugData),'\\d+') THEN c.Network
    END AS ActualData,
    ROW_NUMBER() OVER (PARTITION BY HARDWAREID ORDER BY d.SystemDateTime desc) RN
  FROM (
    SELECT
      HardwareId, SystemDateTime, max(SystemDateTime) as max_date,
      INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(REGEXP_REPLACE(DebugData,'\\"',' '), '\\?',' ') ,0,3))) AS d1,
      INTEGER(RTRIM(SUBSTR(REGEXP_REPLACE(DebugData,'[^a-zA-Z0-9]',' '),4,LENGTH(DebugData)-3))) AS d2
    FROM TABLE_DATE_RANGE([Debug.T],TIMESTAMP('2016-05-16'),TIMESTAMP('2016-05-16'))
    GROUP BY HardwareId, DebugReason, DebugData, SystemDateTime
    HAVING DebugReason = 31
  ) AS d
  LEFT JOIN (
    SELECT Mcc, Mnc as Mnc, Network from [Debug.Carrier]
  ) As c
  ON c.Mcc = d.d1 and c.Mnc = d.d2
  INNER JOIN (
    SELECT VehicleId, APNCarrier FROM [Info_20160516]
  ) As e
  ON d.HardwareId = e.VehicleId
  HAVING HardwareId = 550803413
)
WHERE RN = 1
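
The idea behind the query above is to number the rows within each HardwareId from newest to oldest and keep only row number 1. Below is a minimal sketch of that pattern in isolation, run against an in-memory SQLite database (3.25 or later, for window functions) rather than BigQuery. The table name `joined`, the function `latest_per_device`, and the simplified three-column layout are assumptions made for illustration; the sample rows are taken from the question's current output.

```python
import sqlite3

# Hypothetical, already-joined rows mirroring the question's sample output:
# (HardwareId, SystemDateTime, ActualData)
ROWS = [
    (550473814, "2016-05-16 08:09:09.534597", "Everyth. Ev.wh./T-Mobile"),
    (550473814, "2016-05-16 07:50:48.526288", "O2 Ltd."),
    (550473814, "2016-05-16 23:54:37.487154", "Vodafone"),
]

# Number rows per HardwareId, newest first, then keep only the first one.
LATEST_PER_DEVICE = """
SELECT HardwareId, SystemDateTime, ActualData
FROM (
  SELECT HardwareId, SystemDateTime, ActualData,
         ROW_NUMBER() OVER (PARTITION BY HardwareId
                            ORDER BY SystemDateTime DESC) AS rn
  FROM joined
)
WHERE rn = 1
ORDER BY HardwareId
"""


def latest_per_device(rows):
    """Return the most recent row for each HardwareId."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE joined ("
            "HardwareId INTEGER, SystemDateTime TEXT, ActualData TEXT)"
        )
        conn.executemany("INSERT INTO joined VALUES (?, ?, ?)", rows)
        return conn.execute(LATEST_PER_DEVICE).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    for row in latest_per_device(ROWS):
        print(row)
```

In this sketch the timestamps are stored as ISO-formatted text, so sorting them as strings also sorts them chronologically. Because the numbering restarts for every HardwareId, the same filter returns one row per device when the single-device restriction is dropped.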