Understanding self joins and FLATTEN

Time: 2017-08-08 17:50:44

Tags: google-bigquery

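I'll preface this by saying that I'm a novice and that I managed to hack this original query together. I have looked at many examples, but I just can't wrap my head around self joins and getting the data I want to see.

I feed BQ with mobile app data daily, so I am querying multiple tables. I am trying to get a count of fatal crashes by date and by IMEI. This query gives me most of the output I want, since it returns Date, IMEI and Count.

However, I want the output to be Date, IMEI, Branch, Truck and Count. user_dim.user_properties.key is a nested field, and in my query I specifically ask for user_dim.user_properties.key = 'imei_id' and read its value from user_dim.user_properties.value.value.string_value.

I don't understand how I would perform the join so that I also get user_dim.user_properties.key = 'truck_id' and user_dim.user_properties.key = 'branch_id', and ultimately end up with Date, IMEI, Branch, Truck and Count in one row.

Thanks for your help.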

SELECT
  event_dim.date AS Date,
  user_dim.user_properties.value.value.string_value AS IMEI,
  COUNT(*) AS Count
FROM
  FLATTEN( (
    SELECT
      *
    FROM
      TABLE_QUERY([smarttruck-6d137:com_usiinc_android_ANDROID], 'table_id CONTAINS "app_events_"')), user_dim.user_properties)
WHERE
  user_dim.user_properties.key = 'imei_id'
  AND event_dim.name = 'app_exception'
  AND event_dim.params.key = 'fatal'
  AND event_dim.params.value.int_value = 1
  AND event_dim.date = '20170807'
GROUP BY
  Date,
  IMEI
ORDER BY
  Count DESC

1 answer:

Answer 0: (score: 1)

Here is a query that should work for you, using standard SQL:

#standardSQL
SELECT
  event_dim.date AS Date,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'imei_id') AS IMEI,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'branch_id') AS branch_id,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'truck_id') AS truck_id,
  COUNT(*) AS Count
FROM `smarttruck-6d137.com_usiinc_android_ANDROID.app_events_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE event_dim.name = 'app_exception'
  AND EXISTS (
    SELECT 1
    FROM UNNEST(event_dim.params)
    WHERE key = 'fatal' AND value.int_value = 1
  )
  AND event_dim.date = '20170807'
GROUP BY Date, IMEI, branch_id, truck_id
ORDER BY Count DESC;

A couple of thoughts/suggestions, though:

  • To limit the amount of data scanned, you probably want to filter on _TABLE_SUFFIX = '20170807' rather than event_dim.date = '20170807'. This will be cheaper, and (if I understand correctly) will return the same results.
  • If the combination of IMEI, branch_id and truck_id is unique, there may be no benefit to computing the count, so you can remove the COUNT(*) along with the GROUP BY and ORDER BY clauses.
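
As a rough sketch of the first suggestion, the same query with the date filter moved to _TABLE_SUFFIX might look like the following. It assumes the daily tables are named app_events_YYYYMMDD, so that the wildcard suffix is the date string, and that every event in a given day's table carries that same event_dim.date; if that second assumption does not hold for your data, the results could differ from the original filter.

```sql
#standardSQL
SELECT
  event_dim.date AS Date,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'imei_id') AS IMEI,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'branch_id') AS branch_id,
  (SELECT value.value.string_value
   FROM UNNEST(user_dim.user_properties)
   WHERE key = 'truck_id') AS truck_id,
  COUNT(*) AS Count
FROM `smarttruck-6d137.com_usiinc_android_ANDROID.app_events_*`
CROSS JOIN UNNEST(event_dim) AS event_dim
WHERE _TABLE_SUFFIX = '20170807'  -- assumed: suffix of app_events_20170807; only this table is scanned
  AND event_dim.name = 'app_exception'
  AND EXISTS (
    SELECT 1
    FROM UNNEST(event_dim.params)
    WHERE key = 'fatal' AND value.int_value = 1
  )
GROUP BY Date, IMEI, branch_id, truck_id
ORDER BY Count DESC;
```

To cover a range of days, the suffix filter could instead be written as _TABLE_SUFFIX BETWEEN '20170801' AND '20170807', which still restricts the scan to the matching daily tables.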