BigQuery - Find data that appears in all records

Date: 2018-05-08 07:31:58

Tags: sql firebase google-bigquery

I have imported my Firebase data into BigQuery.

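What I want to do is find the specific devices that always perform a certain event (Firebase interaction records). That is, every time such a device is recorded in Firebase, event_dim.name contains at least one entry of that event type.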

For example, consider the following query, which uses the sample data from (Link):

#standardSQL
SELECT 
  user_dim.app_info.app_instance_id,
  event_dim
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`

Suppose it returns data like this:
+------------------+--------------------+
| app_instance_id  | event_dim.name     |
+------------------+--------------------+
| 1234             | os_update          |
|                  | initialized_rh_api |
+------------------+--------------------+
| 1234             | os_update          |
+------------------+--------------------+
| 5678             | os_update          |
|                  | initialized_rh_api |
+------------------+--------------------+
| 5678             | other_action       |
+------------------+--------------------+

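I want a query that returns the list of app_instance_id values for which event_dim.name contains 'os_update' in every record. By this criterion, for the rows above, 1234 would match but 5678 would not.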

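Thanks. It's probably simple, but I can't find a way to do it. I can find every record that contains the entry, but I can't exclude the devices that also have records without it.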
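The key distinction is between "some record contains the event" and "every record contains it." Below is a minimal plain-Python sketch of the second condition applied to the four sample rows above. It is not BigQuery. Each record is modeled as an (app_instance_id, list of event names) pair, and the helper name `ids_always_with_event` is hypothetical.

```python
# Sample rows from the question: one (app_instance_id, event names) pair per Firebase record.
records = [
    ("1234", ["os_update", "initialized_rh_api"]),
    ("1234", ["os_update"]),
    ("5678", ["os_update", "initialized_rh_api"]),
    ("5678", ["other_action"]),
]

def ids_always_with_event(rows, event):
    """Return the app_instance_ids whose every record contains `event` (hypothetical helper)."""
    always = {}
    for app_id, names in rows:
        # An id stays True only while each of its records includes the event.
        always[app_id] = always.get(app_id, True) and event in names
    return sorted(app_id for app_id, ok in always.items() if ok)

print(ids_always_with_event(records, "os_update"))  # ['1234']
```

Replacing the running `and` with `or` would turn this into the "at least one record" check, which also matches 5678.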

3 answers:

Answer 0 (score: 1)

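I would use aggregation:

#standardSQL
SELECT user_dim.app_info.app_instance_id
FROM `firebase-analytics-sample-data.ios_dataset.app_events_20160607`
GROUP BY user_dim.app_info.app_instance_id
-- event_dim is a repeated field: count the records that have no os_update entry in it
HAVING SUM(CASE WHEN NOT EXISTS (SELECT 1 FROM UNNEST(event_dim) WHERE name LIKE '%os_update%') THEN 1 ELSE 0 END) = 0;

The HAVING clause counts the records whose event_dim array contains no matching event. = 0 means there are none.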
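The idea of "count the records that lack the event, then keep ids where that count is 0" can be checked locally. This is a minimal sketch under stated assumptions. It uses Python's built-in `sqlite3` module instead of BigQuery. The data is flattened into a hypothetical `events(record_id, app_instance_id, name)` table. `record_id` is an invented stand-in for "one Firebase record," because SQLite has no repeated fields.

```python
import sqlite3

# Hypothetical flattened copy of the question's sample data: one row per (record, event).
ROWS = [
    (1, "1234", "os_update"),
    (1, "1234", "initialized_rh_api"),
    (2, "1234", "os_update"),
    (3, "5678", "os_update"),
    (3, "5678", "initialized_rh_api"),
    (4, "5678", "other_action"),
]

QUERY = """
SELECT app_instance_id
FROM (
  -- Step 1: one row per record, flagging whether that record has the event.
  SELECT record_id, app_instance_id,
         MAX(CASE WHEN name = ? THEN 1 ELSE 0 END) AS has_event
  FROM events
  GROUP BY record_id, app_instance_id
)
GROUP BY app_instance_id
-- Step 2: keep ids with zero records that lack the event.
HAVING SUM(CASE WHEN has_event = 0 THEN 1 ELSE 0 END) = 0
ORDER BY app_instance_id
"""

def ids_always_with_event_sql(event):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (record_id INTEGER, app_instance_id TEXT, name TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?, ?)", ROWS)
    result = [row[0] for row in conn.execute(QUERY, (event,))]
    conn.close()
    return result

print(ids_always_with_event_sql("os_update"))  # ['1234']
```

The per-record flag in step 1 matters. Counting non-matching events row by row would also reject 1234, because its first record contains initialized_rh_api.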

Answer 1 (score: 0)

I used regular expressions and a self-join in an Oracle database. See the example below.


    CREATE TABLE EVENTS (app_instance_id NUMBER, event_dim_name VARCHAR2(100));
    -- Sample records
    INSERT INTO EVENTS VALUES(1234,'os_update initialized_rh_api');
    INSERT INTO EVENTS VALUES(1234,'os_update');
    INSERT INTO EVENTS VALUES(5678,'os_update initialized_rh_api');
    INSERT INTO EVENTS VALUES(5678,'other_action');
    INSERT INTO EVENTS VALUES(7895,'os_update initialized_rh_api');
    INSERT INTO EVENTS VALUES(7895,'os_update');
    INSERT INTO EVENTS VALUES(4567,'os_update initialized_rh_api');
    INSERT INTO EVENTS VALUES(4567,'other_action');

    -- Sample query

    SELECT EV.APP_INSTANCE_ID,
      EV.EVENT_DIM_NAME
    FROM
      -- Extract a leading 'os_update' token from each row (NULL when the name does not start with it)
      (SELECT DISTINCT app_instance_id,
        regexp_substr(event_dim_name, '^os_update', 1, level) AS "event_dim_name"
      FROM EVENTS
        CONNECT BY regexp_substr(event_dim_name, '^os_update', 1, level) IS NOT NULL
      ) TEMP,
      EVENTS EV
        -- Self-join: keep the original rows whose full name equals the extracted token
        WHERE EV.APP_INSTANCE_ID = TEMP.app_instance_id
        AND EV.EVENT_DIM_NAME    = TEMP."event_dim_name";
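In a regular expression, `[os_update]` is a bracket expression. It matches any single character from the set {o, s, _, u, p, d, a, t, e}, not the literal word os_update. Oracle's REGEXP_SUBSTR and Python's `re` module both treat square brackets this way. The quick check below uses Python's `re` only for illustration, not Oracle. The helper `leading_match` is hypothetical.

```python
import re

# Inside [...], "os_update" is a SET of single characters,
# so "^[os_update]+" matches any leading run of those characters.
CHAR_CLASS = re.compile(r"^[os_update]+")
# Without brackets, the pattern matches the literal word at the start of the string.
LITERAL = re.compile(r"^os_update")

def leading_match(pattern, text):
    """Return the matched prefix, or None when the pattern does not match."""
    m = pattern.match(text)
    return m.group() if m else None

for name in ["os_update initialized_rh_api", "other_action", "update"]:
    print(f"{name!r}: bracket={leading_match(CHAR_CLASS, name)!r}, "
          f"literal={leading_match(LITERAL, name)!r}")
```

With the bracket form, 'other_action' yields 'ot' and 'update' yields 'update', even though neither starts with os_update.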

Answer 2 (score: 0)

Below is BigQuery Standard SQL. For each app_instance_id, it returns every name that appears in all of that instance's records:

#standardSQL
SELECT app_instance_id, name
FROM (
  SELECT app_instance_id, COUNT(1) cnt,
    ARRAY_CONCAT_AGG(names) names
  FROM (
    SELECT user_dim.app_info.app_instance_id, 
      ARRAY(SELECT DISTINCT name FROM UNNEST(event_dim) dim) names   
    FROM `project.dataset.your_table`
  )
  GROUP BY app_instance_id
), UNNEST(names) name
GROUP BY app_instance_id, name
HAVING COUNT(1) = ANY_VALUE(cnt) 

If you run it against the dummy data from your question, as below:

#standardSQL
WITH `project.dataset.your_table` AS (
  SELECT STRUCT<app_info STRUCT<app_instance_id STRING>>(STRUCT('1234')) user_dim, [STRUCT<name STRING>('os_update'), STRUCT('initialized_rh_api')] event_dim UNION ALL
  SELECT STRUCT(STRUCT('1234')) user_dim, [STRUCT<name STRING>('os_update')] event_dim UNION ALL
  SELECT STRUCT(STRUCT('5678')) user_dim, [STRUCT<name STRING>('os_update'), STRUCT('initialized_rh_api')] event_dim UNION ALL
  SELECT STRUCT(STRUCT('5678')) user_dim, [STRUCT<name STRING>('other_action')] event_dim 
)
SELECT app_instance_id, name
FROM (
  SELECT app_instance_id, COUNT(1) cnt,
    ARRAY_CONCAT_AGG(names) names
  FROM (
    SELECT user_dim.app_info.app_instance_id, 
      ARRAY(SELECT DISTINCT name FROM UNNEST(event_dim) dim) names   
    FROM `project.dataset.your_table`
  )
  GROUP BY app_instance_id
), UNNEST(names) name
GROUP BY app_instance_id, name
HAVING COUNT(1) = ANY_VALUE(cnt)  

you get the desired result:

Row app_instance_id name     
1   1234            os_update
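The query above has three steps. It deduplicates names within each record, counts the records per app_instance_id, and keeps the names whose occurrence count equals that record count. The same steps can be mirrored in plain Python for a quick sanity check. This is a sketch only, and `names_in_every_record` is a hypothetical helper, not part of BigQuery.

```python
from collections import Counter, defaultdict

# Sample rows from the question: one (app_instance_id, event names) pair per record.
records = [
    ("1234", ["os_update", "initialized_rh_api"]),
    ("1234", ["os_update"]),
    ("5678", ["os_update", "initialized_rh_api"]),
    ("5678", ["other_action"]),
]

def names_in_every_record(rows):
    """For each id, return the (id, name) pairs where name occurs in all of that id's records."""
    record_count = Counter()           # plays the role of cnt = COUNT(1) per app_instance_id
    name_count = defaultdict(Counter)  # occurrences after ARRAY_CONCAT_AGG + UNNEST
    for app_id, names in rows:
        record_count[app_id] += 1
        # set(...) mirrors ARRAY(SELECT DISTINCT name ...): each name counts once per record
        for name in set(names):
            name_count[app_id][name] += 1
    # HAVING COUNT(1) = ANY_VALUE(cnt): keep names seen in every record of that id
    return sorted(
        (app_id, name)
        for app_id, counts in name_count.items()
        for name, n in counts.items()
        if n == record_count[app_id]
    )

print(names_in_every_record(records))  # [('1234', 'os_update')]
```

The DISTINCT step matters. Without it, a record that logged the same event twice would be counted twice, and that name could appear to be present in every record when it is not.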