BigQuery event analytics: join in a subselect statement

Time: 2016-07-11 19:50:08

Tags: subquery left-join google-bigquery

I am trying to write a BigQuery query that returns the number of events that occur in each session. I have been referring to the following article:

http://developer.streak.com/2013/11/using-google-bigquery-for-event-tracking.html

The database schema is pretty simple: [sessionId, eventType, createdAt]. The result set would be similar to an event workflow (funnel) in Google Analytics, something like [sessionId, num_event1, num_event2, ...].

The approach is to generate one subquery per event type (with its timestamp) and then create additional subqueries that join the results of those event subqueries. I can run the step1, step2, and step3 subqueries on their own:

SELECT COUNT(timestamp1) AS number_first_events,
       COUNT(timestamp2) AS number_second_events,
       COUNT(timestamp3) AS number_third_events
-- In legacy SQL, commas between FROM subqueries act as UNION ALL,
-- so each COUNT only sees the non-NULL values from its own step.
FROM
  (SELECT sessionId AS sessionId1,
          createdAt AS timestamp1
   FROM [events_table]
   WHERE eventType = 'first-event') step1,

  (SELECT sessionId AS sessionId2,
          createdAt AS timestamp2
   FROM [events_table]
   WHERE eventType = 'second-event') step2,

  (SELECT sessionId AS sessionId3,
          createdAt AS timestamp3
   FROM [events_table]
   WHERE eventType = 'third-event') step3

Adding steps1_2 and steps1_2_3 is where I am hitting a wall: I get an error saying that the dataset name is missing from the table. Here is the full query:

SELECT COUNT(first_event_timestamp) AS num_first,
       COUNT(second_event_timestamp) AS num_second,
       COUNT(third_event_timestamp) AS num_third
FROM
  (SELECT sessionId,
          first_event_timestamp,
          second_event_timestamp,
          third_event_timestamp
   FROM steps1_2_3
   GROUP BY sessionId),

  (SELECT sessionId AS sessionId1,
          createdAt AS timestamp1
   FROM [events_table]
   WHERE eventType = 'first-event') step1,

  (SELECT sessionId AS sessionId2,
          createdAt AS timestamp2
   FROM [events_table]
   WHERE eventType = 'second-event') step2,

  (SELECT sessionId AS sessionId3,
          createdAt AS timestamp3
   FROM [events_table]
   WHERE eventType = 'third-event') step3,

  (SELECT sessionId1,
          timestamp1,
          IF(timestamp1 < timestamp2, timestamp2, NULL) AS timestamp2
   FROM
     (SELECT sessionId1,
             timestamp1,
             timestamp2
      FROM step1
      LEFT JOIN step2
      ON sessionId1 = sessionId2)) steps1_2,

  (SELECT sessionId1 AS sessionId,
          timestamp1 AS first_event_timestamp,
          timestamp2 AS second_event_timestamp,
          IF(timestamp2 < timestamp3, timestamp3, NULL) AS third_event_timestamp
   FROM
     (SELECT sessionId1,
             timestamp1,
             timestamp2,
             timestamp3
      FROM steps1_2
      LEFT JOIN step3
      ON sessionId1 = sessionId3)) steps1_2_3
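For context: in legacy SQL a comma between subqueries in FROM means UNION ALL rather than a join, and an alias such as step1 or steps1_2 is not visible as a table name inside a sibling subquery. So `FROM steps1_2_3` is most likely being resolved as an unqualified table reference, which matches the "dataset name is missing" error. Legacy SQL has no WITH clause, but BigQuery standard SQL does, and there each named subquery can read the ones defined before it. Below is a sketch of the same step chain rewritten that way. `mydataset.events_table` is a hypothetical table name, the event names 'first-event', 'second-event' and 'third-event' are assumed, and COUNT(DISTINCT ...) is used because the LEFT JOINs repeat a timestamp when a session has several events of one type (this assumes createdAt values are distinct within a session).

```sql
#standardSQL
-- Sketch only: `mydataset.events_table` and the event names are assumptions.
WITH
  step1 AS (
    SELECT sessionId AS sessionId1, createdAt AS timestamp1
    FROM `mydataset.events_table`
    WHERE eventType = 'first-event'),
  step2 AS (
    SELECT sessionId AS sessionId2, createdAt AS timestamp2
    FROM `mydataset.events_table`
    WHERE eventType = 'second-event'),
  step3 AS (
    SELECT sessionId AS sessionId3, createdAt AS timestamp3
    FROM `mydataset.events_table`
    WHERE eventType = 'third-event'),
  -- Pair every first event with every second event of the same session;
  -- keep the second timestamp only when it is later than the first.
  steps1_2 AS (
    SELECT
      sessionId1,
      timestamp1,
      IF(timestamp1 < timestamp2, timestamp2, NULL) AS timestamp2
    FROM step1
    LEFT JOIN step2 ON sessionId1 = sessionId2),
  -- Same idea for the third step; a NULL timestamp2 makes the comparison
  -- NULL, so the third timestamp is dropped as well.
  steps1_2_3 AS (
    SELECT
      sessionId1 AS sessionId,
      timestamp1 AS first_event_timestamp,
      timestamp2 AS second_event_timestamp,
      IF(timestamp2 < timestamp3, timestamp3, NULL) AS third_event_timestamp
    FROM steps1_2
    LEFT JOIN step3 ON sessionId1 = sessionId3)
-- One row per session; COUNT(DISTINCT ...) skips NULLs and join duplicates.
SELECT
  sessionId,
  COUNT(DISTINCT first_event_timestamp) AS num_first,
  COUNT(DISTINCT second_event_timestamp) AS num_second,
  COUNT(DISTINCT third_event_timestamp) AS num_third
FROM steps1_2_3
GROUP BY sessionId;
```

Because this needs standard SQL, run it with `bq query --use_legacy_sql=false` or keep the `#standardSQL` prefix on the first line. Sessions that never fire the first event do not appear at all, which is the usual funnel behaviour but differs from a plain per-session count.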

The ideal result set would look something like this:

sessionId  num_first_event  num_second_event  num_third_event
S1         1                null              null
S2         2                3                 null
S3         4                5                 6

My first question: is it possible to join the subqueries steps1_2 and steps1_2_3 like this?

Second, are there alternative approaches to building an event workflow in BigQuery, instead of counting the number of timestamps?

Any tips or suggested documentation would be greatly appreciated. Thank you for your time and consideration.

1 answer:

Answer 0 (score: 0)

How about:

SELECT
  sessionId,
  SUM(eventType = 'first-event') AS number_first_events,
  SUM(eventType = 'second-event') AS number_second_events,
  SUM(eventType = 'third-event') AS number_third_events
FROM [events_table]
GROUP BY sessionId
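The query above is legacy SQL: it uses a square-bracket table reference and sums boolean comparisons. In BigQuery standard SQL, SUM does not accept BOOL arguments, so a sketch of the same per-session count there can use COUNTIF instead. `mydataset.events_table` is a hypothetical table name, and the event names are assumed to match the ones in the answer.

```sql
#standardSQL
-- Sketch: `mydataset.events_table` is a hypothetical table name.
-- COUNTIF counts the rows in each session for which the condition is TRUE.
SELECT
  sessionId,
  COUNTIF(eventType = 'first-event') AS number_first_events,
  COUNTIF(eventType = 'second-event') AS number_second_events,
  COUNTIF(eventType = 'third-event') AS number_third_events
FROM `mydataset.events_table`
GROUP BY sessionId;
```

This counts every event in a session regardless of order, so unlike the timestamp-chained subqueries in the question it does not require the second event to follow the first. A session with no events of a given type gets 0 rather than NULL.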