I am trying to write a BigQuery query that returns the number of events of each type per session. I have been referring to the following article:
http://developer.streak.com/2013/11/using-google-bigquery-for-event-tracking.html
The database schema is pretty simple: [sessionId, eventType, createdAt]. The result set would be similar to an event flow in Google Analytics, something like [sessionId, num_event1, num_event2, ...].
The approach is to generate one subquery per event type (with its timestamp) and then create additional subqueries that join the results of those event subqueries. I am able to run the step1, step2, and step3 subqueries in isolation:
SELECT COUNT(first_event_timestamp) AS number_first_events,
COUNT(second_event_timestamp) AS number_second_events,
COUNT(third_event_timestamp) AS number_third_events
FROM
(SELECT eventUid AS eventUid1,
createdAt AS timestamp1
FROM [events_table]
WHERE eventType = 'first-event') step1,
(SELECT eventUid AS eventUid2,
createdAt AS timestamp2
FROM [events_table]
WHERE eventType = 'second-event') step2,
(SELECT
eventUid as sessionId3,
createdAt as timestamp3
FROM
[events_table]
WHERE
eventType = "third_event") step3
Adding steps1_2 and steps1_2_3 is where I am hitting a wall: I get an error that the dataset name is missing from the table. Here is the full query:
SELECT COUNT(first_event_timestamp) AS num_first,
COUNT(second_event_timestamp) AS num_second,
COUNT(third_event_timestamp) AS num_third
FROM (SELECT
sessionId
first_event_timestamp,
second_event_timestamp,
third_event_timestamp
FROM steps1_2_3
GROUP BY sessionId),
(SELECT
sessionId AS sessionId1,
createdAt AS timestamp1
FROM
[events_table]
WHERE
eventType = "first_event") step1, (SELECT
eventUid AS sessionId2,
createdAt AS timestamp2
FROM
[events_table]
WHERE
eventType = "second_event") step2, (SELECT
eventUid AS sessionId3,
createdAt AS timestamp3
FROM
[events_table]
WHERE
eventType = "third_Event") step3, (SELECT sessionId1,
timestamp1,
IF(timestamp1 < timestamp2, timestamp2, NULL) AS timestamp2
FROM
(SELECT sessionId1,
timestamp1,
timestamp2
FROM step1
LEFT JOIN step2
ON sessionId1 = sessionId2) ) steps1_2, (SELECT sessionId1 as sessionId,
timestamp1 as first_event_timestamp,
timestamp2 as second_event_timestamp,
IF(timestamp2 < timestamp3, timestamp3, NULL) as third_event_timestamp
FROM
(SELECT sessionId2,
timestamp2,
timestamp3
FROM steps1_2
LEFT JOIN step3
ON sessionId1 = sessionId3)
) steps1_2_3
The ideal result set would look something like the following:
sessionId  num_first_event  num_second_event  num_third_event
S1         1                null              null
S2         2                3                 null
S3         4                5                 6
My first question is whether it is possible to join against the subqueries steps1_2 and steps1_2_3.
Are there alternate approaches to building an event-style workflow in BigQuery, instead of counting the number of timestamps?
Any tips or suggested documentation are greatly appreciated. Thank you for your time and consideration.
Answer 0 (score: 0)
How about:
SELECT
sessionId,
SUM(eventType = 'first-event') AS number_first_events,
SUM(eventType = 'second-event') AS number_second_events,
SUM(eventType = 'third-event') AS number_third_events
FROM [events_table]
GROUP BY sessionId
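As a rough illustration of the conditional aggregation above, here is a minimal Python sketch that runs the same shape of query against an in-memory SQLite database. SQLite is only a stand-in for BigQuery here, the sample rows are made up for the example, and CASE WHEN is used as a portable spelling of the per-type count; the table and column names follow the question.

```python
import sqlite3

# In-memory SQLite database standing in for the BigQuery table (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events_table (sessionId TEXT, eventType TEXT, createdAt INTEGER)"
)

# Hypothetical sample events: (sessionId, eventType, createdAt).
rows = [
    ("S1", "first-event", 1),
    ("S2", "first-event", 1),
    ("S2", "first-event", 2),
    ("S2", "second-event", 3),
    ("S3", "first-event", 1),
    ("S3", "second-event", 2),
    ("S3", "third-event", 3),
    ("S3", "third-event", 4),
]
conn.executemany("INSERT INTO events_table VALUES (?, ?, ?)", rows)

# One pass over the table: each SUM counts the rows of one event type
# within the session, so no joins between per-type subqueries are needed.
query = """
SELECT
  sessionId,
  SUM(CASE WHEN eventType = 'first-event'  THEN 1 ELSE 0 END) AS number_first_events,
  SUM(CASE WHEN eventType = 'second-event' THEN 1 ELSE 0 END) AS number_second_events,
  SUM(CASE WHEN eventType = 'third-event'  THEN 1 ELSE 0 END) AS number_third_events
FROM events_table
GROUP BY sessionId
ORDER BY sessionId
"""
result = conn.execute(query).fetchall()

for row in result:
    print(row)
```

With this form, a session that has no events of a given type shows 0 in that column rather than null. Note that it counts events per type and does not check the order in which they occurred within the session.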