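We wrote a query that calculates how long it takes for an email to be routed from Gmail to a third-party security service and then back to Gmail. We now want to chart it in DataStudio, but as written it needs the date specified in two places to limit the number of partitions scanned, and most tools (DataStudio included) can only drive a date range from a single field. How else could I write this so that a single field controls the date range?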
SELECT
  datetime_diff(timestamp_after, timestamp_before, SECOND) as delay,
  timestamp_before,
  timestamp_after,
  sender_before as sender,
  recipient_before as recipient,
  message_id_before as message_id,
  subject_before as subject,
  spf_pass_before,
  spf_pass_after,
  pt_before,
  pt_after
FROM (
  SELECT
    _TABLE_SUFFIX as pt_after,
    DATETIME(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp_after,
    message_info.rfc2822_message_id as message_id_after,
    message_info.connection_info.spf_pass as spf_pass_after,
    message_info.source.address as sender_after,
    message_info.subject as subject_after,
    dest.address as recipient_after,
    rule.rule_name as rule_name_after
  FROM
    `g-suite-logs.gmail_logs.daily_*`,
    UNNEST(message_info.destination) as dest,
    UNNEST(message_info.triggered_rule_info) as rule
  WHERE rule.rule_name = "AFTER RETURNING FROM THIRD PARTY SYSTEM"
  GROUP BY
    pt_after,
    message_id_after,
    timestamp_after,
    spf_pass_after,
    sender_after,
    recipient_after,
    rule_name_after,
    subject_after
) rule_after
JOIN (
  SELECT
    _TABLE_SUFFIX as pt_before,
    DATETIME(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp_before,
    message_info.rfc2822_message_id as message_id_before,
    message_info.connection_info.spf_pass as spf_pass_before,
    message_info.source.address as sender_before,
    message_info.subject as subject_before,
    dest.address as recipient_before,
    rule.rule_name as rule_name_before
  FROM
    `g-suite-logs.gmail_logs.daily_*`,
    UNNEST(message_info.destination) as dest,
    UNNEST(message_info.triggered_rule_info) as rule
  WHERE rule.rule_name = "BEFORE ROUTING TO THIRD PARTY SYSTEM"
  GROUP BY
    pt_before,
    message_id_before,
    timestamp_before,
    spf_pass_before,
    sender_before,
    recipient_before,
    rule_name_before,
    subject_before
) rule_before
ON
  rule_before.message_id_before = rule_after.message_id_after
  AND recipient_before = recipient_after
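I can save this as a view and filter it with 'WHERE pt_before = "20190618" AND pt_after = "20190618"', which cuts the query cost significantly (from 1.5 TB to 24 GB), but I can't easily plug the view into DataStudio because two date fields have to be filtered.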
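For illustration, the filtered query against such a view might look like the sketch below. The view name `routing_delay_view` is a placeholder and not the real name; only the two `pt_*` filters come from the description above.

```sql
-- `routing_delay_view` is a hypothetical name for the saved view.
-- Both suffix columns are filtered so that each side of the join
-- only reads the daily table for 2019-06-18.
SELECT *
FROM `g-suite-logs.gmail_logs.routing_delay_view`
WHERE pt_before = "20190618"
  AND pt_after = "20190618"
```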
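One option would be a parameterized query, but I don't think DataStudio supports those.
On a separate note, this is how I originally wrote the query. It seemed more efficient, but I got many false positives when a message matched one rule but not the other: those rows were recorded with a timediff of 0, which skewed the results. So if anyone has suggestions for a more efficient way to write this, I'm open to them.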
Answer 0 (score: 0)
Perhaps you could use something like this:
with
  -- Scan the daily tables once and keep only the two routing rules.
  gmail_logs as (
    select distinct
      _table_suffix as pt,
      datetime(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp,
      message_info.rfc2822_message_id as message_id,
      message_info.connection_info.spf_pass,
      message_info.source.address as sender,
      message_info.subject,
      dest.address as recipient,
      rule.rule_name
    from
      `g-suite-logs.gmail_logs.daily_*` as gl
      cross join unnest(gl.message_info.destination) as dest
      cross join unnest(gl.message_info.triggered_rule_info) as rule
    where
      rule.rule_name in ('AFTER RETURNING FROM THIRD PARTY SYSTEM',
                         'BEFORE ROUTING TO THIRD PARTY SYSTEM')
      and _table_suffix = '20190618'
  )
-- Pivot the before/after rows into one row per message and recipient.
select
  message_id,
  recipient,
  datetime_diff(
    max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', timestamp, null)),
    max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', timestamp, null)),
    second) as delay,
  max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', timestamp, null)) as timestamp_before,
  max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', timestamp, null)) as timestamp_after,
  max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', sender, null)) as sender,
  max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', subject, null)) as subject,
  max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', spf_pass, null)) as spf_pass_before,
  max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', spf_pass, null)) as spf_pass_after,
  max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', pt, null)) as pt_before,
  max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', pt, null)) as pt_after
from
  gmail_logs
group by
  1, 2
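To get a single date field for DataStudio, the same single-scan idea could be saved as a view that drops the hard-coded suffix and groups by `pt` as well. The following is only a sketch under several assumptions: the view name `routing_delay_by_day` is hypothetical, the column alias `event_time` is introduced here for readability, and it assumes the "before" and "after" events of a message land in the same daily table (a message routed across midnight would be dropped). The `having` clause keeps only messages that matched both rules, which may help with the one-sided matches mentioned in the question.

```sql
-- Sketch only: `routing_delay_by_day` is a hypothetical view name.
-- It does not start with `daily_`, so the wildcard below will not match it.
create or replace view `g-suite-logs.gmail_logs.routing_delay_by_day` as
with
  gmail_logs as (
    select distinct
      _table_suffix as pt,
      datetime(timestamp_micros(event_info.timestamp_usec), "America/New_York") as event_time,
      message_info.rfc2822_message_id as message_id,
      dest.address as recipient,
      rule.rule_name
    from
      `g-suite-logs.gmail_logs.daily_*` as gl
      cross join unnest(gl.message_info.destination) as dest
      cross join unnest(gl.message_info.triggered_rule_info) as rule
    where
      rule.rule_name in ('AFTER RETURNING FROM THIRD PARTY SYSTEM',
                         'BEFORE ROUTING TO THIRD PARTY SYSTEM')
  )
select
  pt,  -- the single field to filter on; assumes both events share one daily table
  message_id,
  recipient,
  max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', event_time, null)) as timestamp_before,
  max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', event_time, null)) as timestamp_after,
  datetime_diff(
    max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', event_time, null)),
    max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', event_time, null)),
    second) as delay
from
  gmail_logs
group by
  pt, message_id, recipient
having
  -- keep only messages that hit both rules within the same daily table
  countif(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM') > 0
  and countif(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM') > 0
```

A query such as `select * from ... where pt = '20190618'` would then need only one filter. Whether that filter actually limits the daily tables scanned is worth confirming by comparing the estimated bytes processed with and without it, since this sketch has not been run against the real dataset.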