How can I use a single date field, or write this query more efficiently?

Time: 2019-06-18 21:05:40

Tags: sql google-bigquery google-data-studio

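We wrote a query that calculates how long it takes for an email to be routed from Gmail to a third-party security service and then back to Gmail. Now we want to graph it in DataStudio, but the way the query is written requires the date to be specified in two places in order to limit the number of partitions scanned, whereas most systems (DataStudio included) can only apply a time range to a single field. How else could I write this so that a single field drives the time range?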

SELECT
datetime_diff(timestamp_after, timestamp_before, SECOND) as delay,
timestamp_before,
timestamp_after,
sender_before as sender,
recipient_before as recipient,
message_id_before as message_id,
subject_before as subject,
spf_pass_before,
spf_pass_after,
pt_before,
pt_after
FROM(
  SELECT
    _TABLE_SUFFIX as pt_after,
    DATETIME(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp_after,
    message_info.rfc2822_message_id as message_id_after,
    message_info.connection_info.spf_pass as spf_pass_after,
    message_info.source.address as sender_after,
    message_info.subject as subject_after,
    dest.address as recipient_after,
    rule.rule_name as rule_name_after
  FROM
    `g-suite-logs.gmail_logs.daily_*`,
    UNNEST ( message_info.destination ) as dest,
    UNNEST ( message_info.triggered_rule_info ) as rule
  WHERE rule.rule_name = "AFTER RETURNING FROM THIRD PARTY SYSTEM"
  GROUP BY
    pt_after,
    message_id_after,
    timestamp_after,
    spf_pass_after,
    sender_after,
    recipient_after,
    rule_name_after,
    subject_after
  ) rule_after
JOIN(
  SELECT
    _TABLE_SUFFIX as pt_before,
    DATETIME(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp_before,
    message_info.rfc2822_message_id as message_id_before,
    message_info.connection_info.spf_pass as spf_pass_before,
    message_info.source.address as sender_before,
    message_info.subject as subject_before,
    dest.address as recipient_before,
    rule.rule_name as rule_name_before
  FROM
    `g-suite-logs.gmail_logs.daily_*`,
    UNNEST ( message_info.destination ) as dest,
    UNNEST ( message_info.triggered_rule_info ) as rule
  WHERE rule.rule_name = "BEFORE ROUTING TO THIRD PARTY SYSTEM"
  GROUP BY
    pt_before, 
    message_id_before,
    timestamp_before,
    spf_pass_before,
    sender_before,
    recipient_before,
    rule_name_before,
    subject_before
  ) rule_before
ON
  rule_before.message_id_before = rule_after.message_id_after AND recipient_before = recipient_after

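I can save this as a view and apply `WHERE pt_before = "20190618" AND pt_after = "20190618"`, which cuts the query cost significantly (from 1.5 TB down to 24 GB), but I can't easily plug the view into DataStudio because two date fields have to be filtered.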

One option would be a parameterized query, but I don't think DataStudio supports those.
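For reference, here is a rough sketch of what the parameterized form could look like. It assumes the Data Studio BigQuery connector is used with a custom query and that its date parameters (`@DS_START_DATE` and `@DS_END_DATE`, passed as `YYYYMMDD` strings) are enabled on the data source; whether that is available depends on the connector version, so treat it as an assumption to verify. The column selection is trimmed to the minimum for illustration.

```sql
-- Sketch only, not a tested report query.
-- Assumption: @DS_START_DATE / @DS_END_DATE are supplied by the reporting tool
-- as 'YYYYMMDD' strings, which matches the suffix format of the daily_* tables.
select
    _table_suffix as pt,
    message_info.rfc2822_message_id as message_id,
    rule.rule_name
from
    `g-suite-logs.gmail_logs.daily_*` as gl
    cross join unnest(gl.message_info.triggered_rule_info) as rule
where
    -- one date range, applied once, limits which daily tables are scanned
    _table_suffix between @DS_START_DATE and @DS_END_DATE
    and rule.rule_name in ('AFTER RETURNING FROM THIRD PARTY SYSTEM',
                           'BEFORE ROUTING TO THIRD PARTY SYSTEM')
```

Because the range is applied directly to `_table_suffix`, the report's date picker would control the scan without needing a second date field.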

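On a separate note, this is how I originally wrote the query. It seemed more efficient, but I found it produced a lot of false positives when a message matched one rule and not the other: those rows were recorded with a timediff of 0, which skewed the results. So if anyone has suggestions for a more efficient way to write this, I'm open to them.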

1 answer:

Answer 0: (score: 0)

Maybe you could use something like this:

with

gmail_logs as (
    select distinct
        _table_suffix as pt,
        datetime(timestamp_micros(event_info.timestamp_usec), "America/New_York") as timestamp,
        message_info.rfc2822_message_id as message_id,
        message_info.connection_info.spf_pass,
        message_info.source.address as sender,
        message_info.subject,
        dest.address as recipient,
        rule.rule_name
    from
        `g-suite-logs.gmail_logs.daily_*` as gl
        cross join unnest(gl.message_info.destination) as dest
        cross join unnest(gl.message_info.triggered_rule_info) as rule
    where
        rule.rule_name in ( 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
                            'BEFORE ROUTING TO THIRD PARTY SYSTEM')
        and _table_suffix = '20190618'
)

select
    message_id,
    recipient,

    datetime_diff(
        max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
                timestamp, null)),
        max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
                timestamp, null)),
        second) as delay,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            timestamp, null)) as timestamp_before,
    max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            timestamp, null)) as timestamp_after,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            sender, null)) as sender,
    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            subject, null)) as subject,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            spf_pass, null)) as spf_pass_before,
    max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            spf_pass, null)) as spf_pass_after,

    max(if( rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM',
            pt, null)) as pt_before,
    max(if( rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM',
            pt, null)) as pt_after
from
    gmail_logs
group by
    1, 2
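
The query above still hard-codes the date. A possible variant, sketched below, drops that filter from the CTE and groups by the table suffix instead, so the result exposes a single `pt` column that a view consumer such as DataStudio could filter on. Two things here are assumptions rather than anything confirmed in the question: that both rule events for a message land in the same daily table (a message crossing midnight would be dropped), and that a filter on `pt` applied to the saved view prunes tables the same way the `pt_before` / `pt_after` filters did, which is worth checking against the bytes billed. The `having` clause is an addition meant to discard messages that matched only one of the two rules, and the `ts` alias is just a rename of `timestamp` for readability.

```sql
-- Sketch only: single scan, one date column (pt), unmatched messages removed.
with gmail_logs as (
    select distinct
        _table_suffix as pt,
        datetime(timestamp_micros(event_info.timestamp_usec), "America/New_York") as ts,
        message_info.rfc2822_message_id as message_id,
        dest.address as recipient,
        rule.rule_name
    from
        `g-suite-logs.gmail_logs.daily_*` as gl
        cross join unnest(gl.message_info.destination) as dest
        cross join unnest(gl.message_info.triggered_rule_info) as rule
    where
        rule.rule_name in ('AFTER RETURNING FROM THIRD PARTY SYSTEM',
                           'BEFORE ROUTING TO THIRD PARTY SYSTEM')
)

select
    pt,
    message_id,
    recipient,
    -- seconds between the "before" event and the "after" event
    datetime_diff(
        max(if(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM', ts, null)),
        max(if(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM', ts, null)),
        second) as delay
from
    gmail_logs
group by
    pt, message_id, recipient
having
    -- keep only messages that triggered both rules on the same day
    countif(rule_name = 'BEFORE ROUTING TO THIRD PARTY SYSTEM') > 0
    and countif(rule_name = 'AFTER RETURNING FROM THIRD PARTY SYSTEM') > 0
```

Saved as a view, this could then be queried with a single condition such as `where pt = '20190618'`; the other columns from the answer (sender, subject, spf_pass) can be added back with the same `max(if(...))` pattern.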