使用正则表达式排除值-BigQuery

时间:2018-07-03 18:43:04

标签: sql google-bigquery standard-sql

如果我想从聚合中排除字段中的正则表达式匹配项,我将在标准SQL中使用什么。代码运行;但是,它计算正则表达式中的值,而我希望它排除在外。我有以下代码:

SELECT
    channelGrouping, 
    date,
    SUM(totals.timeOnSite) AS Session_Duration,
    SUM(totals.visits) AS Visits,
    AVG(totals.timeonSite/totals.visits) AS Avg_Time_per_Session,
    SUM(totals.bounces) AS Bounce,
    (SUM(totals.bounces)/SUM(totals.visits)) AS Bounce_rate
FROM
    `93868086.ga_sessions_*`
WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY))
AND
    FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
GROUP BY 
    date,
    channelGrouping,
    geoNetwork.networkLocation 
HAVING
    REGEXP_CONTAINS(geoNetwork.networkLocation,
    r"^(ovh \(nwk\)|hostwinds llc.|bhost inc|prisma networks llc|psychz networks|buyvm services|private customer|secure dragon llc.|vmpanel|netaction telecom srl-d|hostigation|frontlayer technologies inc.|digital energy technologies limited|owned-networks|rica web services|netaction telecom srl-d|hurricane electric inc.|private customer - host.howpick.com|ssdvirt|sway broadband|detect network|gorillaservers inc.|micfo llc.| netaction telecom srl|egihosting|zenlayer inc|intercom online inc.|gs1 argentine|ovh hosting inc.|vps cheap inc.|limeip networks|blackhost ltd.|amazon.com inc.)$")
ORDER BY
    date ASC

1 个答案:

答案 0 :(得分:2)

根据您的实际需求(尚不能100%明确地解决问题),您应该将NOT添加到您的HAVING语句中

HAVING NOT REGEXP_CONTAINS ...   

或将排除逻辑移动到WHERE子句

SELECT
    channelGrouping, 
    date,
    SUM(totals.timeOnSite) AS Session_Duration,
    SUM(totals.visits) AS Visits,
    AVG(totals.timeonSite/totals.visits) AS Avg_Time_per_Session,
    SUM(totals.bounces) AS Bounce,
    (SUM(totals.bounces)/SUM(totals.visits)) AS Bounce_rate
FROM
    `93868086.ga_sessions_*`
WHERE
    _TABLE_SUFFIX BETWEEN FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 365 DAY))
AND
    FORMAT_DATE('%Y%m%d',DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))
AND NOT
    REGEXP_CONTAINS(geoNetwork.networkLocation,
    r"^(ovh \(nwk\)|hostwinds llc.|bhost inc|prisma networks llc|psychz networks|buyvm services|private customer|secure dragon llc.|vmpanel|netaction telecom srl-d|hostigation|frontlayer technologies inc.|digital energy technologies limited|owned-networks|rica web services|netaction telecom srl-d|hurricane electric inc.|private customer - host.howpick.com|ssdvirt|sway broadband|detect network|gorillaservers inc.|micfo llc.| netaction telecom srl|egihosting|zenlayer inc|intercom online inc.|gs1 argentine|ovh hosting inc.|vps cheap inc.|limeip networks|blackhost ltd.|amazon.com inc.)$")
GROUP BY 
    date,
    channelGrouping,
    geoNetwork.networkLocation 
ORDER BY
    date ASC

还要注意:您在最终的SELECT语句中缺少networkLocation。不是致命的,但将其包含在其中更有意义