BigQuery: how do I write a query regular expression with the special character '|' in standard SQL?

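Asked: 2018-09-11 15:16:46

Tags: google-bigquery

Using BigQuery, I would like to write this query with a regular expression that uses the special character '|', or an equivalent in standard SQL. The idea is to replace the multiple conditions on the field hit.eventInfo.eventCategory ("login", "Unknown", "registration", "login", "start", "null") with a single regular expression.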

#standardSQL
SELECT
visitNumber,
visitStartTime,
date,
totals.visits,
totals.hits,
totals.pageviews,
totals.timeOnSite,
hit.hitNumber,
hit.page.pagePath,
hit.page.hostname,
hit.page.pageTitle,
hit.eventInfo.eventCategory,
hit.eventInfo.eventAction,
hit.eventInfo.eventLabel,
cd.index,
cd.value
FROM
[bqdatasetnumber.ga_sessions_*],
WHERE
_TABLE_SUFFIX BETWEEN '20180905'
AND '20180911'
AND customDimensions.value != "null"
AND hit.eventInfo.eventCategory != "login"
AND hit.eventInfo.eventCategory != "null"
AND hit.eventInfo.eventCategory != "Unknown"
AND hit.eventInfo.eventCategory != "registration"
AND hit.eventInfo.eventAction != "start" 

Thanks for your help and tips

Sebastian

2 Answers:

Answer 0: (score: 1)

You can try:

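AND REGEXP_CONTAINS(hit.eventInfo.eventCategory, r"^(Unknown|registration|login|start)$") != true

See the BigQuery documentation for the REGEXP_CONTAINS function.

  • The "or" operator is "|".

  • ^ and $ anchor the match to the start and end of the value, so the value must match the expression exactly. The alternatives are wrapped in parentheses so that the anchors apply to all of them, not only to the first and last one.

  • NULL values are also filtered out, because REGEXP_CONTAINS returns NULL for a NULL input and NULL != true does not evaluate to true.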
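The effect of grouping and of NULL inputs can be checked in isolation. The following is a minimal sketch that runs without any table; the category values are made up for illustration and do not come from the asker's dataset.

```sql
#standardSQL
-- The category values below are invented for illustration only.
SELECT
  category,
  -- Without a group, ^ applies only to "Unknown" and $ only to "start".
  REGEXP_CONTAINS(category, r"^Unknown|registration|login|start$") AS ungrouped,
  -- With a group, ^ and $ apply to every alternative.
  REGEXP_CONTAINS(category, r"^(Unknown|registration|login|start)$") AS grouped
FROM UNNEST(["login", "login_failed", "purchase", CAST(NULL AS STRING)]) AS category
```

For "login_failed" the ungrouped pattern returns true (the value contains "login") while the grouped pattern returns false. For the NULL row both columns are NULL, which is why such rows drop out of a WHERE clause.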

Answer 1: (score: 1)

You can use either

AND NOT LOWER(hit.eventInfo.eventCategory) in ("login", "unknown", "registration", "login", "start", "null")

or

AND NOT REGEXP_CONTAINS(hit.eventInfo.eventCategory, r"(?i)^(login|unknown|registration|login|start|null)$")
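As a rough check that the list-based and the regex-based conditions behave the same way, here is a small sketch over inline values. The values and the column aliases keep_by_list and keep_by_regex are hypothetical and chosen only for this example.

```sql
#standardSQL
-- Inline test values; they are assumptions, not data from the question.
SELECT
  category,
  -- Case-insensitive comparison by lowercasing before the IN test.
  NOT LOWER(category) IN ("login", "unknown", "registration", "start", "null") AS keep_by_list,
  -- Case-insensitive comparison through the (?i) flag.
  NOT REGEXP_CONTAINS(category, r"(?i)^(login|unknown|registration|start|null)$") AS keep_by_regex
FROM UNNEST(["Login", "UNKNOWN", "checkout", "null", CAST(NULL AS STRING)]) AS category
```

Only "checkout" yields true in both columns. The string "null" is rejected by both conditions, whereas a real SQL NULL yields NULL in both columns, so that row is also dropped when either expression is used in a WHERE clause.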

Your query appears to have been truncated, so simply adding one of the conditions above will not work. I guess you need something like this:
#standardSQL
SELECT
  visitNumber,
  visitStartTime,
  DATE,
  totals.visits,
  totals.hits,
  totals.pageviews,
  totals.timeOnSite,
  hit.hitNumber,
  hit.page.pagePath,
  hit.page.hostname,
  hit.page.pageTitle,
  hit.eventInfo.eventCategory,
  hit.eventInfo.eventAction,
  hit.eventInfo.eventLabel,
  cd.index,
  cd.value
FROM `bqdatasetnumber.ga_sessions_*` a,
UNNEST(hits) hit,
UNNEST(a.customDimensions) cd 
WHERE _TABLE_SUFFIX BETWEEN '20130905' AND '20130911'
AND cd.value != "null"
AND NOT REGEXP_CONTAINS(hit.eventInfo.eventCategory, r"(?i)^(login|unknown|registration|login|start|null)$")
AND hit.eventInfo.eventAction != "start"    

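I am not sure where cd comes from, so I guessed that it refers to the customDimensions at the root of the row rather than the one inside hits. But it could just as well be the one in hits :o)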
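If cd is meant to be the hit-level custom dimensions instead, only the second UNNEST changes. The following is a shortened sketch under that assumption. It reuses the table name and date range from the query above and selects only a few columns for brevity.

```sql
#standardSQL
-- Variant assuming cd should come from the hit-level customDimensions.
SELECT
  hit.hitNumber,
  hit.eventInfo.eventCategory,
  cd.index,
  cd.value
FROM `bqdatasetnumber.ga_sessions_*` a,
UNNEST(a.hits) hit,
UNNEST(hit.customDimensions) cd  -- hit scope instead of a.customDimensions
WHERE _TABLE_SUFFIX BETWEEN '20130905' AND '20130911'
AND cd.value != "null"
AND NOT REGEXP_CONTAINS(hit.eventInfo.eventCategory, r"(?i)^(login|unknown|registration|login|start|null)$")
```

Because the comma join acts as a cross join with the unnested array, hits without any hit-level custom dimensions produce no rows in this variant.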