BigQuery: SPLIT() returns only one value

Time: 2014-11-21 11:32:42

Tags: google-bigquery

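The components of my page URL column are separated by /. I'm trying to run the SPLIT() function in BigQuery, but it only returns the first value. I want all of the values, each in a specific column.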

I don't understand how to use the Regexp_extract() example mentioned in Split string into multiple columns with bigquery.

I need something like REGEX_SPLIT_TO_TABLE(<String>, <DELIMITER>) that turns a single string into multiple columns.

Query:

SELECT PK, 
DATE(TIMESTAMP(CONCAT(SUBSTR(date,1,4),'-',SUBSTR(date,5,2),'-',SUBSTR(date,7,2),' 00:00:00'))) as visit_date,
hits_page_pagePath,
split(hits_page_pagePath,'/')
FROM [Intent.All2mon] limit 100

5 answers:

Answer 0 (score: 22)

2018 standardSQL update:

#standardSQL
SELECT SPLIT(path, '/')[OFFSET(0)] part1,
       SPLIT(path, '/')[OFFSET(1)] part2,
       SPLIT(path, '/')[OFFSET(2)] part3
FROM (SELECT "/a/b/aaaa?c" path)
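The following Python sketch is not BigQuery code; it only emulates, for illustration, what the standard SQL query above computes for the sample path. It assumes standard SQL SPLIT keeps the empty string that precedes the leading slash (as Answer 1 also points out), which is how Python's str.split behaves; `bq_split` is a hypothetical helper name.

```python
from typing import List


def bq_split(value: str, delimiter: str) -> List[str]:
    """Hypothetical emulation of standard SQL SPLIT(value, delimiter)."""
    # str.split keeps empty pieces, so a leading delimiter yields an empty
    # first element. Assumes a non-empty delimiter (Python rejects '').
    return value.split(delimiter)


path = "/a/b/aaaa?c"
parts = bq_split(path, "/")

# [OFFSET(n)] is zero-based, so it maps directly onto Python indexing.
part1, part2, part3 = parts[0], parts[1], parts[2]
print(repr(part1), repr(part2), repr(part3))  # '' 'a' 'b'
```

So with standard SQL, part1 comes back as the empty string, and the first real path segment is at OFFSET(1).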

Now I understand that you want them in separate columns.

An alternative to the query you provided:

SELECT FIRST(SPLIT(path, '/')) part1,
       NTH(2, SPLIT(path, '/')) part2,
       NTH(3, SPLIT(path, '/')) part3
FROM (SELECT "/a/b/aaaa?c" path)

NTH(X, SPLIT(s)) returns the Xth value of SPLIT(s), and FIRST(s) is the same as NTH(1, s).
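A rough Python model of the legacy SQL query follows, for illustration only. It assumes, based on the output shown in Answer 2, that legacy SPLIT drops the empty piece before the leading slash, so FIRST returns "a" rather than an empty string. `legacy_split`, `nth`, and `first` are hypothetical helpers, and returning None for an out-of-range position is a choice made in this sketch, not documented BigQuery behavior.

```python
from typing import List, Optional


def legacy_split(value: str, delimiter: str) -> List[str]:
    # Assumption: legacy SPLIT omits empty pieces (Answer 2's output shows
    # no empty row for the leading slash).
    return [piece for piece in value.split(delimiter) if piece != ""]


def nth(n: int, values: List[str]) -> Optional[str]:
    # NTH is 1-based; this sketch returns None when n is out of range.
    return values[n - 1] if 1 <= n <= len(values) else None


def first(values: List[str]) -> Optional[str]:
    # FIRST(s) is equivalent to NTH(1, s).
    return nth(1, values)


parts = legacy_split("/a/b/aaaa?c", "/")
print(first(parts), nth(2, parts), nth(3, parts))  # a b aaaa?c
```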

Answer 1 (score: 1)

In standard SQL, you can access array elements with these subscript operators:

array[OFFSET(zero_based_offset)]
array[ORDINAL(one_based_ordinal)]

So:

SELECT SPLIT(path, '/')[OFFSET(1)] part2,
       SPLIT(path, '/')[ORDINAL(2)] part2_again,
       SPLIT(path, '/')[ORDINAL(3)] part3
FROM (SELECT "/a/b/aaaa?c" path)

part2   part2_again part3    
a       a           b
In this case, part1 would be the empty string (the part before the first slash).
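A Python sketch of how the two subscript forms relate, using hypothetical helpers `offset` and `ordinal`. As with BigQuery's OFFSET and ORDINAL, an out-of-range index is treated as an error here; SAFE_OFFSET and SAFE_ORDINAL are the variants that return NULL instead.

```python
from typing import List


def offset(values: List[str], i: int) -> str:
    # array[OFFSET(i)]: zero-based. Bounds are checked explicitly so that a
    # negative index raises instead of wrapping around as Python would.
    if not 0 <= i < len(values):
        raise IndexError(f"array index {i} out of bounds")
    return values[i]


def ordinal(values: List[str], n: int) -> str:
    # array[ORDINAL(n)]: one-based, so ORDINAL(n) == OFFSET(n - 1).
    return offset(values, n - 1)


parts = "/a/b/aaaa?c".split("/")  # ['', 'a', 'b', 'aaaa?c']
print(offset(parts, 1), ordinal(parts, 2), ordinal(parts, 3))  # a a b
```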

Answer 2 (score: 0)

This works for me:

SELECT SPLIT(path, '/') part
FROM (SELECT "/a/b/aaaa?c" path)

Row part     
1   a    
2   b    
3   aaaa?c

Not sure why it doesn't work for you. What does your data look like?

Answer 3 (score: 0)

Solved it somehow:

   SELECT
   date, 
   hits_time, 
   fullVisitorId, 
   visitNumber, 
   hits_hitNumber,
   X.page_path,
   REGEXP_EXTRACT(X.page_path,r'/(\w*)\/') as one,
   REGEXP_EXTRACT(X.page_path,r'/\w*\/(\w*)') as two,
   REGEXP_EXTRACT(X.page_path,r'/\w*\/\w*\/(\w*)') as three,
   REGEXP_EXTRACT(X.page_path,r'/\w*/\w*/\w*\/(\w*)\/.*') as four
   from
   (
   select 
   date, hits_time, fullVisitorId, visitNumber, hits_hitNumber,
   REGEXP_REPLACE (hits_page_pagePath, '-', '') as page_path
   from
   [Intent.All2mon]
   ) X 
   limit 1000
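To see what each pattern captures, here is a Python emulation of REGEXP_EXTRACT; the helper `regexp_extract` and the sample path are made up for illustration. It assumes REGEXP_EXTRACT returns the first capturing group of the leftmost match, or NULL when nothing matches. The inner query strips hyphens first because `\w` matches only letters, digits, and underscores; without that step a segment like `mens-shoes` would break the `\w*` runs and shift the captured segments.

```python
import re
from typing import Optional


def regexp_extract(value: str, pattern: str) -> Optional[str]:
    # Hypothetical emulation of REGEXP_EXTRACT: return the first capturing
    # group of the leftmost match, or None (NULL) if there is no match.
    match = re.search(pattern, value)
    return match.group(1) if match else None


# Made-up sample path; hyphens are removed first, as in the inner query.
page_path = "/shop/mens-shoes/running/item/details".replace("-", "")

one = regexp_extract(page_path, r"/(\w*)\/")
two = regexp_extract(page_path, r"/\w*\/(\w*)")
three = regexp_extract(page_path, r"/\w*\/\w*\/(\w*)")
four = regexp_extract(page_path, r"/\w*/\w*/\w*\/(\w*)\/.*")
print(one, two, three, four)  # shop mensshoes running item
```

Note that each pattern hard-codes its depth, and the fourth one also requires a slash after the fourth segment, so shorter paths yield NULL for the deeper columns.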

Answer 4 (score: 0)

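You can also try the following with the SPLIT function. However, you need to know how many '/' characters your URLs contain, or add enough columns so that URLs with more '/' characters still have each part land in a separate column: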

  SPLIT(`url`, '/')[safe_ordinal(1)] AS `Col1`, 
  SPLIT(`url`, '/')[safe_ordinal(2)] AS `Col2`,
  SPLIT(`url`, '/')[safe_ordinal(3)] AS `Col3`, 
  SPLIT(`url`, '/')[safe_ordinal(4)] AS `Col4`,
  .
  .
  SPLIT(`url`, '/')[safe_ordinal(N)] AS `ColN`
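The approach generalizes to any number of columns. The Python sketch below models SAFE_ORDINAL and builds a fixed list of columns from one URL; `safe_ordinal` and `split_to_columns` are hypothetical names. Missing parts come back as None (NULL in BigQuery), while parts beyond the last column are dropped, which is why the column count has to cover the deepest URL you expect.

```python
from typing import List, Optional


def safe_ordinal(values: List[str], n: int) -> Optional[str]:
    # array[SAFE_ORDINAL(n)]: one-based, returns NULL (None) instead of
    # raising an error when n is out of range.
    return values[n - 1] if 1 <= n <= len(values) else None


def split_to_columns(url: str, n_columns: int) -> List[Optional[str]]:
    # Hypothetical helper: builds Col1..ColN the way the repeated SELECT
    # expressions above do. Pieces beyond n_columns are silently dropped.
    parts = url.split("/")
    return [safe_ordinal(parts, i) for i in range(1, n_columns + 1)]


print(split_to_columns("/a/b/aaaa?c", 6))  # ['', 'a', 'b', 'aaaa?c', None, None]
```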