Regex: splitting on a comma delimiter

Time: 2017-12-05 03:43:23

Tags: regex google-bigquery

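I am trying to split a column on a comma delimiter. The column holds multiple values, for example 139,239,338,323. For some reason, the code below works for the first column, but the remaining columns come back empty.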

SELECT  
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){0}([^,\/]*),\/?') as Word0,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){1}([^,\/]*),\/?') as Word1,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){2}([^,\/]*),\/?') as Word2,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){3}([^,\/]*),\/?') as Word3,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){4}([^,\/]*),\/?') as Word4,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){5}([^,\/]*),\/?') as Word5,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){6}([^,\/]*),\/?') as Word6,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){7}([^,\/]*),\/?') as Word7,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){8}([^,\/]*),\/?') as Word8,
Regexp_extract(StringToParse,r'^(?:[^,\/]*,\/){9}([^,\/]*),\/?') as Word9
FROM
(SELECT event_list AS StringToParse FROM `mytable.2017`)

2 answers:

Answer 0 (score: 1)

Try the following in BigQuery Standard SQL:

   
#standardSQL
SELECT 
  SPLIT(StringToParse)[SAFE_OFFSET(0)] AS Word0,
  SPLIT(StringToParse)[SAFE_OFFSET(1)] AS Word1,
  SPLIT(StringToParse)[SAFE_OFFSET(2)] AS Word2,
  SPLIT(StringToParse)[SAFE_OFFSET(3)] AS Word3,
  SPLIT(StringToParse)[SAFE_OFFSET(4)] AS Word4,
  SPLIT(StringToParse)[SAFE_OFFSET(5)] AS Word5,
  SPLIT(StringToParse)[SAFE_OFFSET(6)] AS Word6,
  SPLIT(StringToParse)[SAFE_OFFSET(7)] AS Word7,
  SPLIT(StringToParse)[SAFE_OFFSET(8)] AS Word8,
  SPLIT(StringToParse)[SAFE_OFFSET(9)] AS Word9
FROM 
  (SELECT event_list AS StringToParse FROM `mytable.2017`) 

You can test and play with the query above using the dummy data below:

#standardSQL
WITH `mytable.2017` AS (
  SELECT '139,239,338,323' AS event_list UNION ALL
  SELECT '123,456,789,135'
)
SELECT 
  SPLIT(StringToParse)[SAFE_OFFSET(0)] AS Word0,
  SPLIT(StringToParse)[SAFE_OFFSET(1)] AS Word1,
  SPLIT(StringToParse)[SAFE_OFFSET(2)] AS Word2,
  SPLIT(StringToParse)[SAFE_OFFSET(3)] AS Word3,
  SPLIT(StringToParse)[SAFE_OFFSET(4)] AS Word4,
  SPLIT(StringToParse)[SAFE_OFFSET(5)] AS Word5,
  SPLIT(StringToParse)[SAFE_OFFSET(6)] AS Word6,
  SPLIT(StringToParse)[SAFE_OFFSET(7)] AS Word7,
  SPLIT(StringToParse)[SAFE_OFFSET(8)] AS Word8,
  SPLIT(StringToParse)[SAFE_OFFSET(9)] AS Word9
FROM 
  (SELECT event_list AS StringToParse FROM `mytable.2017`)   
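
With the dummy rows above, Word4 through Word9 come back as NULL, because SAFE_OFFSET returns NULL instead of raising an error when the index is past the end of the array. As a rough illustration outside BigQuery, the Python sketch below mimics that behavior. Note that `safe_offset` is a hypothetical helper written for this example only; it is not a BigQuery or Python API.

```python
from typing import List, Optional


def safe_offset(parts: List[str], index: int) -> Optional[str]:
    """Hypothetical stand-in for BigQuery's array[SAFE_OFFSET(index)]."""
    # Zero-based lookup; an out-of-range index yields None (SQL NULL)
    # instead of raising an error.
    if 0 <= index < len(parts):
        return parts[index]
    return None


row = "139,239,338,323"
# str.split(",") plays the role of SPLIT(event_list), whose default
# delimiter for STRING values is a comma.
parts = row.split(",")
words = [safe_offset(parts, i) for i in range(10)]
print(words)
```

The first four entries hold the split values and the remaining six are None, which corresponds to the NULL columns in the query result.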

Meanwhile, if for some reason you must use a regular expression in this query, try the following:

#standardSQL
SELECT  
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(0)]  AS Word0,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(1)]  AS Word1,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(2)]  AS Word2,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(3)]  AS Word3,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(4)]  AS Word4,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(5)]  AS Word5,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(6)]  AS Word6,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(7)]  AS Word7,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(8)]  AS Word8,
  REGEXP_EXTRACT_ALL(StringToParse, r'([^,\/]*),\/?')[SAFE_OFFSET(9)]  AS Word9
FROM
  (SELECT event_list AS StringToParse FROM `mytable.2017`)  
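
To see why the query in the question fills only Word0, it can help to run the patterns outside BigQuery. The sketch below uses Python's `re` module as a stand-in for BigQuery's RE2 engine, on the assumption that the two agree for patterns this simple. In the question's pattern, the repeated group `(?:[^,\/]*,\/)` requires a literal `/` after each comma, so any repetition count above zero cannot match plain comma-separated data. Also note that `([^,\/]*),\/?` requires a comma after each captured value, so a final value with no trailing comma is not captured; `[^,]+` is shown as one possible alternative, not as the answer author's method.

```python
import re

row = "139,239,338,323"


def question_pattern(n: int) -> str:
    # Pattern from the question with the repetition count n filled in.
    return r'^(?:[^,\/]*,\/){%d}([^,\/]*),\/?' % n


def extract(pattern: str, text: str):
    # Rough equivalent of REGEXP_EXTRACT: first capture group, or None.
    match = re.search(pattern, text)
    return match.group(1) if match else None


word0 = extract(question_pattern(0), row)  # zero repetitions: matches "139,"
word1 = extract(question_pattern(1), row)  # needs ",/" in the data: no match

# Rough equivalent of REGEXP_EXTRACT_ALL with the pattern used above.
with_trailing_comma = re.findall(r'([^,\/]*),\/?', row)

# Alternative pattern (an assumption of this sketch): every run of non-comma characters.
all_values = re.findall(r'[^,]+', row)

print(word0, word1)
print(with_trailing_comma)
print(all_values)
```

If the real data ends every value with a comma, the pattern with the required comma captures everything; otherwise the SPLIT approach or a pattern like `[^,]+` avoids losing the last value.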

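Of course, in all of the examples above you can simplify the code by introducing a subquery for the SPLIT or REGEXP_EXTRACT_ALL call and then selecting each array element in the outer SELECT.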

Answer 1 (score: 1)

You can simply use the SPLIT function. For example:

SELECT
  parts[SAFE_OFFSET(0)] AS Word0,
  parts[SAFE_OFFSET(1)] AS Word1,
  parts[SAFE_OFFSET(2)] AS Word2,
  parts[SAFE_OFFSET(3)] AS Word3,
  parts[SAFE_OFFSET(4)] AS Word4,
  parts[SAFE_OFFSET(5)] AS Word5,
  parts[SAFE_OFFSET(6)] AS Word6,
  parts[SAFE_OFFSET(7)] AS Word7,
  parts[SAFE_OFFSET(8)] AS Word8,
  parts[SAFE_OFFSET(9)] AS Word9
FROM (
  SELECT SPLIT(event_list, ',') AS parts
  FROM `mytable.2017`
);