BigQuery regular expression

Time: 2017-08-28 16:16:46

Tags: regex google-bigquery

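I have a table with the data below. I am trying to extract the second field of each string when it is split on "_". That field should have the form [digits-digits|digits-digits]. I tried REGEXP_EXTRACT but could not get the desired result.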

Please suggest how to achieve this.

Data:

                                             output 
D22_022-010|022-009_84233|669250    345     022-010 172.5
D22_022-010|022-009_666249|843250   22      022-009 172.5
D28I_28-04_5042|44182_250           235     022-010 11
D22_022-010|022-009_8423250         232     022-009 11
D23_23-06_NA_FW27_D23_600           22      28-04   235
D21_21-08_NA_FW14_D21_50            56      022-010 116
D23_23-06_NA_FW27_D23_90            88      022-009 116
D21_21-08_NA_FW14_D21_50            99      23-06   22
G | TR | Search : 56021             89      21-08   56
Free Sprayer_1x1(3.30)              77      23-06   88
Click Tracker (5.4)                 33      23-06   99
6.1 FW18_D28o_Click                 4       21-08   89
                                            null    77
                                            null    33
                                            null    4  


1 answer:

Answer 0 (score: 1)

The following is for BigQuery Standard SQL.

  

Assuming your columns are named ad and value, the query below should do what you are asking for:

#standardSQL
SELECT item, ROUND(IFNULL(value / ARRAY_LENGTH(items), value)) AS split_value
FROM (
  SELECT value, 
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'),'|') AS items
  FROM `yourProject.yourDataset.yourTable`
) LEFT JOIN UNNEST(items) AS item   
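
To see what the inner subquery produces before the join, the intermediate values can be inspected on their own. The following is a minimal sketch, not part of the original answer: the CTE name `sample` and the output column names `extracted` and `item_count` are made up for illustration, and the two input strings are taken from the question's data. The pattern looks for an underscore followed by one digits-digits token and any number of further tokens separated by "|"; only the part after the underscore is captured.

```sql
#standardSQL
-- Illustrative only: `sample`, `extracted` and `item_count` are hypothetical names.
WITH sample AS (
  SELECT 'D22_022-010|022-009_84233|669250' AS ad UNION ALL
  SELECT 'Click Tracker (5.4)'
)
SELECT
  ad,
  -- The single capturing group is what REGEXP_EXTRACT returns; NULL if nothing matches.
  REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)') AS extracted,
  -- Number of digits-digits tokens found; NULL when there was no match.
  ARRAY_LENGTH(
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'), '|')
  ) AS item_count
FROM sample
```

For the first string this should yield the extracted text 022-010|022-009 with two items, and for the second string NULL in both columns. That NULL is why the main query wraps the division in IFNULL and uses a LEFT JOIN: rows without a match keep their full value and appear with a null item.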

You can test it with the dummy data from your question:

#standardSQL
WITH `yourTable` AS (
  SELECT 'D22_022-010|022-009_84233|669250' AS ad, 345 AS value UNION ALL
  SELECT 'D22_022-010|022-009_666249|843250', 22 UNION ALL
  SELECT 'D28I_28-04_5042|44182_250', 235 UNION ALL
  SELECT 'D22_022-010|022-009_8423250', 232 UNION ALL
  SELECT 'D23_23-06_NA_FW27_D23_600', 22 UNION ALL 
  SELECT 'D21_21-08_NA_FW14_D21_50', 56 UNION ALL 
  SELECT 'D23_23-06_NA_FW27_D23_90', 88 UNION ALL 
  SELECT 'D21_21-08_NA_FW14_D21_50', 99 UNION ALL 
  SELECT 'G | TR | Search : 56021', 89 UNION ALL 
  SELECT 'Free Sprayer_1x1(3.30)', 77 UNION ALL 
  SELECT 'Click Tracker (5.4)', 33 UNION ALL 
  SELECT '6.1 FW18_D28o_Click', 4 
)
SELECT item, ROUND(IFNULL(value / ARRAY_LENGTH(items), value)) AS split_value
FROM (
  SELECT value, 
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'),'|') AS items
  FROM `yourTable`
) LEFT JOIN UNNEST(items) AS item   

The result is (as expected):

item    split_value  
------- -----------
022-010       173.0  
022-009       173.0  
022-010        11.0  
022-009        11.0  
28-04         235.0  
022-010       116.0  
022-009       116.0  
23-06          22.0  
21-08          56.0  
23-06          88.0  
21-08          99.0  
null           89.0  
null           77.0  
null           33.0  
null            4.0
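
Note that ROUND with a single argument rounds to a whole number, so 345 split two ways is reported as 173.0, whereas the question's table shows 172.5. If the fractional value is wanted, one option is to round to one decimal place instead. This is a sketch under that assumption, using only two of the sample rows; it is a variation on the query above, not something the original answer proposed.

```sql
#standardSQL
-- Variation: keep one decimal place instead of rounding to a whole number.
WITH `yourTable` AS (
  SELECT 'D22_022-010|022-009_84233|669250' AS ad, 345 AS value UNION ALL
  SELECT 'Click Tracker (5.4)', 33
)
SELECT item, ROUND(IFNULL(value / ARRAY_LENGTH(items), value), 1) AS split_value
FROM (
  SELECT value,
    SPLIT(REGEXP_EXTRACT(ad, '_((?:[0-9]+-[0-9]+)(?:\\|(?:[0-9]+-[0-9]+))*)'), '|') AS items
  FROM `yourTable`
) LEFT JOIN UNNEST(items) AS item
```

With these two rows, each of 022-010 and 022-009 should get 172.5, and the unmatched row should come back with a null item and 33.0.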