Question

I want to split a field in a BigQuery table.

Here are two examples.

== Case 1 == Source field = "idx1-cnt1-name1,idx2-cnt2-name2,... (same pattern)"

Result table
idx  | cnt  | name
idx1 | cnt1 | name1
idx2 | cnt2 | name2
...

My query:

select                      
    regexp_extract(split_col, r'([\d]*)-') as ItemIdx,  
    regexp_extract(split_col, r'-([\d]*)-') as Cnt,
    regexp_extract(split_col, r'-([\d]*)$') as TitleIdx
From (
Select pid,now, split(source field, ',') split_col from (       
SELECT * FROM table ))

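But I can't work out the query for the next case, where the string has more than one separator.

== Case 2 ==

Source string = "item1-name1-type1-value1,... (same pattern)"

Result table
name  | type
name1 | type1
name2 | type2

The number of fields is different here, but I only need the values of the second and third fields.

How should I write the query?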

Answer 1

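I see you are using BigQuery Legacy SQL, so the first examples below use it. (Note: BigQuery Standard SQL is recommended whenever possible, so consider migrating.) The examples are simplified to keep the logic easy to read, so you should be able to extend them to your real, probably more complex, cases.

Case 1 / example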

#legacySQL
SELECT                      
    REGEXP_EXTRACT(split_col, r'^(.*?)-.*?-.*?$') AS idx,  
    REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?$') AS cnt,
    REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?)$') AS name
FROM (
  SELECT SPLIT(source_field, ',') split_col 
  FROM (SELECT "idx1-cnt1-name1,idx2-cnt2-name2" source_field)
)

Result:

Row idx     cnt     name     
1   idx1    cnt1    name1    
2   idx2    cnt2    name2

Case 2 / example

#legacySQL
SELECT                      
    REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?') AS name,  
    REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?)-') AS type
FROM (
  SELECT SPLIT(source_string, ',') split_col 
  FROM (SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string)
)

Result:

Row name    type     
1   name1   type1    
2   name2   type2

Below is the same example for BigQuery Standard SQL (Case 2 only, because the two cases are very similar):

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string
)
SELECT 
  REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?') AS name,  
  REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?)-') AS type
FROM `project.dataset.table`, UNNEST(SPLIT(source_string, ',')) split_col

As expected, the result is the same:

Row name    type     
1   name1   type1    
2   name2   type2
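
For completeness, Case 1 can probably be handled in Standard SQL the same way. The sketch below only combines the Case 1 patterns with the `UNNEST(SPLIT(...))` form; the sample string and the `source_field` column come from the Case 1 example above, not from a real table, so adjust the names to your schema.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Sample data copied from the Case 1 example; replace with your own table
  SELECT "idx1-cnt1-name1,idx2-cnt2-name2" source_field
)
SELECT
  -- Lazy quantifiers (.*?) stop at the first '-' they can, so each
  -- pattern captures exactly one hyphen-delimited field
  REGEXP_EXTRACT(split_col, r'^(.*?)-.*?-.*?$') AS idx,
  REGEXP_EXTRACT(split_col, r'^.*?-(.*?)-.*?$') AS cnt,
  REGEXP_EXTRACT(split_col, r'^.*?-.*?-(.*?)$') AS name
FROM `project.dataset.table`, UNNEST(SPLIT(source_field, ',')) split_col
```

This should return the same idx / cnt / name values as the Legacy SQL version of Case 1. Row order is not guaranteed unless you add an ORDER BY.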

Another option is to split on the second separator and pick fields by position:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string
)
SELECT 
  SPLIT(split_col, '-')[SAFE_OFFSET(1)] AS name,  
  SPLIT(split_col, '-')[SAFE_OFFSET(2)] AS type
FROM `project.dataset.table`, UNNEST(SPLIT(source_string, ',')) split_col

The result is again the same.
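
If more fields are needed, the SPLIT-based query can be arranged so that each element is split only once. This is a minimal sketch under the same sample data; the `parts` alias is a name introduced here for illustration, and the `TRIM` call is an assumption that whitespace around the commas (as in the sample string) should be ignored.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT "item1-name1-type1-value1, item2-name2-type2-value2" source_string
)
SELECT
  -- SAFE_OFFSET is zero-based and yields NULL instead of an error
  -- when an element has fewer fields than expected
  parts[SAFE_OFFSET(1)] AS name,
  parts[SAFE_OFFSET(2)] AS type
FROM (
  -- Split each comma-separated element on '-' exactly once
  -- (parts is a hypothetical alias, not part of the original answer)
  SELECT SPLIT(TRIM(split_col), '-') AS parts
  FROM `project.dataset.table`, UNNEST(SPLIT(source_string, ',')) split_col
)
```

Because positions are looked up with `SAFE_OFFSET`, elements with a different number of fields do not break the query; missing positions simply come back as NULL.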
