Extract text after periods used as a delimiter for specific line items

Time: 2018-11-14 19:17:32

Tags: sql regex google-bigquery

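I'm trying to extract text that is separated by periods. I've been at this for far too long and I'm a bit frustrated, so I'm hoping someone can help!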

In short, the following string (a single string) is an example of what a query returns from one column (say, Content).

Example string:

Some random text ........................... True
But really something ....................... Okay
Okay, just another test .................... 2010-04 is a good day

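For this example, I'm trying to add some expressions to the SELECT part of my query to pull the data out of Content. Every row in the database has the same content, just with different "values" (True, Okay, 2010...).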

Example result:

Col-Random     | Col2-Something  | Col3-Okay
---------------+-----------------+-------------------------
True           | Okay            | 2010-04 is a good day

I tried something of the following form:

SELECT
regexp_extract(SUMMARY, r'/.*Some random text.*/g') as Col-Random
....
FROM `table`

1 Answer:

Answer 0: (score: 1)

  

...trying to extract text that is separated by periods

Below is an example for BigQuery Standard SQL:

#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Some random text ........................... True' line         UNION ALL
  SELECT 'But really something ....................... Okay'              UNION ALL
  SELECT 'Okay, just another test .................... 2010-04 is a good day' 
)
SELECT 
  SPLIT(line, REGEXP_EXTRACT(line, r'(\.{3}[\.]+)'))[SAFE_OFFSET(0)] key,
  SPLIT(line, REGEXP_EXTRACT(line, r'(\.{3}[\.]+)'))[SAFE_OFFSET(1)] value       
FROM `project.dataset.table`   

with the result

Row key                         value    
1   Some random text            True     
2   But really something        Okay     
3   Okay, just another test     2010-04 is a good day    
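
One detail the result grid hides: the delimiter matched here is only the run of periods, so the pieces returned by SPLIT keep the space that sat on either side of it ('Some random text ' and ' True'). If that matters, a minimal variation is to wrap each piece in TRIM. This is a sketch that reuses the same sample rows and the same regular expression, not a change to the approach:

```sql
#standardSQL
WITH `project.dataset.table` AS (
  SELECT 'Some random text ........................... True' line         UNION ALL
  SELECT 'But really something ....................... Okay'              UNION ALL
  SELECT 'Okay, just another test .................... 2010-04 is a good day'
)
SELECT
  -- Piece before the run of periods, with the trailing space removed
  TRIM(SPLIT(line, REGEXP_EXTRACT(line, r'(\.{3}[\.]+)'))[SAFE_OFFSET(0)]) AS key,
  -- Piece after the run of periods, with the leading space removed
  TRIM(SPLIT(line, REGEXP_EXTRACT(line, r'(\.{3}[\.]+)'))[SAFE_OFFSET(1)]) AS value
FROM `project.dataset.table`
```

With no second argument, TRIM removes leading and trailing whitespace, so any spaces inside the key or the value are left alone.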

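Note: the above assumes that a run of at least 4 periods is used as the delimiter.

So if a line looks like Some ... random text ........................... True, it will still be processed correctly as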

key                     value    
Some ... random text    True
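
The question asked for a single row with one column per label rather than key/value rows. A possible sketch of that shape follows. It assumes the Content column holds all three lines in one newline-separated string and that the label text is fixed and known in advance. The aliases Col_Random, Col2_Something and Col3_Okay are underscore versions of the names in the question, because an unquoted alias cannot contain a hyphen.

```sql
#standardSQL
WITH `project.dataset.table` AS (
  -- Assumed layout: one multi-line string per row, as described in the question
  SELECT '''Some random text ........................... True
But really something ....................... Okay
Okay, just another test .................... 2010-04 is a good day''' AS Content
)
SELECT
  -- Each pattern: the literal label, a run of 4+ periods, optional spaces,
  -- then a capture group holding the rest of that line
  REGEXP_EXTRACT(Content, r'Some random text \.{4,} *(.*)')        AS Col_Random,
  REGEXP_EXTRACT(Content, r'But really something \.{4,} *(.*)')    AS Col2_Something,
  REGEXP_EXTRACT(Content, r'Okay, just another test \.{4,} *(.*)') AS Col3_Okay
FROM `project.dataset.table`
```

For the sample string this should give True, Okay and 2010-04 is a good day in the three columns. Two things differ from the attempt in the question. BigQuery regular expressions use RE2 syntax, so the pattern is written without JavaScript-style /.../g delimiters, which would otherwise be matched as literal slashes. And REGEXP_EXTRACT returns the capture group when the pattern has one, which is what isolates the value. By default `.` does not match a newline, so each capture stops at the end of its own line.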