Finding the second occurrence of a substring in Google BigQuery

Date: 2016-04-21 14:24:48

Tags: sql google-bigquery

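I am trying to find the index of the second occurrence of a substring within a string using Google BigQuery.

For example, in the string 'challcha', the second occurrence of 'ch' is at position 6.

As far as I know, this can be done with CharIndex in Oracle. I would like to achieve the same in Google BigQuery.

Any help is appreciated!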

1 Answer:

Answer 0 (score: 3)

  

With pure BigQuery String functions (legacy SQL syntax):

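SELECT test,
  -- position of the first 'ch' + 1 + offset of the next 'ch' in the remainder
  INSTR(test, 'ch') + 1 + INSTR(SUBSTR(test, INSTR(test, 'ch') + 2), 'ch') AS pos
FROM
  -- in legacy SQL, comma-separated subqueries are combined as UNION ALL
  (SELECT 'challcha' AS test),
  (SELECT 'chcha' AS test),
  (SELECT 'chha' AS test)
WHERE
  -- keep only rows that contain a second 'ch'
  INSTR(SUBSTR(test, INSTR(test, 'ch') + 2), 'ch') > 0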

Note: INSTR is case-sensitive, so if your data has mixed case you may want to wrap both the string and the search term in LOWER or UPPER.
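
To make the arithmetic in the query above easier to follow, here is a minimal JavaScript sketch that traces the same steps outside BigQuery. The helpers `instr`, `substr`, and `secondOccurrence` are hypothetical names introduced only for illustration; they imitate the 1-based behavior the query relies on and are not BigQuery APIs. Unlike the query, which drops rows without a second match through its WHERE clause, this sketch returns 0 in that case.

```javascript
// Hypothetical helper: 1-based position of the first match, 0 when absent
// (imitates how INSTR is used in the query above).
function instr(str, search) {
  return str.indexOf(search) + 1;
}

// Hypothetical helper: substring from a 1-based start index to the end
// (imitates the two-argument SUBSTR call in the query above).
function substr(str, start) {
  return str.slice(start - 1);
}

// Same arithmetic as the SQL expression, with 'ch' generalized to `search`.
function secondOccurrence(str, search) {
  const first = instr(str, search);
  if (first === 0) return 0;              // no first match at all
  // Skip past the first match; for 'ch' this is INSTR(...) + 2.
  const rest = substr(str, first + search.length);
  const offset = instr(rest, search);
  if (offset === 0) return 0;             // no second match
  // For a two-character search this equals first + 1 + offset.
  return first + search.length - 1 + offset;
}

console.log(secondOccurrence('challcha', 'ch'));               // 6
console.log(secondOccurrence('chcha', 'ch'));                  // 3
console.log(secondOccurrence('chha', 'ch'));                   // 0
// Case-insensitive variant, following the LOWER/UPPER note above.
console.log(secondOccurrence('ChallCha'.toLowerCase(), 'ch')); // 6
```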

  

With BigQuery User-Defined Functions (legacy SQL inline JS syntax):

SELECT test, pos FROM JS(
(
  SELECT test FROM
    (SELECT 'challcha' AS test),
    (SELECT 'chcha' AS test),
    (SELECT 'chha' AS test)
),
test,
"[{name: 'test', type:'string'},
  {name: 'pos', type:'integer'}
  ]
",
"function(r, emit) {
  var search = 'ch';
  var pos1 = r.test.indexOf(search) + 1;
  var pos2 = r.test.indexOf(search, pos1) + 1;
  if (pos1 * pos2 === 0) pos2 = 0;
  emit({test: r.test, pos: pos2});
}"
)
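
The body of the UDF above can be tried outside BigQuery and extended to any occurrence number. The following is a minimal sketch under that assumption; `nthIndexOf` is a hypothetical helper name, not part of BigQuery or of the original answer. Like the UDF, it resumes the search one character after the previous match, so overlapping matches are counted.

```javascript
// Hypothetical helper: 1-based position of the n-th occurrence of `search`
// in `str`, or 0 when there are fewer than n occurrences.
function nthIndexOf(str, search, n) {
  if (search.length === 0 || n < 1) return 0;
  let from = 0;   // 0-based index where the next search starts
  let idx = -1;   // 0-based index of the latest match
  for (let i = 0; i < n; i++) {
    idx = str.indexOf(search, from);
    if (idx === -1) return 0;
    // Advance by one character, as the UDF does; overlapping matches count.
    from = idx + 1;
  }
  return idx + 1; // convert to a 1-based position
}

console.log(nthIndexOf('challcha', 'ch', 2)); // 6
console.log(nthIndexOf('chcha', 'ch', 2));    // 3
console.log(nthIndexOf('chha', 'ch', 2));     // 0
```

To skip overlapping matches instead (the behavior of the string-function query, which jumps past the whole first match), `from = idx + search.length` could be used in place of `from = idx + 1`.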
  

With pure BigQuery Regular expression functions (case-insensitive):

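SELECT test,
  -- length of the text before the first 'ch', plus 3 (1 for the 1-based position, 2 for the
  -- length of 'ch'), plus the length of the text between the first and second 'ch'
  LENGTH(REGEXP_EXTRACT(test, r'(?i)(.*?)ch')) + 3 +
    LENGTH(REGEXP_EXTRACT(REGEXP_EXTRACT(test, r'(?i)ch(.*)'), r'(?i)(.*?)ch')) AS pos
FROM
  (SELECT 'ChallCha' AS test),
  (SELECT 'abChallCha' AS test),
  (SELECT 'chcha' AS test),
  (SELECT 'chha' AS test)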
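
For comparison, the same case-insensitive lookup can be sketched in plain JavaScript with a global regular expression. This is only an illustration of the idea, not a translation of the query: `secondMatchPosition` is a hypothetical helper name, and it returns null when there is no second match.

```javascript
// Hypothetical helper: 1-based position of the second case-insensitive
// match of `search` in `str`, or null when there is no second match.
function secondMatchPosition(str, search) {
  if (search.length === 0) return null;
  // Escape regex metacharacters so `search` is matched literally.
  const escaped = search.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  // g: each exec() call resumes after the previous match; i: ignore case.
  const re = new RegExp(escaped, 'gi');
  const first = re.exec(str);
  if (first === null) return null;
  const second = re.exec(str);
  return second === null ? null : second.index + 1;
}

console.log(secondMatchPosition('ChallCha', 'ch'));   // 6
console.log(secondMatchPosition('abChallCha', 'ch')); // 8
console.log(secondMatchPosition('chcha', 'ch'));      // 3
console.log(secondMatchPosition('chha', 'ch'));       // null
```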