Question

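I have a regular expression with multiple match groups.

How do I specify which match group to return in Snowflake?

I'm using REGEXP_SUBSTR, but I'm happy to use an alternative.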

Answer 1

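TL;DR: You can't do exactly that, but you can use the 'e' option together with non-capturing groups, written (?:re).

To clarify, it seems Neil is asking for something that would return word from a query like this: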

select regexp_substr('bird is the word','(bird) (is) (the) (word)',1,4)

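Unfortunately, I don't think Snowflake fully supports this today. REGEXP_SUBSTR has an 'e' (extract) parameter that lets you extract only a group, but it always extracts the first group. The reason is that the occurrence parameter currently refers to the occurrence of the entire regular expression in the string, not to a group within it. For example: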

select regexp_substr('bird is cows are','([a-z]*) (is|are)',1,2,'e');
=> cows

You can get what you want by not grouping the parts you don't need, for example:

select regexp_substr('bird is the word','bird (is) (the) (word)',1,1,'e');
-> is
select regexp_substr('bird is the word','bird is the (word)',1,1,'e');
-> word

However, that doesn't work if you need grouping to express alternatives:

select regexp_substr('cow is the word','(bird|cow) is the (word)',1,1,'e');
-> cow

In that case, you can use the non-capturing group syntax, (?:re), for the groups you don't want returned:

select regexp_substr('cow is the word','(?:bird|cow) is the (word)',1,1,'e');
-> word
select regexp_substr('cow is the word','(?:bird|cow) (?:is) (?:the) (word)',1,1,'e');
-> word
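
Since the question is open to alternatives, one possible workaround is REGEXP_REPLACE with a backreference in the replacement string. This is only a sketch and not part of the original answer. It assumes the pattern matches the entire input (hence the ^ and $ anchors), because any text outside the match is left in the result unchanged. In a single-quoted string literal the backslash has to be doubled.

```sql
-- Replace the whole match with the contents of capture group 4.
select regexp_replace('bird is the word', '^(bird) (is) (the) (word)$', '\\4');
-- -> word
```

Unlike the 'e' option, this returns the input unchanged rather than NULL when the pattern does not match, so it may need an extra check if non-matching rows are possible.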

Still, I think an option to extract a specific group number would be valuable, and I'll raise it with Snowflake development :)
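
For readers on a current Snowflake release: the REGEXP_SUBSTR documentation now lists an optional sixth argument, group_num, that selects which capture group to extract. A minimal sketch, assuming your account runs a version that includes it:

```sql
-- Arguments: subject, pattern, position, occurrence, regex parameters, group_num.
-- occurrence stays 1 (first match of the whole pattern);
-- group_num = 4 picks the fourth capture group of that match.
select regexp_substr('bird is the word', '(bird) (is) (the) (word)', 1, 1, 'e', 4);
-- -> word
```

If group_num is not available in your environment, the non-capturing group approach above remains the fallback.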

Answer 2

REGEXP_SUBSTR has a parameter named occurrence that lets you specify which match to return.
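
As an illustration (a sketch, not the example from the original answer): occurrence is the fourth argument of REGEXP_SUBSTR, after the starting position, and it counts matches of the whole pattern rather than capture groups.

```sql
-- Return the 4th match of [a-z]+ in the subject, searching from position 1.
select regexp_substr('bird is the word', '[a-z]+', 1, 4);
-- -> word
```

This works when the pieces you want can be described by one repeating pattern. It does not select a group inside a single match.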
