Question

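I've run into a fairly tricky regular expression problem in Redshift. I want to extract every numeric value that has a percent sign, without extracting numbers that have no percent sign. My current pattern works on simpler examples but not on more complex ones.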

I have a column that describes beverage ingredients.

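A simple example would be "95% Apple, 5% Grape" or "50.25% grape, 49.75% apple", which I can handle with '[0-9]+(\.[0-9][0-9]?%)?'. However, more complex examples such as "50% Apple, 50% Grape, 2mg grape juice" or "100% Juice,50% Apple, 50% Grape" result in me extracting "2" and "100, 50, and 50", respectively.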

[0-9]+(\.[0-9][0-9]?%)?

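I've been able to grab every number, but I only want the numbers that are immediately followed by a percent sign, and a value of "100%" should be treated differently from the other percentage values. So for the example "100% Juice,50% Apple, 50% Grape", I only want the two 50% values. Edit: I should also clarify that I'm using the regexp_substr function, so the two 50% values would each end up in their own column by using the index.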
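To illustrate why the pattern above also picks up bare numbers, here is a minimal sketch that uses Python's re module as a stand-in for Redshift's regex engine (an assumption; the two engines differ in places, though not for these simple patterns). The helper all_matches and the PERCENT_REQUIRED variant are hypothetical names introduced only for this illustration.

```python
import re

# The pattern from the question: the "%" sits inside the optional group,
# so a bare integer such as "2" still satisfies [0-9]+ on its own.
ORIGINAL = r'[0-9]+(\.[0-9][0-9]?%)?'

# Hypothetical variant (not from the question): "%" moved outside the
# optional group, so every match must end in a percent sign.
PERCENT_REQUIRED = r'[0-9]+(\.[0-9][0-9]?)?%'

def all_matches(pattern, text):
    """Return the full text of every non-overlapping match."""
    return [m.group(0) for m in re.finditer(pattern, text)]

sample = '50% Apple, 50% Grape, 2mg grape juice'
print(all_matches(ORIGINAL, sample))          # ['50', '50', '2']
print(all_matches(PERCENT_REQUIRED, sample))  # ['50%', '50%']
```

Moving the percent sign out of the optional group removes the "2", but it would still return 100% for "100% Juice,50% Apple, 50% Grape", so the special handling of 100% needs something more than this change.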

Answer 1

You can use the following:

\b\d?\d%\s
https://regex101.com/r/wxGfaX/1
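As a rough check of what this pattern does, the sketch below runs it through Python's re module (an assumption; Redshift's engine may treat \b, \d, and \s differently). The helper name find_all is hypothetical.

```python
import re

# Pattern from Answer 1, written as a Python raw string.
PATTERN = r'\b\d?\d%\s'

def find_all(text):
    """Return every match of the Answer 1 pattern in text."""
    return re.findall(PATTERN, text)

# 100% is skipped because at most two digits may precede the "%".
print(find_all('100% Juice,50% Apple, 50% Grape'))  # ['50% ', '50% ']

# Decimals are only partly matched: the word boundary after "." lets
# the fractional digits match on their own.
print(find_all('50.25% grape, 49.75% apple'))       # ['25% ', '75% ']

# A percentage at the very end of the string has no trailing whitespace.
print(find_all('50% Apple, 50%'))                   # ['50% ']
```

Under these assumptions the pattern covers the integer examples from the question, but each match includes the trailing whitespace character, and decimal values or a percentage at the end of the string would need a different pattern.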

Answer 2

Here is an answer using a Python UDF:

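create or replace function process_percentages(InputStr varChar)
  returns varchar
stable
as $$
    import re
    OutputStr = ''
    # A number with an optional decimal part, immediately followed by '%'.
    pattern = re.compile(r'(\d+(\.\d+)?%)')
    # Special case: keep 100% only when it is the sole percentage in the string.
    if '100%' in InputStr and InputStr.count('%') == 1:
        OutputStr = '100%,'
    else:
        # Otherwise keep every percentage strictly below 100.
        for m in pattern.finditer(InputStr):
            if float(m.group(1)[:-1]) < 100.0:
                OutputStr += m.group(1) + ','
    # Drop the trailing comma.
    return OutputStr[:-1]
$$ language plpythonu;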

You can then use it like this:

Select process_percentages('10% Apple, 10% 5% Grape');
Select process_percentages('100% 10% Apple, 10% 5% Grape');
Select process_percentages('123% nothing 10% Apple, 10% Grape');
Select process_percentages('100% Apple, Grape');
Select process_percentages('10.56% Apple, 5.22% Grape');
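To see what these calls should return without a Redshift cluster, the UDF body can be run as ordinary Python. The sketch below is a local copy of the same logic, assuming the behavior carries over unchanged from Redshift's plpythonu runtime; the SAMPLES list simply repeats the five inputs above.

```python
import re

PERCENT = re.compile(r'(\d+(\.\d+)?%)')

def process_percentages(input_str):
    """Local copy of the UDF logic from Answer 2, for testing outside Redshift."""
    # Special case: the only percentage present is 100%.
    if '100%' in input_str and input_str.count('%') == 1:
        return '100%'
    # Otherwise keep every percentage strictly below 100.
    kept = [m.group(1) for m in PERCENT.finditer(input_str)
            if float(m.group(1)[:-1]) < 100.0]
    return ','.join(kept)

SAMPLES = [
    '10% Apple, 10% 5% Grape',            # 10%,10%,5%
    '100% 10% Apple, 10% 5% Grape',       # 10%,10%,5%
    '123% nothing 10% Apple, 10% Grape',  # 10%,10%
    '100% Apple, Grape',                  # 100%
    '10.56% Apple, 5.22% Grape',          # 10.56%,5.22%
]

for s in SAMPLES:
    print(process_percentages(s))
```

Because the function returns one comma-separated string, getting each value into its own column would still need a split step, for example with Redshift's split_part function.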

This approach works reliably, and it is easy to customize if your requirements become more complex.

You do need to follow https://docs.aws.amazon.com/redshift/latest/dg/udf-security-and-privileges.html first to get the required permissions.

Match only percentage values in Redshift regex
