Regex to remove punctuation from tokenized text

Time: 2015-06-28 12:10:51

Tags: regex

I am trying to use a regex to remove punctuation tokens from tokenized text. Can anyone explain the following behavior:

$ STRING='hey , you ! what " are you doing ? say ... ," what '
$ echo $STRING | sed -r 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | sed -r 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what

The ," token remains in the output, which I do not want. This token can be matched with:

,"
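The behavior can be reproduced in Python. Each match of " [[:punct:]]+ " consumes the space on both sides of the token, and with the g flag the next search resumes after the end of the previous match. Once " ... " has used up the space in front of ," that token no longer has a leading space, so it can never match. The sketch below assumes the input is what `echo $STRING` actually prints (unquoted expansion collapses runs of spaces and drops the trailing one). Python's re has no POSIX classes, so [^\w\s] is used as a stand-in for [[:punct:]]; this is an approximation, since "_" counts as punctuation in POSIX but as a word character in \w.

```python
import re

# What `echo $STRING` prints: word splitting collapses repeated spaces
# and drops the trailing space of the original STRING.
text = 'hey , you ! what " are you doing ? say ... ," what'

# Same shape as the sed/perl pattern: the spaces around the token are consumed,
# so " ... " eats the space that ," would have needed as its leading space.
naive = re.sub(r' [^\w\s]+ ', ' ', text)
print(naive)  # hey you what are you doing say ," what

# Lookarounds only assert "start or whitespace" before and "end or whitespace"
# after the token, without consuming it, so adjacent tokens can share a space.
token_only = re.sub(r'(?<!\S)[^\w\s]+(?!\S)', '', text)
# Collapse the leftover runs of spaces into single spaces.
cleaned = ' '.join(token_only.split())
print(cleaned)  # hey you what are you doing say what
```

The same idea carries over to Perl, which supports lookahead, e.g. `s/ [[:punct:]]+(?= )//g`; GNU sed's -r (ERE) syntax has no lookaround, so with sed the substitution would instead have to be repeated until nothing changes.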

0 answers:

No answers yet.