I am trying to remove punctuation from tokenized text using regular expressions. Can anyone explain the following behavior:
import re

# Optional leading currency, a numeric value, optional trailing currency.
preg = re.compile(r'^((?P<leading_currency>\S+) +)?'
                  r'(?P<value>[-\d\.,]+)'
                  r'( +(?P<trailing_currency>\S+))?$')

bunch = "$ 148.69\n" \
        "€ 148.69\n" \
        "€ 148,69\n" \
        "148,69 €\n" \
        "₹ 148.69\n" \
        "Rs 148.69\n" \
        "RM 148.69"

def parse_currency(line):
    match = preg.match(line)
    if match:
        currency = match.group('leading_currency') \
                   or match.group('trailing_currency')
        val_str = match.group('value')
        # Whichever of '.' and ',' occurs last is treated as the decimal separator.
        dec_sep = '.' if val_str.rfind('.') > val_str.rfind(',') else ','
        int_part, float_part = val_str.rsplit(dec_sep, 1)

        def norm(number_string):
            # Keep only the digits, dropping any grouping separators.
            return ''.join(c for c in number_string if c.isdigit())

        value = float('{}.{}'.format(norm(int_part), norm(float_part)))
        return currency, value

for line in bunch.splitlines():
    print(parse_currency(line))
$ STRING='hey , you ! what " are you doing ? say ... ," what '
$ echo $STRING | sed -r 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | sed -r 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [^[:alnum:][:space:]-]+ / /g;'
hey you what are you doing say ," what
$ echo $STRING | perl -pe 's/ [[:punct:]]+ / /g;'
hey you what are you doing say ," what
One token remains in the output, which I do not want. This token can be matched with the following:
,"
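A minimal Python sketch of what seems to be going on, offered as an illustration rather than a definitive answer. Regex substitutions never overlap, and each pattern above consumes the space on both sides of a punctuation token, so once ` ... ` has matched, the space that `,"` would need in front of it is already used up. The character class `[^A-Za-z0-9\s-]` is an assumed ASCII approximation of `[^[:alnum:][:space:]-]`, and the names `consuming`, `lookaround` and `strip_punct_tokens` are hypothetical.

```python
import re

text = 'hey , you ! what " are you doing ? say ... ," what '

# Same shape as the sed/perl pattern: the space on each side is consumed,
# so two punctuation tokens separated by a single space cannot both match.
consuming = re.compile(r' [^A-Za-z0-9\s-]+ ')

# Lookaround variant: whitespace (or a string edge) is required on both
# sides of the token but is not consumed by the match.
lookaround = re.compile(r'(?<!\S)[^A-Za-z0-9\s-]+(?!\S)')


def strip_punct_tokens(s):
    """Remove standalone punctuation tokens, then collapse whitespace."""
    without_tokens = lookaround.sub('', s)
    return ' '.join(without_tokens.split())


print(consuming.sub(' ', text))   # the ," token survives here
print(strip_punct_tokens(text))   # every standalone punctuation token is gone
```

The same idea should carry over to Perl, which supports lookarounds; sed does not, so there a common workaround is to repeat the substitution in a loop until nothing changes.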