import nltk
print(nltk.regexp_tokenize('That U.S.A. poster-print costs $12.40...13.10', r"((?:(?:[A-Z]\.)+)|(?:\w+(?:-\w+)*)|(?:\d+(?:\.\d+)?))"))
Output:
['That', 'U.S.A.', 'poster-print', 'costs', '12', '40', '13', '10']
And (with the order of the patterns inside the parentheses changed):
print(nltk.regexp_tokenize('That U.S.A. poster-print costs $12.40...13.10', r"((?:(?:[A-Z]\.)+)|(?:\d+(?:\.\d+)?)|(?:\w+(?:-\w+)*))"))
Output:
['That', 'U.S.A.', 'poster-print', 'costs', '12.40', '13.10']
Why does the order matter in this case?
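
The difference does not seem to be specific to NLTK. Below is a minimal sketch that reproduces it with Python's standard `re` module, under the assumption that `regexp_tokenize` ultimately relies on the same alternation behaviour: alternatives separated by `|` are tried from left to right, and the first one that matches at the current position is used, even if a later alternative would match a longer string. The shortened sample text and the names `word_first` and `number_first` are made up for this illustration.

```python
import re

text = "costs $12.40...13.10"

# Word alternative listed first: at "12.40", \w+ already matches "12",
# so the number alternative is never tried at that position.
word_first = re.compile(r"\w+(?:-\w+)*|\d+(?:\.\d+)?")

# Number alternative listed first: at "12.40", \d+(?:\.\d+)? matches
# the whole "12.40" before the word alternative gets a chance.
number_first = re.compile(r"\d+(?:\.\d+)?|\w+(?:-\w+)*")

word_tokens = word_first.findall(text)
number_tokens = number_first.findall(text)

print(word_tokens)    # ['costs', '12', '40', '13', '10']
print(number_tokens)  # ['costs', '12.40', '13.10']
```

Since `\w` also matches digits, the word alternative shadows the number alternative whenever it comes first, which would explain why `12.40` is split in the first call and kept whole in the second.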