Splitting a string on delimiters and a word using re

Time: 2018-02-15 10:08:23

Tags: python regex

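I am doing natural language processing in Python and am trying to split an input string with re. I want to split on the characters ; , . and also on the word but.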

import re
print(re.split(r"[;,.]", 'i am; working here but you are. working here, as well'))

['i am', ' working here but you are', ' working here', ' as well']

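How can I do this? When I add the word but to the regular expression, each of its characters is treated as a separate split criterion. How do I get the following output?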

['i am', ' working here', 'you are', ' working here', ' as well']

2 answers:

Answer 0: (score: 5)

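You can split on the following pattern: but |[;,.]
It matches any one of the characters ; , . and also the word but followed by a space. The | is alternation, so the word stays outside the square brackets and is matched as a whole rather than character by character.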

import re
print(re.split(r"but |[;,.]", 'i am; working here but you are. working here, as well'))
Hope this helps.
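
To illustrate why the attempt in the question split on single characters, here is a minimal sketch. It assumes the word was placed inside the square brackets (the question does not show that attempt), and the \b word boundaries are an addition for illustration, not part of the answer above.

```python
import re

text = 'i am; working here but you are. working here, as well'

# Assumed failing attempt: inside [...] every character is its own
# alternative, so "but" means "b" or "u" or "t", not the word.
char_class_parts = re.split(r"[;,.but]", text)
print(char_class_parts)

# Alternation keeps "but" whole. \b (an addition here) stops the
# pattern from matching inside longer words such as "butter".
parts = re.split(r"\bbut\b|[;,.]", text)
print(parts)
```

The first call also breaks "you" apart at its u, while the second leaves the surrounding text intact. The pieces still carry the spaces that sat next to each separator.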

Answer 1: (score: 0)

This works as well:

import re
print(re.split(r'; |, |\. | but', 'i am; working here but you are. working here, as well'))

Output:

['i am', 'working here', ' you are', 'working here', 'as well']
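
If the leftover spaces are unwanted, one possible variation is to let the pattern consume the whitespace around each separator and to drop empty pieces afterwards. This is only a sketch: split_clauses and CLAUSE_SEPARATOR are hypothetical names, and the \b word boundaries are an assumption about the desired behaviour, not something either answer uses.

```python
import re

# Hypothetical pattern: optional whitespace, then either the whole word
# "but" or one of ; , . followed by optional whitespace.
CLAUSE_SEPARATOR = re.compile(r"\s*(?:\bbut\b|[;,.])\s*")


def split_clauses(text):
    """Split text on ; , . and the word "but", trimming nearby spaces."""
    pieces = CLAUSE_SEPARATOR.split(text)
    # Adjacent separators or a trailing full stop leave empty strings
    # behind, so filter them out.
    return [piece for piece in pieces if piece]


print(split_clauses('i am; working here but you are. working here, as well'))
```

Compiling the pattern once keeps the helper cheap to call repeatedly, and the non-capturing group (?:...) avoids returning the separators themselves in the result.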