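I am using Python for natural language processing and am trying to split an input string with re. I want to split on the characters ; , and . as well as on the word but.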
import re
print(re.split(r"[;,.]", 'i am; working here but you are. working here, as well'))
['i am', ' working here but you are', ' working here', ' as well']
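How can I do this? When I add the word but to the regular expression, each of its characters is treated as a separate delimiter. How can I get the following output?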
['i am', ' working here', 'you are', ' working here', ' as well']
Answer 0 (score: 5)
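You can use alternation with the pattern but |[;,.]
The | matches either the word but followed by a space, or any one of the characters ; , and . so the word is treated as a single delimiter rather than as three separate letters.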
import re
print(re.split(r"but |[;,.]", 'i am; working here but you are. working here, as well'))
Hope this helps.
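To illustrate why alternation helps, here is a minimal sketch that compares a character class containing the letters of but with an alternation. The variable names are illustrative, and the \bbut\b form with word boundaries is an assumed variation on the pattern above, not the pattern from this answer.

```python
import re

text = 'i am; working here but you are. working here, as well'

# Inside [...] every character is its own alternative, so this splits on
# ';', ',', '.', 'b', 'u' and 't' individually (note "you" gets cut too).
char_split = re.split(r"[;,.but]", text)
print(char_split)

# With alternation, "but" is matched as a whole word. The \b word
# boundaries (an assumption added here) keep it from matching inside
# longer words such as "button".
parts = re.split(r"\bbut\b|[;,.]", text)
print(parts)
```

Because this variant does not consume the spaces around but, the pieces keep their surrounding whitespace; calling str.strip() on each piece is one way to remove it.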
Answer 1 (score: 0)
This also works:
import re
print(re.split(r'; |, |\. | but', 'i am; working here but you are. working here, as well'))
Output:
['i am', 'working here', ' you are', 'working here', 'as well']
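If the stray spaces in the pieces are unwanted, one possible variant (not taken from either answer) is to let the pattern absorb the whitespace around each delimiter. This is only a sketch under that assumption; split_clauses is a hypothetical helper name.

```python
import re

# Delimiters: ';', ',', '.' or the whole word "but", together with any
# whitespace on either side of the delimiter.
DELIMITER = re.compile(r"\s*(?:[;,.]|\bbut\b)\s*")

def split_clauses(text):
    """Split text on the delimiters above and drop empty pieces."""
    return [piece for piece in DELIMITER.split(text) if piece]

result = split_clauses('i am; working here but you are. working here, as well')
print(result)
# ['i am', 'working here', 'you are', 'working here', 'as well']
```

The empty-string filter is there because a delimiter at the very start or end of the input would otherwise produce an empty first or last element.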