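I want to use Python's re.split() to split a sentence into multiple strings on commas, but I don't want to split where the comma follows a single word. For example: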
s = "Yes, alcohol can have a place in a healthy diet."
desired result = ["Yes, alcohol can have a place in a healthy diet."]
Another example:
s = "But, of course, excess alcohol is terribly harmful to health in a variety of ways, and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer."
desired output = ["But, of course" , "excess alcohol is terribly harmful to health in a variety of ways" , "and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer."]
Any pointers, please?
Answer 0 (score: 1)
Since Python's re module doesn't support variable-length lookbehind assertions, I would use re.findall() instead:
In [3]: re.findall(r"\s*((?:\w+,)?[^,]+)",s)
Out[3]:
['But, of course',
'excess alcohol is terribly harmful to health in a variety of ways',
'and even moderatealcohol intake is associated with an increase in the number two cause of premature death: cancer.']
Explanation:
\s* # Match optional leading whitespace, don't capture that
( # Capture in group 1:
(?:\w+,)? # optionally: A single "word", followed by a comma
[^,]+ # and/or one or more characters except commas
) # End of group 1
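
As a minimal sketch, the pattern above can be wrapped in a small function and run against the first example from the question as well. The names CLAUSE_RE and split_clauses are made up for illustration, and the second sentence is a shortened stand-in for the one in the question.

```python
import re

# Same pattern as in the answer; CLAUSE_RE and split_clauses are
# hypothetical names chosen for this sketch.
CLAUSE_RE = re.compile(r"\s*((?:\w+,)?[^,]+)")

def split_clauses(sentence):
    """Split on commas, but keep a single word plus its comma
    attached to the text that follows it."""
    return CLAUSE_RE.findall(sentence)

# "Yes," is a single word, so nothing is split off.
print(split_clauses("Yes, alcohol can have a place in a healthy diet."))

# "But," stays attached to "of course"; the other commas split.
print(split_clauses("But, of course, excess alcohol is harmful, and even moderate intake is risky."))
```

Note that the optional `(?:\w+,)?` part is tried at the start of every piece, not only at the start of the sentence, so a single word followed by a comma is glued to the next piece wherever it occurs. For example, "one, two, three, four" should come out as two pieces, 'one, two' and 'three, four'. If that is not what you want, the pattern would need to be anchored differently.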