Removing only a certain part of a sentence in quotes

Time: 2018-05-29 14:53:24

Tags: python regex

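I'm trying to remove a certain part of a string, and I'm fairly new to regular expressions. I want to remove the part of the string that consists of a person's name in quotes, the word "disconnected", and a closing part ending in "by customer". An example of a sentence I'm working with: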

import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
re.sub(r"\s'([\w\W\d]+)' disconnected \(.*\)[.|\s]*", '', new_text)

The result is:

"this is the"

But I want to get:

"this is the 'ideal' problem"

Any ideas on how to change the regex pattern?

2 answers:

Answer 0: (score: 2)

Here is one possibility:

import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
result = re.sub(r"(^.*)\s+'[^']+' disconnected.*$", r"\1", new_text)
print(result)

Output:

this is the 'ideal' problem
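For context on why the pattern in the question removes too much: `[\w\W\d]+` matches any character, quotes included, so the match begins at the first space-plus-quote in the string (before 'ideal') rather than at the name. The sketch below compares it with a negated class `[^']+`, which cannot cross a quote. The variable names `greedy` and `fixed` are only illustrative, and the second pattern is one possible minimal change to the question's pattern, not the only one.

```python
import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# Pattern from the question: [\w\W\d]+ can match quotes too, so the match
# starts at the first " '" in the string and swallows 'ideal' problem as well.
greedy = re.sub(r"\s'([\w\W\d]+)' disconnected \(.*\)[.|\s]*", "", new_text)

# Negated class: [^']+ stops at the next quote, so only 'joe smith' can sit
# between the quotes that directly precede " disconnected".
fixed = re.sub(r"\s'[^']+' disconnected \(.*\)[.\s]*", "", new_text)

print(greedy)  # this is the
print(fixed)   # this is the 'ideal' problem
```

Unlike the pattern in this answer, the second substitution does not anchor to the start and end of the string, so any text after the closing parenthesis and trailing period would be kept.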

Answer 1: (score: 0)

You can use a positive lookahead, (?= disconnected):

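import re

# Match everything up to, but not including, " disconnected"
pattern = r'\w.+(?=\sdisconnected)'
text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# Split on quotes and drop the last two pieces (the name and an empty string)
data = re.findall(pattern, text)[0].split("'")[:-2]
print("'".join(data))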

Output:

this is the 'ideal' problem
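To make the steps in this answer easier to follow, here is a sketch that prints each intermediate value. The names `head`, `parts` and `result` are illustrative only. One detail worth noting: the joined string keeps the space that preceded the name, so the sketch adds `.rstrip()` to trim it, which the code in the answer does not do.

```python
import re

text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# Step 1: the lookahead ends the match just before " disconnected".
head = re.findall(r"\w.+(?=\sdisconnected)", text)[0]

# Step 2: splitting on the quote leaves the name and an empty string
# as the last two items of the list.
parts = head.split("'")

# Step 3: drop those two items, rejoin with quotes, and trim the
# trailing space that was in front of the name.
result = "'".join(parts[:-2]).rstrip()

print(repr(head))    # "this is the 'ideal' problem 'joe smith'"
print(parts)         # ['this is the ', 'ideal', ' problem ', 'joe smith', '']
print(repr(result))  # "this is the 'ideal' problem"
```

This approach assumes the name is the last quoted item before "disconnected"; if the text before it contained an unbalanced quote, the `[:-2]` slice would drop the wrong pieces.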