Question

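I'm struggling to remove a certain part of a string, and I'm fairly new to regular expressions. I want to remove the part of the string that contains the person's name in quotes, the word "disconnected", and the trailing "Concluded by customer" part. An example of a sentence I'm working with: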

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
re.sub(r"\s'([\w\W\d]+)' disconnected \(.*\)[.|\s]*", '', new_text)

The result is:

"this is the"

But I want to get:

"this is the 'ideal' problem"

Any ideas on how to change the regex pattern?

Answer 1

Here is one possibility:

import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
result = re.sub(r"(^.*)\s+'[^']+' disconnected.*$", r"\1", new_text)
print(result)

Output:

this is the 'ideal' problem
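
For context on why the pattern in the question removes too much: the class [\w\W\d]+ matches any character, quotes included, so the match starts at the first opening quote and runs through to the name. The sketch below is not part of the original answer; it keeps the question's pattern and changes only that class to [^']+ (and drops the stray | from the final character class), assuming names never contain a single quote.

```python
import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# The question's pattern: [\w\W\d]+ can cross quotes, so the match begins
# at the quote before "ideal" and swallows everything up to the name.
greedy = re.sub(r"\s'([\w\W\d]+)' disconnected \(.*\)[.|\s]*", '', new_text)

# Assumed fix: [^']+ cannot cross a quote, so only the quoted phrase
# directly before " disconnected" is matched.
fixed = re.sub(r"\s'[^']+' disconnected \(.*\)[.\s]*", '', new_text)

print(greedy)  # this is the
print(fixed)   # this is the 'ideal' problem
```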

Answer 2

You can use a positive lookahead, (?=\sdisconnected), to match everything before the word "disconnected" without consuming it:

import re

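# Match everything up to (but not including) " disconnected"
pattern = r'\w.+(?=\sdisconnected)'
text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# Split on quotes and drop the last two pieces (the quoted name and the
# empty tail), then strip the space left in front of the name
data = re.findall(pattern, text)[0].split("'")[:-2]
print("'".join(data).rstrip())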

Output:

this is the 'ideal' problem
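
The split-and-join step can probably be avoided by putting the lookahead inside a substitution. This is only a sketch, assuming the name is always the quoted phrase immediately before " disconnected" and that everything from the name onward should be dropped:

```python
import re

text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# Remove leading whitespace, a quoted phrase with no inner quotes, and the
# rest of the line, but only when the phrase is followed by " disconnected".
result = re.sub(r"\s+'[^']+'(?=\sdisconnected).*", "", text)
print(result)  # this is the 'ideal' problem
```

Earlier quoted words such as 'ideal' are left alone because the lookahead fails for them.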
