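I'm trying to remove part of a string, and I'm fairly new to regular expressions. I want to delete the part of the string that contains a person's name in quotes, the word "disconnected", and the closing "('Concluded by customer')." An example of a sentence I'm working with is: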
import re
new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
re.sub(r"\s'([\w\W\d]+)' disconnected \(.*\)[.|\s]*", '', new_text)
The result is:
"this is the"
But I want to get:
"this is the 'ideal' problem"
Any ideas on how to change the regex pattern?
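Why the pattern above removes too much: the class [\w\W\d] matches any character at all (\w and \W together already cover everything), and + is greedy. The match therefore starts at the first space-plus-quote, the one before ideal, and stretches all the way to the ' disconnected that follows joe smith. A quick diagnostic sketch that prints what the capture group grabbed:

```python
import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# Same opening as the question's pattern, without the trailing parts.
match = re.search(r"\s'([\w\W\d]+)' disconnected", new_text)

# The group spans from 'ideal' through 'joe smith', not just the name.
captured = match.group(1)
kept = new_text[:match.start()]
print(captured)  # ideal' problem 'joe smith
print(kept)      # this is the
```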
Answer 0 (score: 2)
Here is one possibility:
import re
new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
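# Group 1 keeps everything before the last quoted name followed by " disconnected";
# [^']+ cannot cross a quote, so the name is matched on its own. The rest is dropped.
result = re.sub(r"(^.*)\s+'[^']+' disconnected.*$", r"\1", new_text)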
print(result)
Output:
this is the 'ideal' problem
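The key piece here is [^']+ ("one or more characters that are not a quote"), which cannot run past a closing quote. The same substitution also fixes the question's original re.sub call. A minimal sketch, which additionally replaces [.|\s]* with [.\s]*, since | inside a character class is a literal pipe rather than alternation:

```python
import re

new_text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."

# [^']+ stops at the next quote, so the match can only begin at a quoted
# segment that is immediately followed by " disconnected".
result = re.sub(r"\s'[^']+' disconnected \(.*\)[.\s]*", '', new_text)
print(result)  # this is the 'ideal' problem
```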
Answer 1 (score: 0)
You can use a positive lookahead, (?= disconnected):
import re
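# Match from the first word character up to (but not including) " disconnected"
pattern = r'\w.+(?=\sdisconnected)'
text = "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
# The match ends with the quoted name; splitting on quotes and dropping the
# last two pieces removes 'joe smith'. rstrip() removes the leftover space.
data = re.findall(pattern, text)[0].split("'")[:-2]
print("'".join(data).rstrip())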
Output:
this is the 'ideal' problem
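The lookahead can also be used directly inside re.sub, which avoids the split/join step. A sketch, assuming the name is always a single-quoted segment right before disconnected and the text is a single line; strip_disconnect_note is a hypothetical helper name:

```python
import re

# Hypothetical helper: drop a quoted name that is followed by "disconnected",
# together with everything after it.
_NOTE = re.compile(r"\s+'[^']+'(?=\s+disconnected\b).*$")

def strip_disconnect_note(text: str) -> str:
    # The lookahead only checks for " disconnected" and consumes nothing,
    # so .*$ is what actually removes the rest of the line.
    return _NOTE.sub('', text)

print(strip_disconnect_note(
    "this is the 'ideal' problem 'joe smith' disconnected ('Concluded by customer')."
))  # this is the 'ideal' problem
print(strip_disconnect_note("this is the 'ideal' problem"))  # this is the 'ideal' problem (unchanged)
```

Because 'ideal' is not followed by disconnected, the lookahead rejects it, so quoted words elsewhere in the sentence are left alone.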