I am looking for noise text that follows a specific pattern:
text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
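I want to remove every token in this sentence that contains &@, that is, everything from the space before it to the space after it.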
result = "this is some text and some more text and some other stuff"
I am trying:
re.compile(r'([\s]&@.*?([\s]))').sub(" ", text)
However, I can't seem to match the first part of the token.
Answer 0 (score: 3)
Try this:
import re
result = re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text)
print(result)
['lskdfmd&@kjansdl', 'sldkf&@lsakjd']
Now remove the items in the result list from the list of all words.
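The removal step described above could be sketched as follows. This assumes the text is split on whitespace; the names `noise` and `words` are introduced here purely for illustration and are not part of the original answer.

```python
import re

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

# Collect the noise tokens: letters, then "&@", then letters.
noise = set(re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text))

# Keep only the words that are not noise tokens, then rejoin with single spaces.
words = [w for w in text.split() if w not in noise]
result = " ".join(words)
print(result)
```

Rejoining with `" ".join(...)` also normalizes any repeated whitespace in the input, which may or may not be what you want.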
Edit 1, suggested by @Jan
re.sub(r"[a-zA-Z]+&@[a-zA-Z]+", '', text)
output: 'this is some text  and some more text  and some other stuff' (note that a doubled space is left where each token was removed)
Edit 2, suggested by @Pushpesh Kumar Rajwanshi
re.sub(r" [a-zA-Z]+&@[a-zA-Z]+ ", " ", text)
output: 'this is some text and some more text and some other stuff'
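To see how the two edits differ, the sketch below runs both substitutions side by side. The names `without_space`, `with_space`, and `normalized` are illustrative only. Removing just the token keeps the spaces on both sides of it, whereas matching the surrounding spaces and replacing them with one space does not.

```python
import re

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
pattern = r"[a-zA-Z]+&@[a-zA-Z]+"

# Edit 1 style: delete the token only; both neighbouring spaces survive.
without_space = re.sub(pattern, "", text)

# Edit 2 style: match the token together with its surrounding spaces.
with_space = re.sub(" " + pattern + " ", " ", text)

# One possible cleanup for the first variant: collapse runs of whitespace.
normalized = " ".join(without_space.split())

print(repr(without_space))
print(repr(with_space))
print(repr(normalized))
```

The second variant only matches tokens that have a literal space on both sides, so it would miss a noise token at the very start or end of the string.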
Answer 1 (score: 2)
You can use this regex to capture the noise string,
\s+\S*&@\S*\s+
and replace it with a single space.
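Here, the leading \s+ matches one or more whitespace characters, then \S* matches zero or more non-whitespace characters, &@ is matched literally between them, the second \S* again matches zero or more non-whitespace characters, and the trailing \s+ matches one or more whitespace characters. Replacing the whole match with a single space removes the noise token and gives you the expected string.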
Also, if the noise string can appear at the very beginning or end of the string, feel free to change \s+ to \s*
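A minimal sketch of that variation, assuming a made-up input `s` whose noise tokens sit at the start, middle, and end. The trailing `strip()` is an addition not in the original answer: replacing an edge match with a space would otherwise leave a space at the start or end of the string.

```python
import re

# Hypothetical input with noise tokens at the start, middle, and end.
s = "ab&@cd this is some text sldkf&@lsakjd and more x&@y"

# \s* instead of \s+ lets the pattern match even with no whitespace on one side.
cleaned = re.sub(r"\s*\S*&@\S*\s*", " ", s).strip()
print(cleaned)
```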
Python code,
import re
s = 'this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'
print(re.sub(r'\s+\S*&@\S*\s+', ' ', s))
Prints
this is some text and some more text and some other stuff
Answer 2 (score: 2)
You can use
\S+&@\S+\s*
In Python:
import re
text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
rx = re.compile(r'\S+&@\S+\s*')
text = rx.sub('', text)
print(text)
which yields
this is some text and some more text and some other stuff