Question

I'm looking for noisy text with a specific pattern:

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

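I want to remove every token in this sentence that contains &@, that is, everything from the preceding space up to the following space.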

result = "this is some text and some more text and some other stuff"

I've been trying:

re.compile(r'([\s]&@.*?([\s])).sub(" ", text)

But I can't seem to match the first part.

Answer 1

Try this:

import re
# Use [a-zA-Z], not [a-zA-z]: the range A-z also matches [ \ ] ^ _ and `
result = re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text)
print(result)
['lskdfmd&@kjansdl', 'sldkf&@lsakjd']

Now remove the words in the result list from the list of all words.
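That removal step is not spelled out above. A minimal sketch of one way to do it, assuming the words are separated by whitespace (the names noise and cleaned are illustrative, not from the original answer):

```python
import re

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

# Collect the noise tokens (letters on both sides of "&@").
noise = set(re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text))

# Split on whitespace, drop the noise tokens, and rejoin with single spaces.
cleaned = " ".join(word for word in text.split() if word not in noise)
print(cleaned)
```

Because the words are rejoined with single spaces, this approach leaves no doubled spaces behind.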

Edit 1, as suggested by @Jan

re.sub(r"[a-zA-Z]+&@[a-zA-Z]+", '', text)
output: 'this is some text  and some more text  and some other stuff'

Edit 2, as suggested by @Pushpesh Kumar Rajwanshi

re.sub(r" [a-zA-Z]+&@[a-zA-Z]+ ", " ", text)
output: 'this is some text and some more text and some other stuff'
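A letters-only character class works for the sample text, but it may miss noise tokens that contain digits or punctuation. The sketch below uses a hypothetical input (not from the question) to compare it with a class that accepts any non-whitespace character:

```python
import re

# Hypothetical input: the noise token contains digits, unlike the original sample.
sample = "keep this ab1&@cd2 and that"

# Letters only: the digit in "ab1" prevents a match, so nothing is removed.
letters_only = re.sub(r" [a-zA-Z]+&@[a-zA-Z]+ ", " ", sample)

# Any non-whitespace characters around "&@": the token is removed.
any_non_space = re.sub(r" \S+&@\S+ ", " ", sample)

print(letters_only)
print(any_non_space)
```

Which class is appropriate depends on what the noise actually looks like in your data.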

Answer 2

You can use this regex to capture the noise string,

\s+\S*&@\S*\s+

and replace it with a single space.

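Here, \s+ matches one or more whitespace characters, \S* matches zero or more non-whitespace characters, &@ matches the marker literally, the second \S* again matches zero or more non-whitespace characters, and the final \s+ matches the whitespace after the token. Replacing the whole match with a single space gives you the intended string.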
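The same breakdown can be written directly into the pattern with re.VERBOSE, which ignores whitespace and allows comments inside the regex. This is only a sketch of the same expression; the variable name noise is illustrative:

```python
import re

noise = re.compile(
    r"""
    \s+     # one or more whitespace characters before the token
    \S*     # zero or more non-whitespace characters
    &@      # the literal marker
    \S*     # zero or more non-whitespace characters
    \s+     # one or more whitespace characters after the token
    """,
    re.VERBOSE,
)

s = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
cleaned = noise.sub(" ", s)
print(cleaned)
```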

Also, if the noise string can appear at the very start or end of the string, feel free to change \s+ to \s*.
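With \s*, a noise token at either end of the string is still replaced by a single space, so a leading or trailing space can remain. A minimal sketch on a hypothetical input (not from the question), with strip() added as one possible cleanup:

```python
import re

# Hypothetical input with noise tokens at both the start and the end.
s = "abc&@def this is text ghi&@jkl"

replaced = re.sub(r"\s*\S*&@\S*\s*", " ", s)
print(repr(replaced))   # note the spaces left at both ends

# strip() removes the leftover leading and trailing spaces.
cleaned = replaced.strip()
print(cleaned)
```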


Python code:

import re

s = 'this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'
print(re.sub(r'\s+\S*&@\S*\s+', ' ', s))

Prints

this is some text and some more text and some other stuff

Answer 3

You can use

\S+&@\S+\s*

See a demo on regex101.com.

In Python:

import re
text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
rx = re.compile(r'\S+&@\S+\s*')
text = rx.sub('', text)
print(text)

Which yields

this is some text and some more text and some other stuff
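If the pattern is applied to many strings, it could be wrapped in a small helper. The sketch below is one possible arrangement; the names NOISE and remove_noise are hypothetical, and the rstrip() call is an addition to cover a noise token at the very end of the string:

```python
import re

# Compiled once so the pattern can be reused across many strings.
NOISE = re.compile(r"\S+&@\S+\s*")

def remove_noise(text):
    """Delete every token containing '&@' plus the whitespace that follows it."""
    # rstrip() drops the space left behind when the last token is noise.
    return NOISE.sub("", text).rstrip()

print(remove_noise("some text abc&@def"))
print(remove_noise("abc&@def some text"))
```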

Get the full string before and after a specific pattern
