Get the whole string before and after a specific pattern

Time: 2019-04-23 17:22:35

Tags: python regex python-3.x pandas

I am working with noisy text that contains a specific pattern:

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

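I want to remove every token in this sentence that contains &@, that is, everything from the space before it up to the space after it.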

result = "this is some text and some more text and some other stuff"

I am trying:

re.compile(r'([\s]&@.*?([\s]))').sub(" ", text)

However, I can't seem to capture the first part (the characters before &@).

3 Answers:

Answer 0: (score: 3)

Try this:

import re
# find every token that has letters on both sides of "&@"
result = re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text)
print(result)
['lskdfmd&@kjansdl', 'sldkf&@lsakjd']

Now remove the words in the result list from the list of all words.
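One way to carry out that removal, as a minimal sketch. It assumes `text` is the string from the question and that splitting on whitespace is acceptable (this also collapses any repeated spaces). The names `noise` and `cleaned` are illustrative only.

```python
import re

text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"

# Collect the noisy tokens, then keep only the words that are not among them.
noise = set(re.findall(r"[a-zA-Z]+&@[a-zA-Z]+", text))
cleaned = " ".join(word for word in text.split() if word not in noise)
print(cleaned)  # this is some text and some more text and some other stuff
```

Using a set keeps the membership check cheap when the text contains many noisy tokens.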

Edit 1, as suggested by @Jan

re.sub(r"[a-zA-Z]+&@[a-zA-Z]+", '', text)
output: 'this is some text  and some more text  and some other stuff'

Edit 2, as suggested by @Pushpesh Kumar Rajwanshi

re.sub(r" [a-zA-Z]+&@[a-zA-Z]+ ", " ", text)
output:'this is some text and some more text and some other stuff'

Answer 1: (score: 2)

You can use this regex to capture the noise string,

\s+\S*&@\S*\s+

and replace it with a single space.

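Here, the leading \s+ matches one or more whitespace characters, \S* matches zero or more non-whitespace characters, &@ is the literal marker sandwiched in the middle, the second \S* again matches zero or more non-whitespace characters, and the trailing \s+ matches one or more whitespace characters. Replacing the whole match with a single space removes the noise token and gives you the expected string.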
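The same breakdown can be written next to the pattern itself using Python's re.VERBOSE flag. This is only an illustrative sketch; the name `NOISE` is hypothetical and not part of the answer.

```python
import re

# Same pattern as above, spelled out piece by piece (re.VERBOSE ignores
# whitespace inside the pattern and allows inline comments).
NOISE = re.compile(r"""
    \s+   # one or more whitespace characters before the noisy token
    \S*   # zero or more non-whitespace characters
    &@    # the literal marker
    \S*   # zero or more non-whitespace characters
    \s+   # one or more whitespace characters after the noisy token
""", re.VERBOSE)

s = 'this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'
verbose_result = NOISE.sub(' ', s)
print(verbose_result)
```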

Also, if the noise string can appear at the very start or end of the string, feel free to change \s+ to \s*.
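A small sketch of that \s* variant. The input string here is made up for illustration, and the call to strip() is an added assumption: because each match is replaced by a space, a noisy token at either end would otherwise leave a stray space there.

```python
import re

# \s* instead of \s+, so a noisy token at the start or end is matched too.
pattern = re.compile(r'\s*\S*&@\S*\s*')

# Hypothetical input with noise at the start, in the middle, and at the end.
s = 'ab&@cd this is some text x&@y and more z&@w'

# The replacement space can land at either end of the string, so strip it.
cleaned = pattern.sub(' ', s).strip()
print(cleaned)  # this is some text and more
```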

Regex Demo

Python code,

import re

s = 'this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff'
print(re.sub(r'\s+\S*&@\S*\s+', ' ', s))

Prints

this is some text and some more text and some other stuff

Answer 2: (score: 2)

You can use

\S+&@\S+\s*

See a demo on regex101.com


In Python:

import re
text = "this is some text lskdfmd&@kjansdl and some more text sldkf&@lsakjd and some other stuff"
rx = re.compile(r'\S+&@\S+\s*')
text = rx.sub('', text)
print(text)

Which yields

this is some text and some more text and some other stuff