How to remove text inside parentheses using re.compile and re.findall?

Posted: 2019-06-14 13:42:14

Tags: python regex list beautifulsoup

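I want to remove the text between parentheses, including the parentheses themselves. The text is stored in a list, and I also want to store the output (without the parenthesised parts) in a new list.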

I tried:

import re
es = ["49,331,076","23,136,275","139,500 (est.)","124,000","522 (ranked 23 of 137)"]
listy = []
length = len(es)
regex = re.compile(r".*?\((.*?)\)")
for p in range(length):
    listy.append(re.findall(regex, es[p]))

However, this returns the text inside the parentheses instead of removing it.

Expected result:

"[49,331,076, 23,136,275, 139,500, 124,000, 522]"

The result I get:

"[[], [], ['est.'], [], ['ranked 23 of 137']]"
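Why the attempt behaves this way: when a pattern contains a capturing group, re.findall returns what the group captured, not the whole match. So .*?\((.*?)\) yields exactly the text inside the parentheses. Without a group, findall returns the whole match, parentheses included. In both cases findall only collects matches. Deleting them from the strings needs re.sub (or the compiled pattern's .sub), which both answers use. The sketch below is a minimal comparison on the question's data:

```python
import re

es = ["49,331,076", "23,136,275", "139,500 (est.)", "124,000", "522 (ranked 23 of 137)"]

# With a capturing group, findall returns only what the group captured.
with_group = re.compile(r".*?\((.*?)\)")
inner = [with_group.findall(s) for s in es]
print(inner)  # [[], [], ['est.'], [], ['ranked 23 of 137']]

# Without a group, findall returns the whole match, parentheses included.
no_group = re.compile(r"\([^()]*\)")
whole = [no_group.findall(s) for s in es]
print(whole)  # [[], [], ['(est.)'], [], ['(ranked 23 of 137)']]
```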

2 answers:

Answer 0 (score: 1)

You can use re.sub with the \([^()]*\) pattern:

import re
es = ["49,331,076","23,136,275","139,500 (est.)","124,000","522 (ranked 23 of 137)"]
regex = re.compile(r"\([^()]*\)")
listy = []
for x in es:
    listy.append(regex.sub('', x).strip())
# Or, instead of the two lines above use a list comprehension:
# listy = [regex.sub('', x).strip() for x in es]
print(listy) # => ['49,331,076', '23,136,275', '139,500', '124,000', '522']

See the Python demo.

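Note that it is simpler to iterate over the list items directly with for x in es: than to get the list's length and track the current item with a counter. A list comprehension, [regex.sub('', x).strip() for x in es], is even more Pythonic.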

The \([^()]*\) pattern matches (, then zero or more characters other than ( and ), then ). If a ( can occur between the parentheses, use \(.*?\) or \([^)]*\) instead.
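To make the difference between the three patterns concrete, here is a small sketch on made-up strings. The inputs and the strip_nested helper are hypothetical and are not part of the answer. A stray ( inside the parentheses blocks \([^()]*\), while the two alternatives match through it. For properly nested parentheses, \(.*?\) stops at the first ) and leaves a dangling ). Repeatedly applying \([^()]*\) until nothing changes is one extra technique (not proposed in the answer) that removes nested groups from the inside out:

```python
import re

# Hypothetical inputs, not from the question.
stray = "522 (rank ( 23)"   # an extra "(" inside the parentheses
nested = "a (b (c) d) e"     # properly nested parentheses

no_parens = re.compile(r"\([^()]*\)")  # neither ( nor ) allowed inside the match
lazy_any = re.compile(r"\(.*?\)")      # anything, up to the first )
not_close = re.compile(r"\([^)]*\)")   # anything except ), so a stray ( is allowed

print(repr(no_parens.sub("", stray)))  # '522 (rank '
print(repr(lazy_any.sub("", stray)))   # '522 '
print(repr(not_close.sub("", stray)))  # '522 '


def strip_nested(text: str) -> str:
    """Hypothetical helper: delete innermost (...) groups until none are left."""
    while True:
        new = no_parens.sub("", text)
        if new == text:
            return text
        text = new


print(repr(lazy_any.sub("", nested)))  # 'a  d) e' (a stray ")" is left behind)
print(repr(strip_nested(nested)))      # 'a  e'
```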

Answer 1 (score: 0)

I would simply use sub() to replace the matches with an empty string:

import re
es = ["49,331,076","23,136,275","139,500 (est.)","124,000","522 (ranked 23 of 137)"]

regex = re.compile(r"\(.+\)")
cleaned_es = [regex.sub('', val) for val in es]
print(cleaned_es)

You can also add strip() to remove any leftover trailing whitespace:

cleaned_es = [regex.sub('', val).strip() for val in es]

Which gives you:

['49,331,076', '23,136,275', '139,500', '124,000', '522']
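One caveat about \(.+\): the .+ is greedy, so the match runs from the first ( to the last ) in the string. This is harmless for the question's data, where each string has at most one parenthesised part. A string with two groups, however, also loses the text between them. A lazy \(.+?\) (or the \([^()]*\) pattern from the other answer) avoids this. The input below is made up for illustration:

```python
import re

# Hypothetical input with two parenthesised groups (not from the question).
text = "522 (ranked 23) of 137 (est.)"

greedy = re.compile(r"\(.+\)")   # .+ is greedy: runs to the LAST ")"
lazy = re.compile(r"\(.+?\)")    # .+? is lazy: stops at the FIRST ")"

greedy_result = greedy.sub("", text).strip()
lazy_result = lazy.sub("", text).strip()
print(repr(greedy_result))  # '522'
print(repr(lazy_result))    # '522  of 137'
```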