How to pass a parameter into the re.compile() string in BeautifulSoup?

Posted: 2016-02-19 14:03:59

Tags: python regex python-2.7 beautifulsoup

I'm learning BeautifulSoup and I want to use a regular expression to filter strings.

For example, the HTML markup is:

<div>apple</div>
<div>android</div>
<div>windows</div>

This code works:

import re

re_words = re.compile(u".*(apple|android).*")

for content in body.findAll("div"):
    if re_words.match(content.text):
        print(content.text)

But I want to add the keywords to the regular expression dynamically, so I tried this code:

word0 = "apple"
word1 = "android"

regular = "u""\".*("

regular += word0
regular += "|"
regular += word1

regular +=").*\""

re_words = re.compile(regular)

for content in body.findAll("div"):
    if re_words.match(content.text):
        print content.text

This time I didn't manage to build a valid pattern for re.compile(). Can anyone help?
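To see why the second version never matches, the sketch below rebuilds the same string outside BeautifulSoup. Python joins the adjacent literals "u" and "\".*(" into one string, so the pattern itself starts with a literal u" and ends with a literal ". re.compile() accepts that pattern, but it only matches text wrapped in those characters; the u prefix belongs in front of a string literal in source code, not inside the pattern. The variable names broken and fixed are illustrative only.

```python
import re

word0 = "apple"
word1 = "android"

# Same construction as in the question: "u" and "\".*(" are adjacent string
# literals, so Python joins them into the text  u".*(
broken = "u""\".*(" + word0 + "|" + word1 + ").*\""
print(broken)  # u".*(apple|android).*"

# Without the stray u and double quotes the keywords behave as intended.
fixed = ".*(" + word0 + "|" + word1 + ").*"

print(re.match(broken, "apple") is not None)     # False: text must start with u"
print(re.match(broken, 'u"apple"') is not None)  # True: only quoted text matches
print(re.match(fixed, "apple") is not None)      # True
```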

1 Answer:

Answer 0 (score: 0)

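First, you can pass a compiled regular expression directly as the text argument of the find_all() call. To build the regular expression dynamically, I would put a %s placeholder inside the parentheses and fill it with the keywords joined by "|":

import re

keywords = ["apple", "android"]
pattern = r"(%s)" % "|".join(keywords)

for content in body.find_all("div", text=re.compile(pattern)):
    print(content.text)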
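One caveat with joining raw keywords: if a keyword contains regex metacharacters, the joined pattern changes meaning or fails to compile (for example, "c++" gives a "multiple repeat" error). A minimal sketch, assuming the keywords should always be matched literally, escapes each one with re.escape() before joining. It uses search(), which finds a keyword anywhere in the string, so the leading and trailing .* from the question are not needed; as far as I know, BeautifulSoup also applies a compiled pattern with search(). The keyword "c++" and the texts list are hypothetical test data.

```python
import re

# Hypothetical keyword list; "c++" contains regex metacharacters.
keywords = ["apple", "android", "c++"]

# re.escape() makes each keyword a literal pattern before joining with "|".
pattern = re.compile("(%s)" % "|".join(re.escape(k) for k in keywords))

texts = ["apple", "android", "windows", "I like c++"]
hits = [t for t in texts if pattern.search(t)]
print(hits)  # ['apple', 'android', 'I like c++']
```

In BeautifulSoup the same compiled pattern would be passed as text=pattern.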

Alternatively, you can pass a callable as the value of the text argument:

keywords = ["apple", "android"]
for content in body.find_all("div", text=lambda text: any(keyword in text for keyword in keywords)):
    print(content.text)
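My understanding is that BeautifulSoup calls the filter with the tag's .string, which is None when a tag has more than one child, and "keyword in None" raises a TypeError. A slightly more defensive sketch, under that assumption, uses a named predicate that treats None as "no match". The function name contains_keyword is hypothetical.

```python
keywords = ["apple", "android"]

def contains_keyword(text):
    """Return True if any keyword occurs in text.

    Hypothetical helper: guards against None, which the filter may receive
    for tags that do not have a single string child.
    """
    return text is not None and any(keyword in text for keyword in keywords)

print(contains_keyword("apple"))    # True
print(contains_keyword("windows"))  # False
print(contains_keyword(None))       # False
```

It would be used as body.find_all("div", text=contains_keyword).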

Also note that if you need an exact text match, you don't need a regular expression at all; pass the list of strings itself:

keywords = ["apple", "android"]
for content in body.find_all("div", text=keywords):
    print(content.text)
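The difference between the approaches is exact versus substring matching. The plain-Python sketch below, using a hypothetical texts list, contrasts the two and shows an anchored regex that reproduces the exact-match behaviour if you still want a pattern. It does not call BeautifulSoup; it only illustrates the matching rules.

```python
import re

keywords = ["apple", "android"]
texts = ["apple", "android", "windows", "apple pie"]

# Exact match: the whole text must equal one of the keywords.
exact = [t for t in texts if t in keywords]

# Substring match: what the regex and callable versions above do.
partial = [t for t in texts if any(k in t for k in keywords)]

# Anchoring with ^...$ turns the joined pattern into a whole-string match.
anchored = re.compile(r"^(?:%s)$" % "|".join(re.escape(k) for k in keywords))
exact_re = [t for t in texts if anchored.search(t)]

print(exact)     # ['apple', 'android']
print(partial)   # ['apple', 'android', 'apple pie']
print(exact_re)  # ['apple', 'android']
```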
