Question

I'm learning BeautifulSoup, and I want to use a regular expression to filter strings.

For example, the HTML markup is:

<div>apple</div>
<div>android</div>
<div>windows</div>

This code works:

re_words = re.compile(u".*(apple|android).*")

for content in body.findAll("div"):
    if re_words.match(content.text):
        print content.text

But I want to add the keywords to the regular expression dynamically, so I tried writing this code:

word0 = "apple"
word1 = "android"

regular = "u""\".*("

regular += word0
regular += "|"
regular += word1

regular +=").*\""

re_words = re.compile(regular)

for content in body.findAll("div"):
    if re_words.match(content.text):
        print content.text

This time I couldn't build a working re.compile(). Can anyone help?
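A likely cause, assuming the snippet runs exactly as posted: "u""\".*(" is two adjacent string literals, "u" and "\".*(", so the assembled pattern contains a literal u" at the start and a literal " at the end. It still compiles, but it only matches text that carries those extra characters. The sketch below rebuilds the string the same way to show this; the variable names broken and fixed are hypothetical.

```python
import re

word0 = "apple"
word1 = "android"

# Rebuild the pattern exactly as the question does.
# Adjacent literals "u" + "\".*(" concatenate to: u".*(
regular = "u""\".*("
regular += word0
regular += "|"
regular += word1
regular += ").*\""

print(regular)  # u".*(apple|android).*"

# The pattern compiles, but it demands a literal 'u"' before the text
# and a literal '"' after it, so plain "apple" does not match.
broken = re.compile(regular)
print(broken.match("apple"))  # None

# Building only the regex body, with no quotes and no u prefix, matches as intended.
fixed = re.compile(".*(" + word0 + "|" + word1 + ").*")
print(fixed.match("apple") is not None)  # True
```

In Python source, the u prefix belongs outside the quotes of a literal (u"..."); it is not part of the string's contents, so it should not be concatenated into the pattern.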

Answer 1

First of all, you can pass a compiled regular expression as the text argument of the find_all() call. To build the regular expression dynamically, I would put a placeholder inside the parentheses and join the keywords with |:

keywords = ["apple", "android"]
pattern = r"(%s)" % "|".join(keywords)
for content in body.find_all("div", text=re.compile(pattern)):
    print(content.text)

Alternatively, you can pass a callable as the text argument value:

keywords = ["apple", "android"]
for content in body.find_all("div", text=lambda text: any(keyword in text for keyword in keywords)):
    print(content.text)

Also note that if you need to match the text exactly, you don't need a regular expression at all:

keywords = ["apple", "android"]
for content in body.find_all("div", text=keywords):
    print(content.text)
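To compare the three strategies without needing a parser, here is a standard-library sketch that applies the same logic to a plain list of strings standing in for the div texts (div_texts is hypothetical sample data). When a compiled pattern is passed as text=, BeautifulSoup, as far as I know, tests it with search() rather than match(), which is why "(apple|android)" needs no leading or trailing .*; the sketch uses search() for that reason. It also wraps each keyword in re.escape(), an addition not in the answer above, so that keywords containing characters such as . or + are matched literally.

```python
import re

# Hypothetical stand-ins for the text of each <div>.
div_texts = ["apple", "android", "windows", "pineapple pie"]

keywords = ["apple", "android"]

# 1) Regex built from the keyword list; re.escape() makes each keyword literal.
pattern = re.compile("(%s)" % "|".join(re.escape(k) for k in keywords))
regex_hits = [t for t in div_texts if pattern.search(t)]

# 2) Substring test, the same logic as the lambda passed to text=.
callable_hits = [t for t in div_texts if any(k in t for k in keywords)]

# 3) Exact equality, the same logic as passing the list itself to text=.
exact_hits = [t for t in div_texts if t in keywords]

print(regex_hits)     # ['apple', 'android', 'pineapple pie']
print(callable_hits)  # ['apple', 'android', 'pineapple pie']
print(exact_hits)     # ['apple', 'android']
```

Note that the first two approaches also accept "pineapple pie" because it contains "apple"; only the exact-match list rejects it.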
