Question

I have this HTML:

title="Keeper: Michal Buchalik" class="pos_text">Buchalik</a></span>                
                                            <span class="pos_text pos3_l_5">

I am trying to match Buchalik.

I came up with this code:

import re
for gk in soup.find_all(re.compile("pos_text pos3_l_\d{1,2}")):
    print gk.previous_element.previous_element,

It doesn't match anything, and the problem must be in the regex, because when I put a specific number in place of \d{1,2}, it works fine.

Answer 1

Since this is Python, you need to prefix the pattern with r to make it a "raw string", or escape the '\' character:

re.compile(r"pos_text pos3_l_\d{1,2}")

OR

re.compile("pos_text pos3_l_\\d{1,2}")
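A quick way to check the two spellings above, using only the standard `re` module (no BeautifulSoup), is to compare the resulting pattern strings and then try the compiled pattern against a few class strings. The sample strings are hypothetical, modeled on the `class` attribute in the question; this sketch only checks the regex itself, not how it is passed to `find_all`.

```python
import re

# Both spellings produce the same pattern text: a literal backslash followed by "d".
raw_pattern = r"pos_text pos3_l_\d{1,2}"
escaped_pattern = "pos_text pos3_l_\\d{1,2}"
assert raw_pattern == escaped_pattern

pattern = re.compile(raw_pattern)

# Hypothetical class strings, modeled on the class attribute in the question.
samples = ["pos_text pos3_l_5", "pos_text pos3_l_12", "pos_text pos3_l_x"]

# re.search finds the pattern anywhere in each string.
matches = [s for s in samples if pattern.search(s)]
print(matches)  # ['pos_text pos3_l_5', 'pos_text pos3_l_12']
```

Note that `search` is not anchored at the end, so a string such as "pos_text pos3_l_123" would also match; add `$` to the pattern if that matters.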

See if that helps.

Cheers.

BeautifulSoup: passing a regular expression as an argument
