Question

I have just started using BeautifulSoup and have run into a problem. I set up the HTML snippet below and made a BeautifulSoup object from it:

from bs4 import BeautifulSoup

html_snippet = '<p class="course"><span class="text84">Ae 100. Research in Aerospace. </span><span class="text85">Units to be arranged in accordance with work accomplished. </span><span class="text83">Open to suitably qualified undergraduates and first-year graduate students under the direction of the staff. Credit is based on the satisfactory completion of a substantive research report, which must be approved by the Ae 100 adviser and by the option representative. </span> </p>'
subject = BeautifulSoup(html_snippet)

I have tried several find and find_all operations, shown below, but all I get back is an empty list:

subject.find(text = 'A') 
subject.find(text = 'Research')
subject.next_element.find('A')
subject.find_all(text = 'A')

Earlier, when I created BeautifulSoup objects from HTML files on my computer, find and find_all both worked fine. The problems started when I read html_snippet from a web page online via urllib2.

Can anyone point out where the problem is?

Answer 1

Pass the argument like this:

import re
subject.find(text=re.compile('A'))

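The default behavior of the text filter is to match against the entire string, so text='A' only finds a text node that is exactly 'A'. Passing in a regular expression lets you match a fragment of the string instead.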
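As a minimal sketch of that difference, the example below uses a shortened version of the snippet from the question (the shortening is an assumption made for illustration). It uses the `string` keyword, which Beautiful Soup 4.4.0 and later accept as the newer name for `text`, and it names the parser explicitly.

```python
import re
from bs4 import BeautifulSoup

# Shortened, illustrative version of the question's snippet.
html = (
    '<p class="course">'
    '<span class="text84">Ae 100. Research in Aerospace. </span>'
    '<span class="text85">Units to be arranged. </span>'
    '</p>'
)
soup = BeautifulSoup(html, "html.parser")

# A plain string must equal the whole text node, so "A" finds nothing.
exact_miss = soup.find(string="A")

# The full text of a node, trailing space included, does match.
exact_hit = soup.find(string="Ae 100. Research in Aerospace. ")

# A compiled regex is applied with search(), so a fragment is enough.
partial_hit = soup.find(string=re.compile("A"))

print(exact_miss)
print(repr(exact_hit))
print(repr(partial_hit))
```

On older Beautiful Soup releases the same calls can be written with `text=` in place of `string=`, as in the rest of this answer.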

EDIT: To match only strings that begin with A, you can use the following:

subject.find(text=re.compile('^A'))

To match only strings that contain a word beginning with A, you can use:

subject.find_all(text=re.compile(r'\bA'))
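Because Beautiful Soup applies a compiled pattern with its search() method, the difference between the two patterns can be checked with the re module alone. The sketch below runs both patterns over three sample strings adapted from the question's snippet (the third is shortened for illustration); the variable names are hypothetical.

```python
import re

# Sample text nodes adapted from the question's snippet.
strings = [
    "Ae 100. Research in Aerospace. ",
    "Units to be arranged in accordance with work accomplished. ",
    "Open to undergraduates; approved by the Ae 100 adviser. ",
]

starts_with_a = re.compile(r"^A")        # string itself begins with "A"
word_starts_with_a = re.compile(r"\bA")  # any word begins with "A"

# search() mirrors how Beautiful Soup tests a regex against a string.
begins = [s for s in strings if starts_with_a.search(s)]
has_word = [s for s in strings if word_starts_with_a.search(s)]

print(begins)    # only the first string
print(has_word)  # the first and third strings
```

Both patterns are case-sensitive, so the lowercase words "arranged" and "accordance" in the second string do not count.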

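It is hard to be more specific without knowing exactly what you are looking for. Let me know if I have misunderstood your question.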

BeautifulSoup find and find_all not working as expected
