Question

I have the following function, which checks whether a keyword exists on a web page:

import re
import requests
from bs4 import BeautifulSoup

def checkString():
    url_a = 'https://launchstudio.bluetooth.com/ListingDetails/50756'
    r_a = requests.get(url_a)
    soup_a = BeautifulSoup(r_a.text)

    for blem in soup_a(text=re.compile(r'RFCOMM')):
        return True

    return False

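I have verified that my soup_a matches the view-source of the URL, but my search seems to return only results contained in the head tag, and I am having a hard time figuring out why. Any suggestions?

Python version 2.7.5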

Answer 1

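You need to pass 'lxml' as the parser to the BeautifulSoup constructor. Also, return True leaves the function as soon as the first match is found, so if RFCOMM is found in the head tag, the loop exits there and no later matches are ever examined. It is better to collect the matches with a list comprehension and check whether any were found: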

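from bs4 import BeautifulSoup as soup
import urllib.request as urllib  # Python 3 module; the question uses Python 2.7
import re

def checkString():
    url_a = 'https://launchstudio.bluetooth.com/ListingDetails/50756'
    # Pass the raw bytes to BeautifulSoup so it can detect the encoding itself
    s = soup(urllib.urlopen(url_a).read(), 'lxml')
    # Collect every text node matching the pattern; True if at least one exists
    return bool([i for i in s(text=re.compile(r'RFCOMM'))])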

print(checkString())

Output:

True
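
To see where the matches actually occur, rather than only whether one exists, it can help to list every matching text node together with its enclosing tag. The following is a minimal sketch of that idea, not the answer's method: it uses the standard-library html.parser instead of BeautifulSoup so that it runs offline, it is written for Python 3, and SAMPLE_HTML, TextMatchCollector and find_matches are hypothetical names with made-up sample content rather than the real page.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample document; the real page content is not reproduced here.
SAMPLE_HTML = """<html>
<head><title>RFCOMM listing</title></head>
<body><p>Supported protocols: L2CAP, RFCOMM</p><p>No match here</p></body>
</html>"""


class TextMatchCollector(HTMLParser):
    """Record (enclosing tag, text) for every text node matching a pattern."""

    def __init__(self, pattern):
        super().__init__()
        self.pattern = re.compile(pattern)
        self.open_tags = []  # stack of currently open tag names
        self.matches = []    # (enclosing tag, stripped text) pairs

    def handle_starttag(self, tag, attrs):
        self.open_tags.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed tags in between
        if tag in self.open_tags:
            while self.open_tags.pop() != tag:
                pass

    def handle_data(self, data):
        if self.pattern.search(data):
            parent = self.open_tags[-1] if self.open_tags else None
            self.matches.append((parent, data.strip()))


def find_matches(html, pattern):
    """Return all (enclosing tag, text) pairs whose text matches the pattern."""
    collector = TextMatchCollector(pattern)
    collector.feed(html)
    collector.close()
    return collector.matches


matches = find_matches(SAMPLE_HTML, r'RFCOMM')
print(matches)        # one entry per matching text node, head and body alike
print(bool(matches))  # same yes/no answer that checkString() is meant to give
```

If the list only ever contains tags from the head, the parser is probably not seeing the body as text nodes at all, which is the kind of situation where switching parsers (as suggested above) is worth trying.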
