Finding a text string in HTML with Python and BeautifulSoup

Time: 2017-12-28 20:52:17

Tags: python beautifulsoup

I have the following function, which checks whether a keyword appears on a web page:

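import re
import requests
from bs4 import BeautifulSoup

def checkString():
    url_a = 'https://launchstudio.bluetooth.com/ListingDetails/50756'
    r_a = requests.get(url_a)
    soup_a = BeautifulSoup(r_a.text)

    # Return True as soon as any text node matches the pattern
    for blem in soup_a(text=re.compile(r'RFCOMM')):
        return True

    return False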

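I have verified that my soup_a is identical to the URL's view-source, but my search seems to return only results that sit inside the head tag, and I am struggling to work out why. Any suggestions?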

Python version 2.7.5

1 answer:

Answer 0 (score: 2)

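You need to pass 'lxml' as the parser to the BeautifulSoup class. Also, return True exits the function on the first match, so if RFCOMM is found inside the head tag, the loop ends there and no later matches are ever examined. It is better to collect all matches and check whether any were found: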

import re
import urllib.request

from bs4 import BeautifulSoup

def checkString():
    url_a = 'https://launchstudio.bluetooth.com/ListingDetails/50756'
    # Pass the raw bytes so BeautifulSoup can detect the encoding itself
    html = urllib.request.urlopen(url_a).read()
    s = BeautifulSoup(html, 'lxml')
    # s(...) is shorthand for s.find_all(...); an empty result list is falsy
    return bool(s(text=re.compile(r'RFCOMM')))

print(checkString())

Output:

True
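
To see where in the document the matches actually sit, without relying on the live page, here is a minimal offline sketch. The helper name find_matches and the SAMPLE_HTML string are hypothetical and only for illustration; it assumes bs4 4.4.0 or later, where string= is the newer name for the text= argument used above. It uses the built-in 'html.parser' so that it runs without lxml; pass 'lxml' instead if it is installed.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical sample page: the keyword appears once in <head> and once in <body>.
SAMPLE_HTML = """<html><head><title>RFCOMM listing</title></head>
<body><p>Supported layers: RFCOMM, L2CAP</p></body></html>"""


def find_matches(html, pattern, parser="html.parser"):
    """Return (section, text) pairs for every text node matching pattern."""
    soup = BeautifulSoup(html, parser)
    matches = []
    for node in soup.find_all(string=re.compile(pattern)):
        # Walk up the ancestors to see whether the text is under <head> or <body>.
        section = next(
            (p.name for p in node.parents if p.name in ("head", "body")), None
        )
        matches.append((section, node.strip()))
    return matches


matches = find_matches(SAMPLE_HTML, r"RFCOMM")
for section, text in matches:
    print(section, "->", text)
```

If only head entries show up for a real page, the body text is probably not in the HTML that was fetched or the parser handled the markup differently, so comparing the result with a second parser is a reasonable next check.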