bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?

Asked: 2020-03-29 11:13:47

Tags: python python-3.x beautifulsoup html-parser html-treebuilder

I am trying to scrape a web page with the following code:

        from bs4 import BeautifulSoup
        import requests
        import pandas as pd

        page = requests.get('https://www.google.com/search?q=phagwara+weather')
        soup = BeautifulSoup(page.content, 'html-parser')
        day = soup.find(id='wob_wc')

        print(day.find_all('span'))

But I keep getting the following error:

 File "C:\Users\myname\Desktop\webscraping.py", line 6, in <module>
    soup = BeautifulSoup(page.content, 'html-parser')
  File "C:\Users\myname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 225, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?

I have installed lxml and html5lib, but the problem persists.

2 Answers:

Answer 0 (score: 2)

You need to change 'html-parser' to 'html.parser', so the line becomes soup = BeautifulSoup(page.content, 'html.parser')
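To make the difference concrete, here is a minimal sketch that runs both spellings against a small, made-up HTML string (the markup is an assumption for illustration, not Google's real page). It relies only on 'html.parser', the parser that ships with Python, and catches bs4.FeatureNotFound for the hyphenated name.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Made-up sample markup, used only so the example runs offline.
html = "<div id='wob_wc'><span>21</span></div>"

# 'html.parser' (with a dot) is the parser bundled with Python.
soup = BeautifulSoup(html, "html.parser")
span_text = soup.find(id="wob_wc").span.get_text()

# 'html-parser' (with a hyphen) matches no registered tree builder,
# so Beautiful Soup raises FeatureNotFound.
try:
    BeautifulSoup(html, "html-parser")
    bad_name_rejected = False
except FeatureNotFound:
    bad_name_rejected = True

print(span_text, bad_name_rejected)
```

Since lxml and html5lib are already installed in the question, passing 'lxml' or 'html5lib' as the second argument should also be accepted.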

Answer 1 (score: 0)

You should also name the tag, so use soup.find("div", id="wob_wc") instead of soup.find(id="wob_wc").

The parser name is html.parser, not html-parser; the difference is the dot.

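Also, Google usually returns a 200 response by default even when it is blocking you, so the status code alone does not tell you whether you were blocked. You generally need to inspect r.content.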
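One way to act on this is to test for the element you expect rather than trusting the status code. The sketch below is only illustrative: extract_widget is a hypothetical helper, and the two byte strings are invented stand-ins for r.content, not real Google responses.

```python
from bs4 import BeautifulSoup


def extract_widget(html, widget_id="wob_wc"):
    """Return the <div> with the given id, or None if the page lacks it."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.find("div", id=widget_id)


# Invented response bodies standing in for r.content.
normal_page = b"<html><body><div id='wob_wc'><span>21</span></div></body></html>"
blocked_page = b"<html><body><p>Please confirm you are not a robot</p></body></html>"

for body in (normal_page, blocked_page):
    widget = extract_widget(body)
    if widget is None:
        # find() returns None when nothing matches, so guard before using it.
        print("widget missing; the response may be a block or consent page")
    else:
        print([span.get_text() for span in widget.find_all("span")])
```

Guarding against None also avoids an AttributeError from calling find_all on a missing element, which is what the code in the question would hit if the widget were absent.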

I added headers, and now it works.

        import requests
        from bs4 import BeautifulSoup

        # Browser-like User-Agent header; adding it is what made the request work here.
        headers = {
            'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
        r = requests.get(
            "https://www.google.com/search?q=phagwara+weather", headers=headers)
        # Note the dot in 'html.parser'.
        soup = BeautifulSoup(r.content, 'html.parser')

        print(soup.find("div", id="wob_wc"))