Question

I'm trying to do some web scraping with the following code:

        from bs4 import BeautifulSoup
        import requests
        import pandas as pd

        page = requests.get('https://www.google.com/search?q=phagwara+weather')
        soup = BeautifulSoup(page.content, 'html-parser')
        day = soup.find(id='wob_wc')

        print(day.find_all('span'))

But I keep getting the following error:

  File "C:\Users\myname\Desktop\webscraping.py", line 6, in <module>
    soup = BeautifulSoup(page.content, 'html-parser')
  File "C:\Users\myname\AppData\Local\Programs\Python\Python38-32\lib\site-packages\bs4\__init__.py", line 225, in __init__
    raise FeatureNotFound(
bs4.FeatureNotFound: Couldn't find a tree builder with the features you requested: html-parser. Do you need to install a parser library?

I have installed lxml and html5lib, but the problem persists.

Answer 1

You need to change 'html-parser' to 'html.parser': soup = BeautifulSoup(page.content, 'html.parser')
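To see the difference between the two names without touching the network, here is a minimal sketch. The helper name parser_is_available and the sample HTML are made up for illustration; it only relies on BeautifulSoup raising bs4.FeatureNotFound for a name it cannot resolve, as in the traceback above.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Hypothetical sample markup, not taken from a real Google response.
SAMPLE_HTML = "<div id='wob_wc'><span>21</span></div>"


def parser_is_available(name: str) -> bool:
    """Return True if BeautifulSoup can find a tree builder called `name`."""
    try:
        BeautifulSoup(SAMPLE_HTML, name)
        return True
    except FeatureNotFound:
        # Raised when no installed parser matches the requested name.
        return False


if __name__ == "__main__":
    for candidate in ("html.parser", "html-parser"):
        print(candidate, parser_is_available(candidate))
```

Since 'html.parser' ships with Python's standard library, installing lxml or html5lib does not affect this particular error; only the spelling of the name does.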

Answer 2

You should specify the tag, so use soup.find("div", id="wob_wc") instead of soup.find(id="wob_wc").

The parser name is html.parser, not html-parser; the difference is the dot.

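Also, Google will usually return a 200 response by default even when it is blocking you, so the status code alone won't tell you whether you were blocked. You usually need to check r.content.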
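One way to check the body rather than trusting the status code is sketched below. The function name diagnose and its return labels are hypothetical, and treating a missing wob_wc element as a sign of blocking is an assumption (the page layout could also simply have changed).

```python
from bs4 import BeautifulSoup


def diagnose(status_code: int, content: bytes, marker_id: str = "wob_wc") -> str:
    """Classify a response by looking at the body, not only the status code."""
    if status_code != 200:
        return "http-error"
    soup = BeautifulSoup(content, "html.parser")
    # A 200 response without the expected element may be a consent page,
    # a CAPTCHA, or a stripped-down page served to unknown clients.
    if soup.find(id=marker_id) is None:
        return "missing-widget"
    return "ok"
```

With a requests response r, this would be called as diagnose(r.status_code, r.content).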

I added headers, and now it works.

import requests
from bs4 import BeautifulSoup

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:74.0) Gecko/20100101 Firefox/74.0'}
r = requests.get(
    "https://www.google.com/search?q=phagwara+weather", headers=headers)
soup = BeautifulSoup(r.content, 'html.parser')

print(soup.find("div", id="wob_wc"))
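To get back to what the question was printing (the span elements inside the widget), a sketch like the following guards against find returning None, which would otherwise raise AttributeError on find_all. The helper name span_texts is hypothetical, and the sample markup in the usage lines is invented rather than copied from Google.

```python
from bs4 import BeautifulSoup


def span_texts(content, container_id: str = "wob_wc") -> list:
    """Return the stripped text of every <span> inside the given container."""
    soup = BeautifulSoup(content, "html.parser")
    container = soup.find("div", id=container_id)
    if container is None:
        # The element was not in the response, so there is nothing to extract.
        return []
    return [span.get_text(strip=True) for span in container.find_all("span")]


if __name__ == "__main__":
    sample = '<div id="wob_wc"><span>21</span><span> Sunny </span></div>'
    print(span_texts(sample))
```

In the answer's code, r.content would be passed as the content argument.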
