Question

I am trying to scrape the information inside the 'iframe' tag. When I run this code, it says that 'USER_AGENT' is not defined. How can I fix this?

import requests
from bs4 import BeautifulSoup

page = requests.get("https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances" + "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000", headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')

Answer 1

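The error tells you exactly what the problem is. You are passing USER_AGENT as the headers argument, but you never defined it earlier in your code. See this post on how to use headers with the method.

The documentation states that you must pass in a dictionary of HTTP headers for the request, whereas you passed in the undefined variable USER_AGENT.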
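The point above can be reproduced without any network access. This is only a minimal sketch: the header value is a made-up placeholder, not a recommended user agent string.

```python
# Using a name before it has been assigned raises NameError.
try:
    headers = USER_AGENT  # nothing has defined USER_AGENT yet
except NameError as exc:
    error_message = str(exc)

# Defining the name first, as a plain dict of header name -> value,
# gives requests.get(..., headers=USER_AGENT) something to work with.
# The value below is a hypothetical placeholder.
USER_AGENT = {"User-Agent": "Mozilla/5.0 (example placeholder)"}
headers = USER_AGENT

print(error_message)
print(headers)
```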

From the Requests Library API:

headers = None

Case-insensitive dictionary of response headers.

For example, headers['content-encoding'] will return the value of a 'Content-Encoding' response header.
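Note that the quoted passage describes the headers attribute of a response, while the headers argument of requests.get is a dictionary that you supply yourself. The sketch below illustrates the case-insensitive lookup offline, using requests.structures.CaseInsensitiveDict as a stand-in for a real response's headers; the header value is invented for illustration.

```python
from requests.structures import CaseInsensitiveDict

# Stand-in for response.headers; the value is a made-up example.
response_like_headers = CaseInsensitiveDict({"Content-Encoding": "gzip"})

# Lookups succeed regardless of how the header name is capitalised.
lower = response_like_headers["content-encoding"]
upper = response_like_headers["CONTENT-ENCODING"]

print(lower, upper)
```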

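Edit

For a better explanation of the Content-Type header, see this SO post. See also this WebMasters post, which explains the difference between the Accept and Content-Type HTTP headers.

Since you only seem to be interested in scraping the iframe tags, you can omit the headers argument entirely. If you print the test object in your code, you should see the results.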
import requests
from bs4 import BeautifulSoup

page = requests.get("https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances" + "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000", timeout=10)
soup = BeautifulSoup(page.content, "lxml")
test = soup.find_all('iframe')

# Print each iframe tag that was found
for tag in test:
    print(tag)
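Keep in mind that find_all('iframe') returns the iframe tags themselves, not the documents they embed. To read what an iframe displays, you would normally take its src attribute and request that URL separately. The sketch below shows the extraction step on a made-up HTML snippet; the markup, the 0xabc address, and the base_url value are assumptions for illustration, not what etherscan.io actually serves.

```python
from urllib.parse import urljoin

from bs4 import BeautifulSoup

# Made-up HTML standing in for a downloaded page.
html = """
<html><body>
  <iframe id="holders" src="/token/generic-tokenholders2?a=0xabc"></iframe>
  <iframe src="https://example.com/widget"></iframe>
</body></html>
"""
# Hypothetical address of the page the HTML came from.
base_url = "https://etherscan.io/token/0xabc"

soup = BeautifulSoup(html, "html.parser")

# Resolve each src against the page URL so relative paths become absolute.
iframe_urls = [
    urljoin(base_url, tag["src"])
    for tag in soup.find_all("iframe")
    if tag.has_attr("src")
]

for iframe_url in iframe_urls:
    print(iframe_url)
```

Each resulting URL could then be passed to requests.get to fetch the embedded document.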

Answer 2

We have to provide a user agent. HERE's a link to fake user agents.

import requests
from bs4 import BeautifulSoup


USER_AGENT = {
    'User-Agent': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/39.0.2171.95 Safari/53'}
url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"


page = requests.get(url + token, headers=USER_AGENT, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')

You can also do it without a user agent. Code:

import requests
from bs4 import BeautifulSoup


url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"


page = requests.get(url + token, timeout=5)
soup = BeautifulSoup(page.content, "html.parser")
test = soup.find_all('iframe')

For readability, I have split your URL into a URL and a token. That is why there are two variables, url and token.
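One detail may be worth checking when the two parts are combined with +: the first part ends in the fragment #balances, and everything after a # is treated as fragment text, which is not sent to the server. The sketch below uses only the standard library to compare plain concatenation with urllib.parse.urljoin. Whether the joined URL is the one actually wanted here is an assumption.

```python
from urllib.parse import urljoin, urlsplit

url = "https://etherscan.io/token/0x168296bb09e24a88805cb9c33356536b980d3fc5#balances"
token = "/token/generic-tokenholders2?a=0x168296bb09e24a88805cb9c33356536b980d3fc5&s=100000000000000000"

# Plain concatenation: the token lands inside the fragment.
concatenated = urlsplit(url + token)

# urljoin treats token as a path relative to the site root instead.
joined = urlsplit(urljoin(url, token))

print("concatenated path: ", concatenated.path)
print("concatenated query:", concatenated.query)
print("joined path:       ", joined.path)
print("joined query:      ", joined.query)
```

With concatenation the query string is empty, so the a and s parameters never reach the server; with urljoin they are part of the request.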

Getting an error that USER_AGENT is not defined (Python 3)
