Question

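I need a hint on how to get data from a website. I'm new to web scraping. The special thing here is that I can't access the website, because it runs locally on another network. For development, I only have the website as an HTML file. Now my problem is that the code below raises an error. I think the problem is simple, but so far I have no idea what causes it.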

import requests
import urllib.request
import time
from bs4 import BeautifulSoup

url = 'file:///tmp/mozilla/LiveData.html' # file is locally so far
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")

I get the following error:

NewConnectionError: <urllib3.connection.HTTPConnection object at 0x7fc151db7550>: Failed to establish a new connection: [Errno -2] Name or service not known

Perhaps it doesn't work because it is a local website and not a "real" website. Thanks for your help!

Answer 1

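You can't use the .get method on a local file. Read the file first, then pass its contents to bs4.
You can do that with something like the following. Example: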

from bs4 import BeautifulSoup

# url = 'file:///tmp/mozilla/LiveData.html' # file is locally so far
# Read the local file directly instead of requesting it over the network
with open('/tmp/mozilla/LiveData.html', 'r') as f:
    html = f.read()

# f.read() already returns a str, so pass it straight to BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
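
Once the file is parsed, values can be pulled out with the usual BeautifulSoup calls. The following is only a minimal sketch: the sample HTML, the id values, and the helper name load_soup are hypothetical, since the real markup of LiveData.html is not shown in the question, and a temporary file stands in for /tmp/mozilla/LiveData.html.

```python
import tempfile
from pathlib import Path

from bs4 import BeautifulSoup

# Hypothetical stand-in for LiveData.html; the real page's markup is unknown.
SAMPLE_HTML = """<html><head><title>Live Data</title></head>
<body>
  <span id="temperature">21.5</span>
  <span id="pressure">1013</span>
</body></html>"""


def load_soup(path):
    """Read a local HTML file and return it parsed with BeautifulSoup."""
    html = Path(path).read_text(encoding="utf-8")
    return BeautifulSoup(html, "html.parser")


# Write the sample to a temporary file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp_dir:
    sample_path = Path(tmp_dir) / "LiveData.html"
    sample_path.write_text(SAMPLE_HTML, encoding="utf-8")
    soup = load_soup(sample_path)

# Look elements up by tag or by id, then convert the text as needed.
title = soup.title.string
temperature = float(soup.find(id="temperature").get_text())
print(title, temperature)
```

For the real page, the path passed to load_soup and the lookups would need to match whatever elements LiveData.html actually contains.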

Answer 2

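requests.get sends a GET request to a URL and returns the response from the website. Because your website is only a local file and is not running live (and listening for requests), there is nothing to answer the GET request, so it returns nothing.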

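requests.get(url, params=None, **kwargs) [source]
Sends a GET request.

Parameters:
url – URL for the new Request object.
params – (optional) Dictionary, list of tuples or bytes to send in the query string for the Request.
**kwargs – Optional arguments that request takes.

Returns: Response object

Return type: requests.Response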
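
If keeping a URL in the code matters (for example, so the same script can later point at the live site), the standard library's urllib.request.urlopen does accept file:// URLs. Below is a minimal sketch under a few assumptions: the page is UTF-8 encoded, the helper name fetch_html is hypothetical, and a temporary file stands in for the real LiveData.html.

```python
import tempfile
from pathlib import Path
from urllib.request import urlopen


def fetch_html(url):
    """Return the body at `url` as text; accepts file:// as well as http(s):// URLs."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")  # assumes UTF-8 content


# Create a throwaway HTML file and read it back through a file:// URL.
with tempfile.TemporaryDirectory() as tmp_dir:
    page = Path(tmp_dir) / "LiveData.html"
    page.write_text("<html><body><p>hello</p></body></html>", encoding="utf-8")
    file_url = page.as_uri()  # builds a file:// URL from the absolute path
    html = fetch_html(file_url)

print(html)
```

The returned string can then be passed to BeautifulSoup(html, "html.parser"). Switching to the live site later would only mean changing the URL, provided the machine running the script can reach that network.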

How to get data from a local website using Python
