Question

I want to access a file automatically using Python 3. The URL is https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

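When you enter the URL manually in a browser, it prompts you to download the file, but I want to do this automatically in Python and load the data as a DataFrame.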

I get the following error:

URLError:

{{1}}

Answer 1

$ curl https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls

<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<html><head>
<title>307 Temporary Redirect</title>
</head><body>
<h1>Temporary Redirect</h1>
<p>The document has moved <a href="https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls">here</a>.</p>
</body></html>

You are simply being redirected. There are several ways to handle this in code, but I would just change the URL to "https://www.dax-indices.com/document/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls"
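As one of those several ways, here is a minimal sketch that lets Python follow the redirect instead of hard-coding the new address. It uses only the standard library's urllib.request, whose default opener follows 307 redirects for GET requests. The helper name final_url and the 10-second timeout are illustrative assumptions, not part of the original answer.

```python
import urllib.request


def final_url(url: str, timeout: float = 10.0) -> str:
    """Open `url`, following any redirects, and return the URL finally served."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        # geturl() reports the address after all redirects have been followed.
        return resp.geturl()
```

Calling final_url with the original address should then report the redirected one, provided the site still behaves as in the curl output above.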

Answer 2

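I ran your code in a Jupyter environment and it worked. No error was raised, but the DataFrame contains only NaN values. I checked the xls file you are trying to read, and it does not seem to contain any data...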

There are other ways to retrieve xls data, for example: "downloading an excel file from the web in python"

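import pandas as pd
import requests

url = 'https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls'

# Download the file; requests follows redirects for GET requests by default.
resp = requests.get(url)
resp.raise_for_status()  # stop early on a 4xx/5xx response

# Save the raw bytes to disk, then load the workbook with pandas.
with open('my-sheet.xls', 'wb') as output:
    output.write(resp.content)

df = pd.read_excel('my-sheet.xls')
print(df.head())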
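If the DataFrame comes back empty or full of NaN, one thing worth ruling out is that the saved file is not a workbook at all but an HTML error or redirect page. A rough sketch, not part of the original answer: legacy .xls files start with the OLE2 compound-document signature and .xlsx files with the ZIP signature, so the first bytes of the download can be checked before handing it to pandas. The function name looks_like_excel is hypothetical.

```python
# Signature of OLE2 compound documents, the container used by legacy .xls files.
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of ZIP archives, the container used by .xlsx files.
ZIP_MAGIC = b"PK\x03\x04"


def looks_like_excel(content: bytes) -> bool:
    """Return True if `content` starts like an .xls or .xlsx file."""
    return content.startswith((OLE2_MAGIC, ZIP_MAGIC))
```

With the download code above, looks_like_excel(resp.content) could be checked before writing my-sheet.xls. A True result only says the container looks right, not that the sheet holds the expected data.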

Answer 3

You can use pandas and its .read_excel method directly

import pandas as pd

df = pd.read_excel("https://www.dax-indices.com/documents/dax-indices/Documents/Resources/WeightingFiles/Ranking/2019/March/MDAX_RKC.20190329.xls", sheet_name='Data', skiprows=5)

df.head(1)

Output

Answer 4

Sorry, mate. It works on my PC (not a very helpful comment). Here is a list of things you can try:

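Get the response and check its status code (a code in the 200s or 300s means everything is fine; other codes mean something else)
Check whether the link blocks access by bots (some websites do)
If bots are blocked, use Selenium for Python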
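The first two checks in the list can be sketched with the standard library alone. This is an illustration under assumptions, not the answerer's code: the helper name fetch_status and the browser-like User-Agent string are made up for the example, and whether this particular site filters on User-Agent is not known from the thread.

```python
import urllib.error
import urllib.request
from typing import Optional

# Hypothetical browser-like identifier; any realistic value could be used here.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64)"


def fetch_status(url: str, user_agent: Optional[str] = None,
                 timeout: float = 10.0) -> int:
    """Return the final HTTP status code for `url`, optionally sending a custom User-Agent."""
    headers = {"User-Agent": user_agent} if user_agent else {}
    req = urllib.request.Request(url, headers=headers)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses are raised as HTTPError; the code is still available.
        return err.code
```

If fetch_status(url) returns 403 while fetch_status(url, BROWSER_UA) returns 200, the server is probably rejecting the default Python client. Sites with stricter bot detection may still need a browser-driving tool such as Selenium.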

Accessing data from the internet
