Question

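I have been struggling for hours to read an Excel file with pd.read_excel where the path is a website address. I found that the link doesn't lead directly to the file; it only triggers a download. Is there a simple way to work around this?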

Part of the code:

import pandas as pd

link_energy = 'http://unstats.un.org/unsd/environment/excel_file_tables/2013/Energy%20Indicators.xls'
df_energy = pd.read_excel(link_energy)

Error message:

XLRDError: Unsupported format, or corrupt file: Expected BOF record; found b'\n\n\n<!DOC'
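The error shows that the bytes pandas received began with \n\n\n<!DOC, i.e. an HTML page rather than an .xls workbook. A minimal sketch, assuming the file can be fetched with Python's standard urllib.request (no cookies, login or JavaScript required): download the payload into memory, check its leading bytes, and only then hand it to pd.read_excel through io.BytesIO. The helper names sniff_format and read_excel_from_url are hypothetical, not part of pandas.

```python
import io
import urllib.request

# Leading bytes ("magic numbers") of the two Excel container formats.
XLS_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # OLE2 compound file (legacy .xls)
XLSX_MAGIC = b"PK\x03\x04"  # ZIP archive (.xlsx)


def sniff_format(content: bytes) -> str:
    """Classify a downloaded payload as 'xls', 'xlsx', 'html' or 'unknown'."""
    if content.startswith(XLS_MAGIC):
        return "xls"
    if content.startswith(XLSX_MAGIC):
        return "xlsx"
    # The question's error shows b'\n\n\n<!DOC': markup preceded by blank lines.
    if content.lstrip().startswith(b"<"):
        return "html"
    return "unknown"


def read_excel_from_url(url: str, **read_excel_kwargs):
    """Download url into memory and pass the bytes to pandas.read_excel."""
    import pandas as pd  # imported here so sniff_format works without pandas

    # urlopen follows plain HTTP redirects (3xx) automatically,
    # but not HTML/JavaScript "download pages".
    with urllib.request.urlopen(url) as response:
        content = response.read()
    if sniff_format(content) == "html":
        raise ValueError(
            f"{url} returned an HTML page, not a workbook; "
            "open it in a browser to find the direct download link"
        )
    return pd.read_excel(io.BytesIO(content), **read_excel_kwargs)
```

Usage would be df_energy = read_excel_from_url(link_energy). If it raises the HTML error, the server is answering with a landing page, and the workbook's direct URL has to be found before any pandas reader can parse it.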

This may not be a pandas problem, but I lack the skills to work out how to handle it.

Answer 1

For me, it works as expected with the following code:

def self.return_this_data_for_map_method
  # Chart-style structure: one label per team and one dataset of totals.
  data = { labels: [], datasets: [{ data: [] }] }
  dictionary = {}
  # `results` is expected to yield [team, team_members] pairs.
  results.each do |team, team_members|
    if dictionary[team].nil?
      dictionary[team] = team_members
    else
      # Merge repeated teams into a single entry.
      dictionary[team] += team_members
    end
  end
  data[:labels] += dictionary.keys
  data[:datasets][0][:data] += dictionary.values
  data
end

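No errors in the following environment:

The version of the notebook server is: 5.2.2. The server is running on this version of Python:

Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56) [GCC 4.8.2 20140120 (Red Hat 4.8.2-15)]

Current kernel information:

Python 3.6.3 | packaged by conda-forge | (default, Nov 4 2017, 10:10:56)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.2.1 -- An enhanced Interactive Python. Type '?' for help.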
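Since this answer reports no errors in the environment above, comparing versions with your own setup is a reasonable next step. A minimal sketch that reports the interpreter version and the installed versions of pandas and the Excel reader libraries (xlrd, which raised the error in the question, and openpyxl). It relies on importlib.metadata, which needs Python 3.8 or newer; on an older interpreter such as the 3.6.3 above, pandas.show_versions() prints similar information. The helper name collect_versions is hypothetical.

```python
import platform
from importlib import metadata


def collect_versions(packages=("pandas", "xlrd", "openpyxl")):
    """Return the Python version plus installed versions of the given packages.

    Packages that are not installed map to None.
    """
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}: {version or 'not installed'}")
```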

Answer 2

I can't access your URL.

However, pd.read_excel doesn't work here; you need to use pd.read_csv:

import pandas as pd

df = pd.read_csv('https://cib.societegenerale.com/fileadmin/indices_feeds/CTA_Historical.xls')

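Now you need to check what the Excel file actually contains and which delimiter it uses; if any column holds extra values, you need to skip them so that only the useful data is loaded and read.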
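The check described above can be partly automated. A minimal sketch, assuming the downloaded ".xls" is really delimited text (the only case in which pd.read_csv can parse it; a genuine binary workbook still needs pd.read_excel): guess the delimiter as the candidate character that appears a consistent number of times on the most lines, then count the leading lines that don't match the table's typical field count so they can be passed as skiprows. Both heuristics and the names guess_delimiter, count_preamble_lines and read_delimited are hypothetical.

```python
import io
from collections import Counter


def guess_delimiter(text, candidates=(",", ";", "\t", "|")):
    """Pick the candidate that occurs a consistent, non-zero number of times on the most lines."""
    lines = [line for line in text.splitlines() if line.strip()]
    best, best_support = None, 0
    for delim in candidates:
        per_line = Counter(line.count(delim) for line in lines)
        per_line.pop(0, None)  # lines without this character say nothing about it
        if not per_line:
            continue
        _, support = per_line.most_common(1)[0]
        if support > best_support:
            best, best_support = delim, support
    if best is None:
        raise ValueError("none of the candidate delimiters occur in the text")
    return best


def count_preamble_lines(text, delimiter):
    """Count leading raw lines (blank ones included) before the first line with
    the most common field count, i.e. title or note rows above the real table."""
    lines = text.splitlines()
    field_counts = [line.count(delimiter) + 1 for line in lines if line.strip()]
    if not field_counts:
        return 0
    typical = Counter(field_counts).most_common(1)[0][0]
    skipped = 0
    for line in lines:
        if line.strip() and line.count(delimiter) + 1 == typical:
            break
        skipped += 1
    return skipped


def read_delimited(text):
    """Parse delimited text with pandas after guessing sep and skiprows."""
    import pandas as pd  # imported lazily; only this helper needs pandas

    sep = guess_delimiter(text)
    return pd.read_csv(io.StringIO(text), sep=sep,
                       skiprows=count_preamble_lines(text, sep))
```

Here text is the downloaded payload decoded to str; which encoding to decode with is an assumption you would need to verify for the actual file.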
