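After looking at different ways to read a URL link that points to a .xls file, I decided to go with xlrd.
I am having trouble converting the 'xlrd.book.Book' type to a 'pandas.DataFrame'.
I have the following: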
import pandas
import xlrd
import urllib2
link ='http://www.econ.yale.edu/~shiller/data/chapt26.xls'
socket = urllib2.urlopen(link)
#this line gets me the excel workbook
xlfile = xlrd.open_workbook(file_contents = socket.read())
#storing the sheets
sheets = xlfile.sheets()
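I would like to take the last of the sheets and import it as a pandas.DataFrame. Any ideas on how I can accomplish this? I tried pandas.ExcelFile.parse(), but it wants a path to an Excel file. I could of course save the file to a temporary location and then parse it (with tempfile or something), but I am trying to follow Pythonic guidelines and use functionality that is probably already built into pandas.
Any guidance is, as always, greatly appreciated.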
Answer 0 (score: 24)
You can pass socket directly to ExcelFile:
>>> import pandas as pd
>>> import urllib2
>>> link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
>>> socket = urllib2.urlopen(link)
>>> xd = pd.ExcelFile(socket)
NOTE *** Ignoring non-worksheet data named u'PDVPlot' (type 0x02 = Chart)
NOTE *** Ignoring non-worksheet data named u'ConsumptionPlot' (type 0x02 = Chart)
>>> xd.sheet_names
[u'Data', u'Consumption', u'Calculations']
>>> df = xd.parse(xd.sheet_names[-1], header=None)
>>> df
                                   0   1   2   3         4
0        Average Real Interest Rate: NaN NaN NaN  1.028826
1    Geometric Average Stock Return: NaN NaN NaN  0.065533
2              exp(geo. Avg. return) NaN NaN NaN  0.067728
3  Geometric Average Dividend Growth NaN NaN NaN  0.012025
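The session above uses urllib2, which exists only in Python 2. As a rough sketch of the same idea on Python 3, the download can be buffered into an in-memory file-like object before it is handed to ExcelFile. The helper names fetch_bytes, as_file_like and load_last_sheet are made up for this illustration, and the sketch assumes pandas and xlrd are installed so that the .xls format can be read.

```python
import io
import urllib.request


def fetch_bytes(url, timeout=30):
    """Download the resource at `url` and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def as_file_like(raw_bytes):
    """Wrap raw bytes in a seekable, in-memory binary file object."""
    return io.BytesIO(raw_bytes)


def load_last_sheet(url):
    """Hypothetical helper: parse the last worksheet of a remote workbook."""
    # Imported here so the two helpers above work without pandas installed.
    import pandas as pd  # assumption: pandas plus xlrd are available

    workbook = pd.ExcelFile(as_file_like(fetch_bytes(url)))
    # sheet_names lists the worksheets in workbook order; [-1] is the last one.
    return workbook.parse(workbook.sheet_names[-1], header=None)
```

Calling load_last_sheet(link) with the link from the question should give a frame like the one shown above, although that depends on the installed pandas and xlrd versions. Nothing is written to disk, which is what the question was trying to avoid.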
Answer 1 (score: 0)
You can pass the URL to pandas.read_excel():
import pandas as pd
link = 'http://www.econ.yale.edu/~shiller/data/chapt26.xls'
# 'sheetname' is a placeholder; use a real sheet name such as 'Calculations'
data = pd.read_excel(link, 'sheetname')
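If the sheet name is not known in advance, one possible approach is to ask read_excel for every sheet and keep the last one. The sketch below assumes a reasonably recent pandas, where sheet_name=None returns a dict of DataFrames keyed by sheet name in workbook order. last_sheet and read_last_sheet are hypothetical names used only for this example.

```python
def last_sheet(sheets):
    """Return (name, frame) for the final entry of a sheet-name -> DataFrame dict."""
    if not sheets:
        raise ValueError("workbook contains no worksheets")
    # Dicts keep insertion order, so the last key is the last sheet that was read.
    name = list(sheets)[-1]
    return name, sheets[name]


def read_last_sheet(url):
    """Hypothetical helper: read all sheets from `url` and keep the last one."""
    import pandas as pd  # assumption: pandas plus an Excel engine (xlrd for .xls)

    # sheet_name=None asks pandas for every worksheet as a dict of DataFrames.
    sheets = pd.read_excel(url, sheet_name=None, header=None)
    return last_sheet(sheets)
```

For the workbook in the question, read_last_sheet(link) would be expected to return the 'Calculations' sheet, since that is the last name listed by sheet_names in the first answer.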