I have this code that downloads this fund's data in Excel 2004 XML format:
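from urllib.request import urlopen  # Python 3 equivalent of urllib2.urlopen

url = 'https://www.ishares.com/us/258100/fund-download.dl'
contents = urlopen(url).read()  # response body as bytes

# Binary mode writes the bytes unchanged; 'with' closes the file automatically
with open("export.xml", 'wb') as f:
    f.write(contents)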
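My goal is to programmatically convert this file to .xls so that I can read it into a pandas DataFrame. I know I could parse the file with Python's xml libraries, but I noticed that if I open the XML file and manually save it with an .xls extension, pandas can read it and I get the result I want.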
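For the XML-parsing route mentioned above, here is a minimal sketch using only the standard library's `xml.etree.ElementTree`. It assumes the download is well-formed SpreadsheetML (Excel's 2003/2004 "XML Spreadsheet" format, namespace `urn:schemas-microsoft-com:office:spreadsheet`); if it is not, `ET.fromstring` raises `ParseError`. The helper name `spreadsheetml_rows` is hypothetical, and row-level `ss:Index` gaps are not handled.

```python
import xml.etree.ElementTree as ET

# Namespace of Excel 2003/2004 "XML Spreadsheet" (SpreadsheetML) documents
SS = "urn:schemas-microsoft-com:office:spreadsheet"
NS = {"ss": SS}


def spreadsheetml_rows(xml_bytes, sheet_index=0):
    """Return one worksheet's cells as a list of rows (lists of strings).

    Hypothetical helper; raises ET.ParseError if the XML is not well-formed.
    """
    root = ET.fromstring(xml_bytes)
    worksheet = root.findall("ss:Worksheet", NS)[sheet_index]
    rows = []
    for row in worksheet.findall("ss:Table/ss:Row", NS):
        values = []
        for cell in row.findall("ss:Cell", NS):
            # ss:Index is a 1-based column number used when empty cells are omitted
            index = cell.get(f"{{{SS}}}Index")
            if index is not None:
                values.extend([""] * (int(index) - 1 - len(values)))
            data = cell.find("ss:Data", NS)
            # itertext() also collects text nested in rich-text markup inside <Data>
            values.append("".join(data.itertext()) if data is not None else "")
        rows.append(values)
    return rows
```

To use it, pass the bytes from `open("export.xml", "rb").read()` and build a DataFrame with `pd.DataFrame(rows[n + 1:], columns=rows[n])`, where `n` is whichever row holds the column names (the file starts with a few preamble rows). All values come back as strings, so numeric columns still need `pd.to_numeric`.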
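I also tried renaming the file extension with the following code, but this approach doesn't force a "Save As": the file is still an XML document underneath, just with an .xls extension.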
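import os

folder = os.path.expanduser('~/models')  # expand '~' so os.listdir() can find the folder
for filename in os.listdir(folder):
    if filename.startswith('export'):
        old_path = os.path.join(folder, filename)
        # Only the extension changes; the contents remain Excel 2004 XML
        new_path = os.path.join(folder, filename.replace('.xml', '.xls'))
        os.rename(old_path, new_path)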
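One way to confirm that renaming leaves the content unchanged is to inspect the file's leading bytes. A minimal sketch, with a hypothetical helper name: a legacy binary .xls is an OLE2 compound file starting with `D0 CF 11 E0 A1 B1 1A E1`, an .xlsx is a zip archive starting with `PK\x03\x04`, and the iShares download, renamed or not, should still start with XML markup.

```python
# Leading bytes ("magic numbers") of the two binary Excel container formats
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .xls (OLE2 compound file)
ZIP_MAGIC = b"PK\x03\x04"                       # .xlsx (zip archive)
UTF8_BOM = b"\xef\xbb\xbf"


def sniff_excel_format(head):
    """Guess a spreadsheet's real format from its first bytes, ignoring the file extension."""
    if head.startswith(OLE2_MAGIC):
        return "xls"
    if head.startswith(ZIP_MAGIC):
        return "xlsx"
    # Drop an optional UTF-8 byte-order mark, then leading whitespace, before checking for '<'
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    if head.lstrip().startswith(b"<"):
        return "xml"
    return "unknown"
```

For example, `sniff_excel_format(open(path, "rb").read(512))` would be expected to return `"xml"` for the renamed export and `"xls"` for the copy saved manually from Excel.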
Answer 0 (score: 0)
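With Excel for Windows, consider using Python to connect via COM to the Excel object library with the win32com module. Specifically, use Excel's Workbooks.OpenXML and SaveAs methods to save the downloaded XML as a CSV: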
import os
import win32com.client as win32
import requests as r
import pandas as pd

# Folder containing this script (__file__ is undefined in an interactive session)
cd = os.path.dirname(os.path.abspath(__file__))
url = "http://www.ishares.com/us/258100/fund-download.dl"
xmlfile = os.path.join(cd, 'iSharesDownload.xml')
csvfile = os.path.join(cd, 'iSharesDownload.csv')

# DOWNLOAD FILE
try:
    rqpage = r.get(url)
    rqpage.raise_for_status()  # fail early on an HTTP error status
    with open(xmlfile, 'wb') as f:
        f.write(rqpage.content)
except Exception as e:
    print(e)
finally:
    rqpage = None

# EXCEL COM TO SAVE EXCEL XML AS CSV
if os.path.exists(csvfile):
    os.remove(csvfile)

try:
    excel = win32.gencache.EnsureDispatch('Excel.Application')
    wb = excel.Workbooks.OpenXML(xmlfile)
    wb.SaveAs(csvfile, 6)  # 6 = xlCSV file format
    wb.Close(True)
except Exception as e:
    print(e)
finally:
    # RELEASES RESOURCES
    wb = None
    excel = None

# IMPORT CSV INTO PANDAS DATAFRAME
df = pd.read_csv(csvfile, skiprows=8)
print(df.describe())
#        Weight (%)       Price  Coupon (%)     YTM (%)  Yield to Worst (%)    Duration
# count  625.000000  625.000000  625.000000  625.000000          625.000000  625.000000
# mean     0.159888  101.298768    6.500256    5.881168            5.313760    2.128688
# std      0.126833   10.469460    1.932744    4.059226            4.224268    1.283360
# min     -0.110000    0.000000    0.000000    0.000000           -8.030000    0.000000
# 25%      0.090000  100.380000    5.130000    3.430000            3.070000    0.970000
# 50%      0.130000  102.940000    6.380000    4.930000            3.910000    2.240000
# 75%      0.190000  105.000000    7.630000    6.820000            6.070000    3.260000
# max      1.750000  128.750000   12.500000   40.900000           40.900000    5.060000
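The `skiprows=8` above hard-codes the number of preamble lines before the column header, which would break if the file layout changes. A minimal sketch of locating the header instead, assuming the header row contains a field named `Weight (%)` (taken from the output above) and that no quoted field in the preamble spans several lines; `find_header_row` is a hypothetical helper.

```python
import csv
import io


def find_header_row(text, marker="Weight (%)"):
    """Return the 0-based line number of the first CSV row that has `marker` as a field.

    Assumes one record per physical line, so the row index equals the line index
    (csv.reader yields an empty list for blank lines, so they are counted too).
    """
    for lineno, fields in enumerate(csv.reader(io.StringIO(text))):
        if marker in [field.strip() for field in fields]:
            return lineno
    raise ValueError(f"no row contains the field {marker!r}")
```

Usage might look like `skip = find_header_row(open(csvfile).read())` followed by `pd.read_csv(csvfile, skiprows=skip)`; it is worth checking once against the actual file that the result agrees with the hard-coded 8.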
Answer 1 (score: 0)
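With Excel for Mac, consider a VBA solution, since VBA is the most common language for working with the Excel object library. The code below downloads the iShares XML, then uses the OpenXML and SaveAs methods to save it as a CSV for import into pandas.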
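Note: this is untested on Mac. It relies on the Windows COM objects Microsoft.XMLHTTP and ADODB.Stream, so it only works if those objects are available in your version of Excel.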
VBA (save in a macro-enabled workbook)
Option Explicit

Sub DownloadXML()
    On Error GoTo ErrHandle

    Dim wb As Workbook
    Dim sep As String, xmlfile As String, csvfile As String

    ' Use Excel's path separator instead of a hard-coded Windows backslash
    sep = Application.PathSeparator
    xmlfile = ActiveWorkbook.Path & sep & "file.xml"
    csvfile = ActiveWorkbook.Path & sep & "file.csv"

    Call DownloadFile("https://www.ishares.com/us/258100/fund-download.dl", xmlfile)

    Set wb = Excel.Workbooks.OpenXML(xmlfile)
    wb.SaveAs csvfile, xlCSV        ' xlCSV = 6
    wb.Close True

ExitHandle:
    Set wb = Nothing
    Exit Sub

ErrHandle:
    MsgBox Err.Number & " - " & Err.Description, vbCritical
    Resume ExitHandle
End Sub

Function DownloadFile(url As String, filePath As String)
    Dim WinHttpReq As Object, oStream As Object

    Set WinHttpReq = CreateObject("Microsoft.XMLHTTP")
    WinHttpReq.Open "GET", url, False
    WinHttpReq.send

    If WinHttpReq.Status = 200 Then
        Set oStream = CreateObject("ADODB.Stream")
        oStream.Open
        oStream.Type = 1                        ' 1 = binary
        oStream.Write WinHttpReq.responseBody
        oStream.SaveToFile filePath, 2          ' 1 = no overwrite, 2 = overwrite
        oStream.Close
    End If

    Set WinHttpReq = Nothing
    Set oStream = Nothing
End Function
Python
import pandas as pd
csvfile = "/path/to/file.csv"
# IMPORT CSV INTO PANDAS DATAFRAME
df = pd.read_csv(csvfile, skiprows=8)
print(df.describe())
#        Weight (%)       Price  Coupon (%)     YTM (%)  Yield to Worst (%)    Duration
# count  625.000000  625.000000  625.000000  625.000000          625.000000  625.000000
# mean     0.159888  101.298768    6.500256    5.881168            5.313760    2.128688
# std      0.126833   10.469460    1.932744    4.059226            4.224268    1.283360
# min     -0.110000    0.000000    0.000000    0.000000           -8.030000    0.000000
# 25%      0.090000  100.380000    5.130000    3.430000            3.070000    0.970000
# 50%      0.130000  102.940000    6.380000    4.930000            3.910000    2.240000
# 75%      0.190000  105.000000    7.630000    6.820000            6.070000    3.260000
# max      1.750000  128.750000   12.500000   40.900000           40.900000    5.060000
Answer 2 (score: 0)
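I was able to avoid web scraping altogether by discovering that the website I was using offers an API, which I then called with Python's requests module.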
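import requests

url = "https://www.blackrock.com/tools/hackathon/performance"
tickers = ['IVV']  # hypothetical example list; substitute the tickers you need
results = {}
for ticker in tickers:
    params = {'identifiers': ticker,
              'returnsType': 'MONTHLY'}
    response = requests.get(url, params=params)
    response.raise_for_status()  # stop on an HTTP error status
    results[ticker] = response.json()  # avoid shadowing the json module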
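Because the endpoint takes ordinary query-string parameters, the exact request can be inspected (or opened in a browser) before writing the loop. A minimal sketch under that assumption; `performance_url` and `fetch_all` are hypothetical helper names, the parameter names are the ones used above, and the fetch function is passed in so the loop can be exercised without network access.

```python
from urllib.parse import urlencode

# Endpoint and parameter names from the answer above
BASE_URL = "https://www.blackrock.com/tools/hackathon/performance"


def performance_url(ticker, returns_type="MONTHLY"):
    """Build the GET URL; for plain ticker strings this should match what requests sends."""
    query = urlencode({"identifiers": ticker, "returnsType": returns_type})
    return f"{BASE_URL}?{query}"


def fetch_all(tickers, fetch_json):
    """Map each ticker to the parsed JSON returned by fetch_json(url).

    fetch_json is injected so the loop can be tested offline; with requests
    it could be: lambda u: requests.get(u).json()
    """
    return {ticker: fetch_json(performance_url(ticker)) for ticker in tickers}
```

Keeping URL construction separate from the HTTP call makes it easy to print or log the URL when a response is not what you expect.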