I have a URL like this:
https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true
I can open it in Excel (remove the extra 'l' from 'tinyurll'):
Sub Get_File()
Dim oXMLHTTP As Object: Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
Dim strURL As String: strURL = "http://tinyurll.com/api-create.php?url=https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true"
With oXMLHTTP: .Open "GET", strURL, False: .send: End With
strURL = oXMLHTTP.responseText
With Workbooks: .Open strURL, IgnoreReadOnlyRecommended:=True: End With
End Sub
But how can I use Python to download the content into a text file instead of an Excel file?
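Answer 0: (score: 0)
Even though the file is downloaded as an Excel file (probably .xlsx), I think you can still open it and read it as a CSV file with Python (see this question for more details on how). If the Excel file has more than one worksheet, this may run into problems. In that case you may need an additional library (like pandas) to open the Excel file and extract the data from it.
Once the file has been opened and read, you can write just the content you want to keep to a new text file. This other question has some good information on how to do that.
If the file contains only one worksheet, the csv approach should be fine and would look something like this:
(Edited: changed rb to rt and opened the file as CSV.)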
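import csv

my_read_path = '/directory/some_excel_file.xlsx'

# csv.reader only parses plain text; a genuine .xlsx file is a ZIP archive
# and needs a library such as openpyxl or pandas instead.
with open(my_read_path, 'rt', newline='') as csv_file, \
        open('/directory/my_output.txt', 'w') as text_file:
    csv_reader = csv.reader(csv_file)
    for line in csv_reader:
        # each line is a list of fields, so join it back into one string
        text_file.write(','.join(line) + '\n')  # assumes you want to write every line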
Reading it with something like pandas may be more complicated, but it could be a valuable learning experience either way.
Answer 1: (score: 0)
I managed to download it to a '.xlsx' file using:
import requests
import time
import csv

url = 'https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true'
file_name = 'C:\\Users\\AR\\Documents\\DownloadFile.xlsx'

while True:
    try:
        resp = requests.get(url)
        with open(file_name, 'wb') as output:
            output.write(resp.content)
        break
    except Exception as e:
        print(str(e))
        time.sleep(3)
Using the extension '.txt' in 'file_name' gives me a file that starts with:
PK L©J _rels/.rels’ÁjÃ0†ï}
£{ã´ƒ1FÝ^Æ ·2ºÐl%1I,c«[öö3»l
l°£ôýH»Ã4ê•RölªË·ÖÀóùq}*‡2ûÕ²’;³*Œ
t"ñ^ël;1W)”NÃiD)ejuDÛcKz[×·:}gÀªŽÎ@:º
¨3¦–ÄÀ4è7Nýs_ni¼GúM*7·ôÀö2R+á³
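The `PK` at the start of that output is the signature of a ZIP archive, which is what an .xlsx file is, so changing the extension to '.txt' does not change the bytes that get written. A minimal sketch, using only the standard library, of how one might check what was actually downloaded; the helper name `looks_like_xlsx` is made up for illustration, and it assumes the workbook part sits at the usual `xl/workbook.xml` location:

```python
import zipfile


def looks_like_xlsx(path):
    """Return True if `path` is a ZIP archive that contains a workbook part."""
    if not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # Assumption: the workbook is stored at the conventional location.
        return "xl/workbook.xml" in archive.namelist()
```

If this returns True for the '.txt' file, the content still has to be read with an Excel-aware library before it can be written out as readable text.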
Answer 2: (score: 0)
Found a solution using 'openpyxl' (it can read from an Excel workbook without opening Excel):
from openpyxl import load_workbook  # https://openpyxl.readthedocs.io/en/latest/index.html
import requests
import time

url = 'https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true'
file_name = 'DownloadFile.xlsx'

while True:
    try:
        resp = requests.get(url)
        with open(file_name, 'wb') as output:
            output.write(resp.content)
        break
    except Exception as e:
        print(str(e))
        time.sleep(3)

wb = load_workbook(file_name)
ws = wb['data']
for row in ws.rows:
    for cell in row:
        print(cell.value)
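The download loop above retries forever on any exception. A bounded variant might look like the sketch below. It is only an illustration: it swaps `requests` for `urllib.request` from the standard library so that it has no dependencies, and the names `fetch_bytes` and `download_with_retries` as well as the `attempts` and `delay` parameters are assumptions, not part of the answer.

```python
import time
import urllib.request


def fetch_bytes(url):
    """Download `url` and return the raw response body."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return response.read()


def download_with_retries(url, file_name, fetch=fetch_bytes, attempts=5, delay=3):
    """Save `url` to `file_name`, giving up after `attempts` failed tries."""
    for attempt in range(1, attempts + 1):
        try:
            content = fetch(url)
        except OSError as error:  # urllib's URLError is a subclass of OSError
            print(f"attempt {attempt} failed: {error}")
            if attempt == attempts:
                raise
            time.sleep(delay)
        else:
            # Binary mode, because an .xlsx download is not text.
            with open(file_name, "wb") as output:
                output.write(content)
            return file_name
```

Passing the fetch function as a parameter keeps the retry logic separate from the HTTP client, so `requests` could be plugged in instead.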
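Answer 3: (score: 0)
Here is a working solution that downloads the file, saves it as an Excel file, reads from that Excel file, and writes the content as readable text to a text file.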
from openpyxl import load_workbook  # https://openpyxl.readthedocs.io/en/latest/index.html
import requests
import time

url = 'https://www.oslobors.no/ob/servlets/excel?type=history&columns=TIME%2C+BUYER%2C+SELLER%2C+PRICE%2C+VOLUME%2C+TYPE&format[TIME]=dd.mm.YY%20hh:MM:ss&format[PRICE]=%23%2C%23%230.00%23%23%23&format[VOLUME]=%23%2C%23%230&header[TIME]=Statoil&header[BUYER]=Kj%C3%B8per&header[SELLER]=Selger&header[PRICE]=Pris&header[VOLUME]=Volum&header[TYPE]=Type&view=DELAYED&source=feed.ose.trades.INSTRUMENTS&filter=ITEM_SECTOR%3D%3DsSTL.OSE%26%26DELETED!%3Dn1&stop=now&start=1493935200000&ascending=true'
file_name = 'DownloadFile.xlsx'
sdv_file_name = 'DownloadFile.sdv'

# Retry the download every 3 seconds until it succeeds.
while True:
    try:
        resp = requests.get(url)
        with open(file_name, 'wb') as output:
            output.write(resp.content)
        break
    except Exception as e:
        print(str(e))
        time.sleep(3)

wb = load_workbook(filename=file_name, read_only=True)
ws = wb['data']
# Trick from older openpyxl releases to ignore the stored sheet dimensions;
# newer releases offer ws.reset_dimensions() for this.
ws.max_row = ws.max_column = None
with open(sdv_file_name, 'a') as output:
    for row in ws.rows:
        # Join the first six columns into one semicolon-separated line.
        line = ';'.join(str(cell.value) for cell in row[:6]) + '\n'
        output.write(line)
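The final writing step can also be handed to the standard `csv` module, which quotes any value that itself contains a semicolon and writes empty cells as empty fields rather than the string 'None'. A minimal sketch; the helper name `write_rows` is made up for illustration, and it assumes each row arrives as a tuple of plain values:

```python
import csv


def write_rows(rows, path, delimiter=";"):
    """Write an iterable of row tuples to `path`, one delimited line per row."""
    # newline="" lets the csv module control the line endings itself.
    with open(path, "w", newline="", encoding="utf-8") as output:
        writer = csv.writer(output, delimiter=delimiter, lineterminator="\n")
        # csv writes None as an empty field and quotes values when needed.
        writer.writerows(rows)
```

With a recent openpyxl, something like `write_rows(ws.iter_rows(values_only=True), sdv_file_name)` should feed it the cell values directly. Note that this sketch overwrites the output file, whereas the code above appends to it.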