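I wrote a script that downloads some CSV files from the web and then saves them to an Excel file.
The problem is that the link generated for a given date sometimes does not respond, so the loop skips that day, or even several days. This leaves gaps, and the data for the given period is incomplete.
I tried to write a function that would record which days have already been downloaded, so that the loop would then iterate only over the missing ones, but I have run out of ideas and have no more clues on how to solve this. I'm still new to pandas :)
Here is my code: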
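import pandas as pd
import datetime as dt
import time

def zamien(n):
    # Zero-pad a day or month number to two digits; always returns a string
    return str(n).zfill(2)

def link(d, m, y):
    # data_do is the day after data_od; date arithmetic keeps it valid at
    # month ends, where d+1 would produce a day such as 32
    nastepny = dt.date(y, m, d) + dt.timedelta(days=1)
    return ('https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/'
            + str(y) + zamien(m) + zamien(d)
            + '/data_do/' + str(nastepny.year) + zamien(nastepny.month) + zamien(nastepny.day))
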
startdate = dt.datetime(2020,1,1)
teraz = dt.datetime(2020,3,31)
delta = dt.timedelta(days=2)
dane = pd.DataFrame()
while startdate <= teraz:
    print(startdate.day)
    url = link(startdate.day, startdate.month, startdate.year)
    print(url)
    try:
        df_test = pd.read_csv(url, encoding="cp1250", sep=";")
        print(df_test.head())
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it
        dane = pd.concat([dane, df_test], ignore_index=True)
        dane.to_csv(r'C:\Users\Bartek\Desktop\PSEP0.csv', sep=';')
    except Exception as e:
        # Report the failure instead of silently skipping the date
        print('Download failed for', startdate.date(), ':', e)
    time.sleep(5)  # pause between requests
    startdate += delta
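# index=False avoids the extra 'Unnamed: 0' column when the file is read back;
# the encoding is set explicitly so that writing and reading agree
dane.to_csv(r'C:\Users\Bartek\Desktop\PSEP0.csv', sep=';', index=False, encoding='cp1250')
path = pd.read_csv(r'C:\Users\Bartek\Desktop\PSEP0.csv', encoding="cp1250", sep=";")
#path = dane.append(df_test, ignore_index=True)
# The context manager saves and closes the workbook (writer.save() was removed in pandas 2.0)
with pd.ExcelWriter(r'C:\Users\Bartek\Desktop\PSEP0.xlsx') as writer:
    path.to_excel(writer, sheet_name='Dane', index=False)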
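
The idea described above (remember which dates were downloaded and loop again only over the missing ones) could be sketched roughly as follows. This is only a minimal sketch under assumptions, not a tested solution for this server: the names `build_link`, `fetch_day`, `download_range`, `max_rounds`, `pause` and `step_days` are hypothetical, and it keeps the script's convention of stepping two days at a time with `data_do` set to the day after `data_od`.

```python
import datetime as dt
import time

import pandas as pd

BASE_URL = "https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR"


def build_link(day):
    """Build the export URL with data_od = day and data_do = the next day."""
    next_day = day + dt.timedelta(days=1)
    return f"{BASE_URL}/data_od/{day:%Y%m%d}/data_do/{next_day:%Y%m%d}"


def fetch_day(day):
    """Download one CSV; raises an exception if the request fails."""
    return pd.read_csv(build_link(day), encoding="cp1250", sep=";")


def download_range(start, end, fetch=fetch_day, step_days=2,
                   max_rounds=3, pause=5.0):
    """Fetch each start date in [start, end], retrying only the failed ones."""
    total_days = (end - start).days
    missing = [start + dt.timedelta(days=i)
               for i in range(0, total_days + 1, step_days)]
    frames = {}  # date -> DataFrame; dates stored here are never requested again

    for _ in range(max_rounds):
        still_missing = []
        for day in missing:
            try:
                frames[day] = fetch(day)
            except Exception as exc:  # remember the gap instead of hiding it
                print(f"{day}: download failed ({exc})")
                still_missing.append(day)
            time.sleep(pause)  # be gentle with the server
        missing = still_missing
        if not missing:
            break

    if not frames:
        return pd.DataFrame(), missing
    # Concatenate in date order, regardless of the order the retries finished in
    data = pd.concat([frames[d] for d in sorted(frames)], ignore_index=True)
    return data, missing
```

Calling `dane, brakujace = download_range(dt.date(2020, 1, 1), dt.date(2020, 3, 31))` would return the combined data together with the list of dates that still failed after `max_rounds` passes, so any remaining gaps are at least visible and can be retried later.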