Automatically downloading CSV files from a web page

Time: 2020-09-09 23:26:02

Tags: pandas dataframe csv xlsx xlsxwriter

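I wrote a script that downloads some CSV files from the web and then saves them to an Excel file.

The problem is that the link generated for a given date sometimes does not respond, so the loop skips that day, or even several days. This leaves gaps, and the data for the given period is incomplete.

I tried to write a function that would keep track of the days already downloaded, so that the loop would then iterate only over the missing ones, but I have completely run out of ideas and have no further clue how to solve this. I am still new to pandas :)

Here is my code: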

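import datetime as dt
import time

import pandas as pd


def zamien(n):
    # Zero-pad a day or month number to two digits; always returns a string.
    return str(int(n)).zfill(2)


def link(d, m, y):
    # Build the export URL for day d and the following day.
    # The end date is computed with timedelta so that it rolls over correctly
    # at month ends (d + 1 would produce e.g. day 32 for 31 January).
    start = dt.date(y, m, d)
    end = start + dt.timedelta(days=1)
    return ('https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/'
            + str(start.year) + zamien(start.month) + zamien(start.day)
            + '/data_do/'
            + str(end.year) + zamien(end.month) + zamien(end.day))


startdate = dt.datetime(2020, 1, 1)
teraz = dt.datetime(2020, 3, 31)
delta = dt.timedelta(days=2)

dane = pd.DataFrame()
while startdate <= teraz:
    url = link(startdate.day, startdate.month, startdate.year)
    print(startdate.day)
    print(url)

    try:
        df_test = pd.read_csv(url, encoding="cp1250", sep=";")
        print(df_test.head())
        # DataFrame.append was removed in pandas 2.0; pd.concat replaces it.
        dane = pd.concat([dane, df_test], ignore_index=True)
        dane.to_csv(r'C:\Users\Bartek\Desktop\PSEP0.csv', sep=';')
    except Exception as exc:
        # A failed download is only reported and then skipped,
        # so the period it covers is missing from the result.
        print('skipped', startdate.date(), exc)
    time.sleep(5)  # pause between requests
    startdate += delta

dane.to_csv(r'C:\Users\Bartek\Desktop\PSEP0.csv', sep=';')

path = pd.read_csv(r'C:\Users\Bartek\Desktop\PSEP0.csv', encoding="cp1250", sep=";")
del path['Unnamed: 0']  # drop the index column that to_csv wrote
# The context manager saves the workbook on exit (writer.save() was removed in pandas 2.0).
with pd.ExcelWriter(r'C:\Users\Bartek\Desktop\PSEP0.xlsx') as writer:
    path.to_excel(writer, sheet_name='Dane')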
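
The idea described in the question (remember which days were downloaded, then loop again only over the missing ones) could look roughly like the sketch below. It is a minimal illustration, not a solution tested against the PSE server. The helper names build_url, date_range and download_all are hypothetical, the retry count and the pause are arbitrary choices, and the URL pattern is simply copied from link() above. The download itself is passed in as a function, so the retry logic does not depend on pandas.

```python
import datetime as dt
import time


def build_url(day):
    """Export URL for `day` through the following day (pattern taken from link())."""
    nxt = day + dt.timedelta(days=1)  # rolls over correctly at month ends
    return ('https://www.pse.pl/getcsv/-/export/csv/PL_GEN_WIATR/data_od/'
            + day.strftime('%Y%m%d') + '/data_do/' + nxt.strftime('%Y%m%d'))


def date_range(start, end, step_days=2):
    """All start dates from `start` to `end` inclusive, `step_days` apart."""
    days = []
    while start <= end:
        days.append(start)
        start += dt.timedelta(days=step_days)
    return days


def download_all(days, fetch, max_rounds=3, pause=5.0):
    """Try every day once, then retry only the days that failed.

    `fetch(url)` is any callable that returns the data or raises on failure.
    Returns (results, missing): `results` maps each date to its data and
    `missing` lists the dates that still failed after `max_rounds` passes.
    """
    results = {}
    missing = list(days)
    for _ in range(max_rounds):
        still_missing = []
        for day in missing:
            try:
                results[day] = fetch(build_url(day))
            except Exception as exc:
                print('failed', day, exc)
                still_missing.append(day)
            time.sleep(pause)  # be gentle with the server
        missing = still_missing
        if not missing:
            break
    return results, missing


# Possible use with pandas (assumed, not run here):
#   import pandas as pd
#   days = date_range(dt.date(2020, 1, 1), dt.date(2020, 3, 31))
#   results, missing = download_all(
#       days, lambda url: pd.read_csv(url, encoding='cp1250', sep=';'))
#   dane = pd.concat([results[d] for d in sorted(results)], ignore_index=True)
#   print('still missing:', missing)
```

Keeping the results in a dictionary keyed by date means the final frame can be assembled in date order no matter which round a given day succeeded in, and whatever remains in missing shows exactly where the gaps are.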
