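Here is the problem. I am writing a program that pulls data from a website every 5 minutes. The program uses bs4 to parse the site and get a URL, then passes the URL to the web browser. All of that works.
The files are zipped, so each time the program runs (every 5 minutes) I also want to unzip the file and move it out of the folder it was downloaded to and into a new folder. This worked until I made some changes in a different part of the program. Now it does not work, and I think the problem is somewhere between lines 32 and 40.
In line 32 I take the title from the ercot page (line 16) and use .text to turn it into the title, which is the name each downloaded file is saved under on every run. Line 33 takes the text and line 34 puts quotes around it. The problem is that the title of the file to unzip is different on every 5-minute run, so I use the tt variable to pass in the file name to unzip.
Any help would be greatly appreciated.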
from urllib.request import urlopen as u_req
from bs4 import BeautifulSoup as soup
from datetime import datetime
import webbrowser, os, time, bs4, schedule, openpyxl, zipfile, csv

my_url = 'http://mis.ercot.com/misapp/GetReports.do? reportTypeId=11485&reportTitle=LMPs%20by%20Electrical%20Bus&showHTMLView=&mimicKey/'
snooze = time.sleep(30)
batch_time = 0

def job():
    # opening up connection, grabbing the page
    uClient = u_req(my_url)
    page_soup = soup(uClient, "html.parser")
    # csv 5 minute data title, variable name is clean_csv_title
    title = page_soup.findAll('tr')[3]
    titles = (title.findAll('td')[0])
    clean_csv_title = titles.text[-23:-15]
    batch_time = titles.text[-14:-10]
    # print(clean_csv_title)
    # variable that contains the link for the first 5 minute data
    first_csv = (page_soup.findAll('a')[0])
    csv_str = str(first_csv).strip('<a href="/misdownload/servlets/mirDownload?mimic_duns=&doclookupId=')
    csv_str_2 = csv_str.strip('">zip</a>')
    complete_link = "http://mis.ercot.com/misdownload/servlets/mirDownload?mimic_duns=&doclookupId=" + csv_str_2
    # opening link, timeout 30 seconds
    webbrowser.open(complete_link, new=0, autoraise=True)
    snooze
    # take previously downloaded file, unzip, and put in holding folder
    # called unzipped files
    os.chdir('C:\\Users\\Main\\Desktop\\ERCOT_Data\\Incoming ercot files')
    t = titles.text
    tt = str("'" + t + "'")
    unzipped = open(tt, 'rb')
    z = zipfile.ZipFile(unzipped)
    for name in z.namelist():
        outpath = 'C:\\Users\\Main\\Desktop\\ERCOT_Data\\Unzipped files'
        z.extract(name, outpath)
    unzipped.close()
    uClient.close()

schedule.every().day.at("00:01").do(job)
schedule.every().day.at("00:06").do(job)
schedule.every().day.at("00:11").do(job)
schedule.every().day.at("00:16").do(job)
#......n

while True:
    schedule.run_pending()
    time.sleep(1)
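A side note on the link-building lines in the code above: str.strip removes any leading and trailing characters that appear in its argument, treating the argument as a set of characters and not as a literal prefix or suffix. The two strip calls presumably only work because the document ID happens to contain none of those characters (an assumption here is that the ID is digits only). A minimal sketch of an alternative, assuming the link's href has already been read from the tag (with bs4 that would be first_csv["href"]): resolve it against the page URL with urllib.parse.urljoin. The names PAGE_URL, build_download_link and sample_href, and the ID 123456789, are made up for illustration.

```python
from urllib.parse import urljoin

# Page the link was scraped from (my_url from the post, query string omitted).
PAGE_URL = "http://mis.ercot.com/misapp/GetReports.do"


def build_download_link(href, page_url=PAGE_URL):
    """Resolve a relative href found on the page into an absolute URL."""
    return urljoin(page_url, href)


# str.strip treats its argument as a set of characters, not as a suffix:
print("pizza.zip".strip(".zip"))  # prints "a"

# Hypothetical href value; with bs4 it would come from first_csv["href"].
sample_href = "/misdownload/servlets/mirDownload?mimic_duns=&doclookupId=123456789"
print(build_download_link(sample_href))
```

Reading the attribute and joining it to the page URL avoids depending on which characters happen to appear at the ends of the tag's string form.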
This is the error it throws:
Traceback (most recent call last):
  File "C:\Users\Main\Desktop\ERCOT_Data\total.py", line 358, in <module>
    schedule.run_pending()
  File "C:\Users\Main\AppData\Local\Programs\Python\Python37\lib\site-packages\schedule\__init__.py", line 493, in run_pending
    default_scheduler.run_pending()
  File "C:\Users\Main\AppData\Local\Programs\Python\Python37\lib\site-packages\schedule\__init__.py", line 78, in run_pending
    self._run_job(job)
  File "C:\Users\Main\AppData\Local\Programs\Python\Python37\lib\site-packages\schedule\__init__.py", line 131, in _run_job
    ret = job.run()
  File "C:\Users\Main\AppData\Local\Programs\Python\Python37\lib\site-packages\schedule\__init__.py", line 411, in run
    ret = self.job_func()
  File "C:\Users\Main\Desktop\ERCOT_Data\total.py", line 35, in job
    unzipped = open(tt, 'rb')
FileNotFoundError: [Errno 2] No such file or directory: "'cdr.00011485.0000000000000000.20180817.144517347.LMPSELECTBUSNP6787_20180817_144513_csv.zip'"
When I compare that file name with the file in the Incoming ercot files folder, they are exactly the same.
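Two things in the code and traceback above may be worth checking. The error message shows the name as "'cdr...zip'", so the single quotes added when building tt are part of the file name passed to open(), and no file on disk has quotes in its name. Also, snooze = time.sleep(30) sleeps once when the script is loaded and stores None, so the bare snooze line inside job() does not pause at all and the download may not have landed yet. A minimal sketch of the unzip step under those assumptions follows; wait_for_file, unzip_download and their parameters are hypothetical names, and the sketch only checks that the file exists, not that the browser has finished writing it.

```python
import time
import zipfile
from pathlib import Path


def wait_for_file(path, timeout=60.0, poll=0.5):
    """Return True once `path` exists, or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if path.is_file():
            return True
        time.sleep(poll)
    return path.is_file()


def unzip_download(title_text, incoming_dir, unzipped_dir, timeout=60.0):
    """Extract the archive named by `title_text` from incoming_dir into unzipped_dir."""
    # Use the bare file name: trim surrounding whitespace, add no quote characters.
    zip_path = Path(incoming_dir) / title_text.strip()
    if not wait_for_file(zip_path, timeout=timeout):
        raise FileNotFoundError(f"download did not appear: {zip_path}")
    # The context manager closes the archive even if extraction fails.
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(unzipped_dir)
        return archive.namelist()
```

Inside job() this could be called as unzip_download(titles.text, r'C:\Users\Main\Desktop\ERCOT_Data\Incoming ercot files', r'C:\Users\Main\Desktop\ERCOT_Data\Unzipped files'), which also removes the need for os.chdir.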