Question

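I'm new to Python. I want to extract data from https://services.tcpl.ca/cor/public/gdsr/GdsrNGTLImperial20190703.htm, but the date in the URL changes every day. I can get all the URLs into a .csv, but I don't know how to retrieve each page and write it to a .csv so that it is cleanly formatted.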

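I can retrieve the data from the site and write it to a .csv as shown below, but I don't know how to extend the code so that the retrieval and the .csv write loop over the URLs.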

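from urllib.error import URLError
from urllib.request import urlopen
from bs4 import BeautifulSoup

url = "https://services.tcpl.ca/cor/public/gdsr/GdsrNGTLImperial20190703.htm"
try:
    page = urlopen(url)
except URLError as exc:
    # Stop here: without a page there is nothing to parse.
    raise SystemExit(f"Error opening the URL: {exc}")

# Parse the HTML and keep only its visible text.
soup = BeautifulSoup(page, 'html.parser')
soup2 = soup.text

with open('scraped_text.csv', 'w') as file:
    file.write(soup2)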

Ideally, I would end up with 365 days of data organized in .csv format for my research.

Answer 1

Since the HTML structure stays the same and only the URL changes, you can simply use the date as a variable in the URL:

# here date is a variable or a function to set the date
url = "https://services.tcpl.ca/cor/public/gdsr/GdsrNGTLImperial" + date + ".htm"
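To make that concrete, here is a rough sketch of how the date variable could drive a loop over a whole year. The helper names (build_url, daily_urls, table_rows, scrape_to_csv) are made up for illustration, and the date format YYYYMMDD is inferred from the example URL in the question. It also assumes the report data sits in ordinary HTML table rows, which I have not verified against the actual page, so the parsing part may need adjusting.

```python
import csv
from datetime import date, timedelta
from urllib.error import URLError
from urllib.request import urlopen

# URL pattern taken from the question; {} is replaced by the date as YYYYMMDD.
BASE_URL = "https://services.tcpl.ca/cor/public/gdsr/GdsrNGTLImperial{}.htm"


def build_url(day):
    """Return the report URL for one day (hypothetical helper)."""
    return BASE_URL.format(day.strftime("%Y%m%d"))


def daily_urls(start, days):
    """Yield (day, url) pairs for `days` consecutive days starting at `start`."""
    for offset in range(days):
        day = start + timedelta(days=offset)
        yield day, build_url(day)


def table_rows(html):
    """Return each table row of the page as a list of cell strings.

    Assumption: the data is laid out in <tr>/<td> elements.
    """
    # Imported here so the URL helpers above work without bs4 installed.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        if cells:
            rows.append(cells)
    return rows


def scrape_to_csv(start, days, out_path):
    """Fetch each day's page and write its rows, prefixed by the date, to one CSV."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        for day, url in daily_urls(start, days):
            try:
                with urlopen(url) as page:
                    html = page.read()
            except URLError as exc:
                # A missing or unreachable day is skipped rather than aborting the run.
                print(f"Skipping {url}: {exc}")
                continue
            for cells in table_rows(html):
                writer.writerow([day.isoformat()] + cells)
```

Calling scrape_to_csv(date(2019, 1, 1), 365, "gdsr_2019.csv") would then attempt one request per day and collect everything in a single file, with the date in the first column so the days can be told apart later. Using csv.writer instead of writing soup.text directly keeps each table cell in its own column.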

How to scrape data from a .htm URL that changes every day of the year and write the data to .csv
