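How can I download the file behind the "Accept and Download" button and read it straight into pandas with pd.read_csv()? Normally I would just copy the download link and paste it in, but in this case the button is recognized as an image, so I can't get the download link.
Answer 0 (score: 2)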
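Even though this question has already been answered, let me add a few extra ingredients to make the approach dynamic. Credit: @JasonGroulx
Here we use BeautifulSoup to find the action URL of the form on the page, request that URL, and then read the data.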
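import io, os, zipfile, urllib.request
import pandas as pd
import requests
from bs4 import BeautifulSoup
# Fetch and parse the landing page
html = urllib.request.urlopen('https://geodash.vpd.ca/opendata/')
soup = BeautifulSoup(html, 'html.parser')
# The download button is a form; its action attribute holds the zip URL
action = soup.find('form').get('action')
resp = requests.get(action)
z = zipfile.ZipFile(io.BytesIO(resp.content))
# Assumes the CSV in the archive shares the zip file's base name
df = pd.read_csv(z.open(os.path.basename(action).replace('.zip', '.csv')))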
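The snippet above passes the form's action straight to requests.get(), which only works when that attribute is an absolute URL. As a rough sketch, assuming the action could also be relative, the lookup can be done with the standard library's html.parser (swapped in here for BeautifulSoup) and resolved against the page URL with urljoin. The names FormActionFinder and find_form_action, and the sample HTML fragment, are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class FormActionFinder(HTMLParser):
    """Record the action attribute of the first <form> tag that has one."""

    def __init__(self):
        super().__init__()
        self.action = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == "form" and self.action is None:
            self.action = dict(attrs).get("action")


def find_form_action(html_text, page_url):
    """Return the form action as an absolute URL, or None if no form has one."""
    finder = FormActionFinder()
    finder.feed(html_text)
    if finder.action is None:
        return None
    # urljoin leaves absolute URLs alone and resolves relative ones
    return urljoin(page_url, finder.action)


# Hypothetical page fragment with a relative action, for illustration only
sample = '<form action="crimedata_download/crimedata_csv_all_years.zip" method="get"></form>'
print(find_form_action(sample, "https://geodash.vpd.ca/opendata/"))
```

The returned URL can then be passed to requests.get() exactly as in the answer.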
Answer 1 (score: 1)
If you inspect the button with your browser's developer tools, you'll see that the form points to this URL:
<form action="http://geodash.vpd.ca/opendata/crimedata_download/crimedata_csv_all_years.zip" method="get">
So you can do the following:
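import io, zipfile
import pandas as pd
import requests
# Download the zip archive that the form's action points to
r = requests.get('http://geodash.vpd.ca/opendata/crimedata_download/crimedata_csv_all_years.zip')
# Open the archive in memory and read the CSV inside it
z = zipfile.ZipFile(io.BytesIO(r.content))
df = pd.read_csv(z.open('crimedata_csv_all_years.csv'))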
Calling df.head() will output:
                         TYPE  YEAR  MONTH  DAY  HOUR  MINUTE    HUNDRED_BLOCK  NEIGHBOURHOOD              X             Y
0  Break and Enter Commercial  2012     12   14     8      52              NaN       Oakridge  491285.000000  5.453433e+06
1  Break and Enter Commercial  2019      3    7     2       6    10XX SITKA SQ       Fairview  490612.964805  5.457110e+06
2  Break and Enter Commercial  2019      8   27     4      12  10XX ALBERNI ST       West End  491007.779775  5.459174e+06
3  Break and Enter Commercial  2014      8    8     5      13  10XX ALBERNI ST       West End  491015.943352  5.459166e+06
4  Break and Enter Commercial  2005     11   14     3       9  10XX ALBERNI ST       West End  491021.385727  5.459161e+06
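The code in this thread relies on knowing the name of the CSV file inside the archive. If that name is not known in advance, one possible approach is to list the archive's members and open the first one ending in .csv. This is only a sketch: open_first_csv is a hypothetical helper, and it assumes the archive contains at least one CSV file.

```python
import io
import zipfile


def open_first_csv(zip_bytes):
    """Open the first .csv member of a zip archive held in memory."""
    archive = zipfile.ZipFile(io.BytesIO(zip_bytes))
    # namelist() returns member names in the order they are stored
    csv_names = [name for name in archive.namelist() if name.lower().endswith(".csv")]
    if not csv_names:
        raise ValueError("no .csv file found in the archive")
    # The returned object is a binary file-like object
    return archive.open(csv_names[0])
```

With the response from above, this would be used as df = pd.read_csv(open_first_csv(r.content)).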