How can I use pd.read_csv() to download a file whose download button is embedded as an image?

Posted: 2020-05-25 17:40:00

Tags: python pandas

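I'd like to know how to download the file behind the "Accept and Download" button and read it directly into pandas with pd.read_csv(). Normally I would just copy the download link and paste it in, but here the button is rendered as an image, so I can't get the download link that way.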

Link: https://geodash.vpd.ca/opendata/

2 answers:

Answer 0 (score: 2)

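The question has already been answered, but let me add a few ingredients to make the solution dynamic, so the download URL is not hardcoded. Credit: @JasonGroulx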

Here we use BeautifulSoup to extract the form's action URL, request that URL, and then read the data.

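import io
import os
import urllib.request
import zipfile
import pandas as pd
import requests
from bs4 import BeautifulSoup
# Fetch the open-data page and parse it with the built-in HTML parser
html = urllib.request.urlopen('https://geodash.vpd.ca/opendata/')
soup = BeautifulSoup(html, 'html.parser')
# The "Accept and Download" button submits a form; its action attribute is the ZIP URL
action = soup.find('form').get('action')
resp = requests.get(action)
z = zipfile.ZipFile(io.BytesIO(resp.content))
# Assumes the CSV inside the archive shares the ZIP file's base name
df = pd.read_csv(z.open(os.path.basename(action).replace('.zip', '.csv')))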
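The last line assumes the CSV inside the archive has the same base name as the ZIP file. A slightly more defensive sketch picks the member from ZipFile.namelist() instead. It assumes the archive contains at least one .csv file; read_first_csv_from_zip is a hypothetical helper name, not part of any library:

```python
import io
import zipfile

import pandas as pd


def read_first_csv_from_zip(zip_bytes: bytes) -> pd.DataFrame:
    """Hypothetical helper: read the first .csv member of an in-memory ZIP archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as z:
        # Choose the member by extension instead of deriving its name from the URL
        csv_names = [n for n in z.namelist() if n.lower().endswith('.csv')]
        if not csv_names:
            raise ValueError('no .csv file found in the archive')
        with z.open(csv_names[0]) as f:
            return pd.read_csv(f)
```

With the variables from the code above, this would be called as df = read_first_csv_from_zip(resp.content).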

Answer 1 (score: 1)

If you inspect the button with your browser's developer tools, you'll see that it belongs to a form pointing at this URL:

<form action="http://geodash.vpd.ca/opendata/crimedata_download/crimedata_csv_all_years.zip" method="get">

So you can do the following:

import io
import zipfile
import pandas as pd
import requests
# Download the ZIP archive the form points to, then read the CSV inside it
r = requests.get('http://geodash.vpd.ca/opendata/crimedata_download/crimedata_csv_all_years.zip')
z = zipfile.ZipFile(io.BytesIO(r.content))
df = pd.read_csv(z.open('crimedata_csv_all_years.csv'))

Calling df.head() outputs:

                         TYPE  YEAR  MONTH  DAY  HOUR  MINUTE    HUNDRED_BLOCK NEIGHBOURHOOD              X             Y
0  Break and Enter Commercial  2012     12   14     8      52              NaN      Oakridge  491285.000000  5.453433e+06
1  Break and Enter Commercial  2019      3    7     2       6    10XX SITKA SQ      Fairview  490612.964805  5.457110e+06
2  Break and Enter Commercial  2019      8   27     4      12  10XX ALBERNI ST      West End  491007.779775  5.459174e+06
3  Break and Enter Commercial  2014      8    8     5      13  10XX ALBERNI ST      West End  491015.943352  5.459166e+06
4  Break and Enter Commercial  2005     11   14     3       9  10XX ALBERNI ST      West End  491021.385727  5.459161e+06
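Since the question asks about pd.read_csv() specifically: pandas can decompress a ZIP archive itself via its compression parameter, provided the archive contains exactly one file, so the manual zipfile step may not be needed. A minimal sketch, assuming the VPD archive still holds a single CSV (not verified here); read_zipped_csv is a hypothetical helper name:

```python
import pandas as pd


def read_zipped_csv(path_or_url: str) -> pd.DataFrame:
    """Hypothetical helper: read a single-file ZIP archive (local path or URL) into a DataFrame."""
    # compression='zip' is explicit here; the default 'infer' also detects it from a .zip suffix.
    # pandas raises ValueError if the archive contains more than one file.
    return pd.read_csv(path_or_url, compression='zip')
```

Usage would be df = read_zipped_csv('http://geodash.vpd.ca/opendata/crimedata_download/crimedata_csv_all_years.zip'). If the archive turns out to contain several files, fall back to the zipfile approach shown in the answers.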