I have some code that downloads several zipped CSV files, unzips them, and concatenates the data into a single DataFrame, but it raises an error.
import pandas as pd
import requests
from io import BytesIO
from zipfile import ZipFile
from bs4 import BeautifulSoup

def findZipLinks(url):
    # agecaredata_url is not defined in this snippet; presumably the site's base URL
    r = requests.get(url)
    bs = BeautifulSoup(r.content, features="html.parser")
    links = [agecaredata_url + a.get('data-link') for a in bs.findAll('a', {"class": "downloadhrefp_lt_WebPartZone6_znMC_pageplaceholder_p_lt_WebPartZone2_ZoneA_znPublicationFooterItem_znPublicationFooterItem_zone_Stacker_MultiColumns u-dtb u-w100p u-bgc-primary u-c-fff c-publication__download u-mb-gutter0p25x"}) if "zip" in a.get("data-link")]
    return links

exits = findZipLinks('https://www.gen-agedcaredata.gov.au/Resources/Access-data/2018/June/GEN-data-People-leaving-aged-care')
dfs = []
for exit_url in exits:
    r = requests.get(exit_url)
    zipfile = ZipFile(BytesIO(r.content))
    # the next line is where the UnicodeDecodeError is raised
    dfs.append(pd.read_csv(zipfile.open(zipfile.namelist()[0]), dtype=str))
pd.concat(df for df in dfs).reset_index(drop=True)
The append line fails with UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 2: invalid start byte. I tried calling .decode('utf-8') and .decode('windows-1252'), but got similar errors. Can anyone help me figure out what is going wrong?
Answer 0: (score: 0)
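Changing the mode passed to zipfile.open will not help: ZipFile.open accepts only 'r' or 'w' and always returns bytes, so 'wb' raises a ValueError. Byte 0x96 is an en dash in Windows-1252, so the file is probably in that encoding. Specify it when reading the file:
pd.read_csv(zipfile.open(zipfile.namelist()[0]), dtype=str, encoding='cp1252')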
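To see the failure and a likely remedy in isolation, the sketch below builds a small zip in memory and reads its CSV member with two different encodings. It uses only the standard library. The file name, column names, rows, and the read_rows helper are made up for illustration, and Windows-1252 is an assumption based on byte 0x96, not something confirmed for the real data set.

```python
import csv
import io
from zipfile import ZipFile

# Hypothetical sample data: a CSV whose values contain byte 0x96,
# which is an en dash in Windows-1252 and invalid as a UTF-8 start byte.
raw_csv = b"age_group,count\r\n65\x9669,10\r\n70\x9674,12\r\n"

# Pack it into an in-memory zip, standing in for the downloaded r.content.
buffer = io.BytesIO()
with ZipFile(buffer, "w") as archive:
    archive.writestr("sample.csv", raw_csv)


def read_rows(zip_file, encoding):
    """Parse the first member of the zip as CSV, decoding with `encoding`."""
    with ZipFile(zip_file) as archive:
        # ZipFile.open always yields bytes; decoding happens in the wrapper.
        with archive.open(archive.namelist()[0]) as member:
            text = io.TextIOWrapper(member, encoding=encoding, newline="")
            return list(csv.reader(text))


# Decoding as UTF-8 reproduces the error from the question.
buffer.seek(0)
try:
    read_rows(buffer, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding as Windows-1252 succeeds and turns 0x96 into an en dash.
buffer.seek(0)
rows = read_rows(buffer, "cp1252")
print(utf8_failed, rows)
```

With pandas the same idea applies through the encoding parameter of pd.read_csv, which decodes the binary member itself. If cp1252 also fails on the real files, 'latin-1' accepts every byte value and can serve as a fallback, at the cost of possibly mapping some characters incorrectly.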