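I'm having trouble downloading and reading the CSV files provided by the U.S. Department of Education's National Center for Education Statistics (NCES). Below is code that anyone willing to help me troubleshoot can run.
import io
import zipfile

import pandas as pd
import requests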
# The first example shows that the code can work. It works fine for 2005
# and earlier.
url = 'https://nces.ed.gov/ipeds/datacenter/data/HD2005_Data_Stata.zip'
r_zip_file_2005 = requests.get(url, stream=True)
z_zip_file_2005 = zipfile.ZipFile(io.BytesIO(r_zip_file_2005.content))
z_zip_file_2005.extractall('.')
csv_2005_df = pd.read_csv('hd2005_data_stata.csv')
# Second example shows that something changed in the CSV files after
# 2005 (or seems to have changed).
url = 'https://nces.ed.gov/ipeds/datacenter/data/HD2006_Data_Stata.zip'
r_zip_file_2006 = requests.get(url, stream=True)
z_zip_file_2006 = zipfile.ZipFile(io.BytesIO(r_zip_file_2006.content))
z_zip_file_2006.extractall('.')
csv_2006_df = pd.read_csv('hd2006_data_stata.csv')
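If you would rather not write the extracted files to disk, pandas can read a CSV straight out of the zip archive. This is a minimal sketch assuming each IPEDS archive contains exactly one CSV (the helper name `read_single_csv_from_zip` is hypothetical, not part of pandas or NCES tooling). Any extra keyword arguments, such as `encoding`, are forwarded to `pd.read_csv`.

```python
import io
import zipfile

import pandas as pd


def read_single_csv_from_zip(zip_bytes: bytes, **read_csv_kwargs) -> pd.DataFrame:
    """Hypothetical helper: read the only .csv member of an in-memory zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        # Assumption: the archive holds exactly one CSV file.
        csv_names = [n for n in zf.namelist() if n.lower().endswith(".csv")]
        if len(csv_names) != 1:
            raise ValueError(f"expected one CSV in archive, found {csv_names}")
        with zf.open(csv_names[0]) as fh:
            # Options such as encoding are passed straight through to pandas.
            return pd.read_csv(fh, **read_csv_kwargs)
```

Usage would look like `df = read_single_csv_from_zip(requests.get(url).content)`; for the 2006 file it would still need the encoding fix discussed in the answer.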
For the 2006 file, Python raises:
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x96 in position 18: invalid start byte
UnicodeDecodeError Traceback (most recent call last)
<ipython-input-26-b26a150e37ee> in <module>()
----> 1 csv_2006_df = pd.read_csv('hd2006_data_stata.csv')
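To see where this error comes from: in UTF-8, byte 0x96 (binary 10010110) is a continuation byte, so it can never begin a character, and the decoder rejects it. The sketch below uses made-up bytes (not taken from the NCES file) to compare how three codecs treat 0x96. Windows-1252 reads it as an en dash (U+2013), while ISO-8859-1 maps it to the invisible control character U+0096. That the NCES files were produced with a Windows code page is an assumption suggested by this byte, not something the question confirms.

```python
# Illustrative bytes only: 0x96 sits between two words.
raw = b"INSTNM\nAlpha \x96 Beta\n"

# UTF-8 rejects 0x96 because a 10xxxxxx byte cannot start a character.
try:
    raw.decode("utf-8")
    utf8_error = None
except UnicodeDecodeError as exc:
    utf8_error = exc

# Windows-1252 (cp1252) maps 0x96 to U+2013 EN DASH.
as_cp1252 = raw.decode("cp1252")

# ISO-8859-1 maps every byte 0x00-0xFF to the code point of the same value,
# so 0x96 becomes the C1 control character U+0096.
as_latin1 = raw.decode("latin-1")

print(utf8_error)
print(as_cp1252)
print(repr(as_latin1))
```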
Any tips on how to get around this?
Answer (score: 0)
It only took me 7 months... but I figured out my own answer. Not rocket science:
csv_2006_df = pd.read_csv('hd2006_data_stata.csv',
                          encoding='ISO-8859-1')
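This works because ISO-8859-1 assigns a character to every byte from 0x00 to 0xFF, so decoding can never fail. One caveat: under ISO-8859-1, byte 0x96 becomes an invisible control character, whereas `encoding='cp1252'` turns it into an en dash, which may be closer to the intended text if the files came from Windows software (an assumption). If you are loading many years whose encodings may differ, a fallback along these lines could help. The helper `read_csv_with_fallback` and its default encoding order are hypothetical choices for illustration; `latin-1` goes last as a catch-all because cp1252 leaves five byte values undefined.

```python
import io

import pandas as pd


def read_csv_with_fallback(source, encodings=("utf-8", "cp1252", "latin-1"), **kwargs):
    """Hypothetical helper: try each encoding in turn until pandas can decode the file."""
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for enc in encodings:
        try:
            return pd.read_csv(source, encoding=enc, **kwargs)
        except UnicodeDecodeError as exc:
            last_error = exc
            # Rewind file-like sources so the next attempt starts from the top.
            if hasattr(source, "seek"):
                source.seek(0)
    raise last_error
```

For the file in the question this would be called as `read_csv_with_fallback('hd2006_data_stata.csv')`. Trying UTF-8 first keeps correctly encoded years intact instead of silently mis-decoding any multi-byte characters.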