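So I found most of the solution to my problem in this thread: Use Python to select rows with a particular range of values in one column
But when implementing the code, I ran into an error I can't figure out. I'm trying to extract only the subscribers' rows from the Citi Bike data (info here: http://www.citibikenyc.com/system-data).
Here's the code: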
import csv

with open("E:/Dropbox/PPS/CitiBikeData/2014_Data.csv") as input, open("E:/Dropbox/PPS/CitiBikeData/subscribers.csv", "w") as output:
    reader = csv.DictReader(input, dialect="excel-tab")
    fieldnames = reader.fieldnames
    writer_output = csv.DictWriter(output, fieldnames, dialect="excel-tab")
    writer_output.writeheader()
    for row in reader:
        if int(row['gender']) > 0:
            writer_output.writerow(row)
This is the error I get:
C:\Python34\python.exe E:/Dropbox/PPS/CitiBikeData/csvfilter_2.py
Traceback (most recent call last):
  File "E:/Dropbox/PPS/CitiBikeData/csvfilter_2.py", line 9, in <module>
    if int(row['gender']) > 0:
KeyError: 'gender'
Process finished with exit code 1
I understand what a KeyError is (from reading https://wiki.python.org/moin/KeyError), but I can't figure out why I'm getting it here or how to fix it.
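To see why row['gender'] fails, it helps to print reader.fieldnames, which holds the keys DictReader built from the header row. The sketch below uses a small made-up, comma-separated sample (three columns, assumed to resemble the real header) in place of 2014_Data.csv:

```python
import csv
import io

# Made-up comma-separated sample standing in for 2014_Data.csv (columns trimmed).
SAMPLE = "tripduration,usertype,gender\n634,Customer,0\n1547,Subscriber,1\n"

# Parse it with the tab dialect, as the script in the question does.
reader = csv.DictReader(io.StringIO(SAMPLE), dialect="excel-tab")

# The header contains no tab characters, so the whole line becomes ONE field name.
print(reader.fieldnames)               # ['tripduration,usertype,gender']
print("gender" in reader.fieldnames)   # False

row = next(reader)
print(dict(row))  # {'tripduration,usertype,gender': '634,Customer,0'}
# Looking up row["gender"] on this dict is what raises KeyError: 'gender'.
```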
Answer 0 (score: 3)
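The data you downloaded is not tab-separated, so you are opening it with the wrong CSV dialect.
Remove the dialect argument; the default (comma-separated) is the right one for this format: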
>>> import csv
>>> f = open("/tmp/2013-07 - Citi Bike trip data.csv")
>>> reader = csv.DictReader(f)
>>> next(reader)
{'bikeid': '16950', 'tripduration': '634', 'end station longitude': '-73.98165557', 'stoptime': '2013-07-01 00:10:34', 'end station name': '1 Ave & E 15 St', 'gender': '0', 'start station name': 'E 47 St & 2 Ave', 'start station longitude': '-73.97032517', 'start station id': '164', 'start station latitude': '40.75323098', 'end station id': '504', 'starttime': '2013-07-01 00:00:00', 'end station latitude': '40.73221853', 'birth year': '\\N', 'usertype': 'Customer'}
>>> _['gender']
'0'
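If you can't be sure in advance whether a given download is comma- or tab-separated, the standard library's csv.Sniffer can guess the delimiter from a sample of the file. A minimal sketch follows; open_dict_reader and the 4096-character sample size are hypothetical choices, and sniffing is a heuristic that can guess wrong on unusual files:

```python
import csv
import io

def open_dict_reader(f, sample_size=4096):
    """Return a DictReader using the delimiter csv.Sniffer detects in f."""
    sample = f.read(sample_size)
    f.seek(0)  # rewind so DictReader still sees the header row
    # Only consider comma or tab as candidate delimiters.
    dialect = csv.Sniffer().sniff(sample, delimiters=",\t")
    return csv.DictReader(f, dialect=dialect)

# Made-up comma-separated sample standing in for the real file.
SAMPLE = "tripduration,usertype,gender\n634,Customer,0\n1547,Subscriber,1\n"

reader = open_dict_reader(io.StringIO(SAMPLE))
print(reader.fieldnames)  # ['tripduration', 'usertype', 'gender']
first = next(reader)
print(first["gender"])    # 0
```

With a real file, open it with newline="" and pass the file object in place of the io.StringIO.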
Since the gender column is always '0', '1' or '2', in this case you can simply test for inequality with '0' and save yourself the int() call:
writer_output.writerows(row for row in reader if row['gender'] != '0')
This uses a generator expression to pass all of the filtered rows to DictWriter.writerows() (plural).
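Putting the pieces together, the question's script might look like this after the fix. This is a sketch: the function name write_filtered is hypothetical, and newline="" on both open() calls is the csv module's documented recommendation for Python 3 rather than part of the original answer.

```python
import csv

def write_filtered(in_path, out_path):
    """Copy the header plus every row whose gender is not '0' from in_path to out_path."""
    # newline="" lets the csv module handle line endings itself (per the csv docs).
    with open(in_path, newline="") as infile, open(out_path, "w", newline="") as outfile:
        reader = csv.DictReader(infile)  # default "excel" dialect: comma-separated
        writer = csv.DictWriter(outfile, reader.fieldnames)
        writer.writeheader()
        # writerows() consumes the generator lazily, one filtered row at a time.
        writer.writerows(row for row in reader if row["gender"] != "0")
```

For the files in the question, the call would be write_filtered("E:/Dropbox/PPS/CitiBikeData/2014_Data.csv", "E:/Dropbox/PPS/CitiBikeData/subscribers.csv").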