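I am following the examples in Wes McKinney's "Python for Data Analysis".
In Chapter 2 we are asked to count how many times each time zone appears in the 'tz' field; some entries have no 'tz' field.
In McKinney's book, "America/New_York" appears 1251 times (2 in the first 10 of 3440 lines, as you can see below), whereas in my output it appears 1 time. I am trying to figure out why it shows 1.
I am using Python 2.7, installed with Enthought's distribution (epd-7.3-1-win-x86_64.msi) following McKinney's instructions. The data comes from https://github.com/Canuckish/pydata-book/tree/master/ch02. In case you can't tell from the book title, I am new to Python, so please explain how to obtain any information I haven't provided.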
import json
path = 'usagov_bitly_data2012-03-16-1331923249.txt'
open(path).readline()
records = [json.loads(line) for line in open(path)]
records[0]
records[1]
print records[0]['tz']
The last line here prints America/New_York, and the analogous records[1] shows America/Denver.
# count unique time zones
# NOTE: NOT every JSON entry has a tz, so the commented-out line below raises KeyError
# time_zones = [rec['tz'] for rec in records]
time_zones = [rec['tz'] for rec in records if 'tz' in rec]
time_zones[:10]
This shows the first ten time zone entries; entries 8-10 are blank...
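To see why the filter matters, here is a small sketch on made-up records (the three dictionaries are hypothetical, not rows from the real file). Plain indexing fails on a record without the key, the `if 'tz' in rec` filter skips such records, and `rec.get('tz')` keeps them as `None`. An entry whose 'tz' exists but is empty passes the filter, which is presumably what the blank entries above are.

```python
# Hypothetical sample records; the real file has one JSON object per line.
records = [
    {'tz': 'America/New_York'},
    {'a': 'some user agent'},   # no 'tz' key at all
    {'tz': ''},                 # 'tz' present but empty
]

# Plain indexing raises KeyError on the second record.
try:
    time_zones = [rec['tz'] for rec in records]
except KeyError as err:
    print('KeyError: %s' % err)

# Filtering keeps only records that have the key (empty strings stay).
filtered = [rec['tz'] for rec in records if 'tz' in rec]

# .get() keeps every record, using None for a missing key.
with_get = [rec.get('tz') for rec in records]

print(filtered)   # ['America/New_York', '']
print(with_get)   # ['America/New_York', None, '']
```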
# counting using a dict to store counts
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts
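The same counting loop can be written with `collections.defaultdict`, which supplies a starting value of 0 for any key not seen yet, so the `if`/`else` branch (and its indentation) disappears. A minimal sketch; `get_counts_dd` is a hypothetical name chosen for illustration:

```python
from collections import defaultdict

def get_counts_dd(sequence):
    # int() returns 0, so every new key starts at zero before the += 1.
    counts = defaultdict(int)
    for x in sequence:
        counts[x] += 1
    return counts

counts = get_counts_dd(['America/New_York', 'America/Denver', 'America/New_York'])
print(counts['America/New_York'])  # 2
```

One caveat of this design: looking up a key that was never counted on a `defaultdict` inserts it with the value 0.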
counts = get_counts(time_zones)
counts['America/New_York']
This gives 1, but it should be 1251.
len(time_zones)
This gives 3440, as it should.
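A quick diagnostic, offered as a suggestion: if `get_counts` works, every item is counted exactly once, so `sum(counts.values())` must equal `len(time_zones)` (3440 here). A total of 1 would mean only a single item was ever recorded. A sketch on a short hypothetical list:

```python
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1
    return counts

# Hypothetical input; in the question this would be the real time_zones list.
time_zones = ['America/New_York', 'America/Denver', '', 'America/New_York']
counts = get_counts(time_zones)

# The counts must add up to the number of items that were counted.
print(sum(counts.values()) == len(time_zones))  # True
print(len(counts))  # number of distinct values: 3
```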
Answer (score: 0)
The 'America/New_York' time zone appears 1251 times in the input:
import json
from collections import Counter

# path is the data file name defined in the question
with open(path) as file:
    c = Counter(json.loads(line).get('tz') for line in file)
print(c['America/New_York']) # -> 1251
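`Counter` also answers the usual follow-up question, which time zones are most frequent, through its `most_common` method, and it returns 0 instead of raising KeyError for a value it never saw. A sketch on hypothetical values (real code would build the counter from the file as above):

```python
from collections import Counter

# Hypothetical time-zone values; None stands for records with no 'tz' key,
# as produced by json.loads(line).get('tz').
values = ['America/New_York', None, 'America/Denver', 'America/New_York', '']
c = Counter(values)

print(c['America/New_York'])  # 2
print(c['Europe/London'])     # 0: an unseen key counts as zero, no KeyError
print(c.most_common(1))       # [('America/New_York', 2)]
```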
It is not clear why your code gives a count of 1. Perhaps the code is indented incorrectly:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else: #XXX wrong indentation
        counts[x] = 1 # it is run after the loop if there is no `break`
    return counts
See "Why does python use 'else' after for and while loops?"
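The behaviour the `#XXX` comment refers to can be seen in isolation: the `else` block of a `for` loop runs once, after the loop finishes, unless the loop exits through `break`. A small sketch; the function name `find_first_negative` is hypothetical:

```python
def find_first_negative(numbers):
    for n in numbers:
        if n < 0:
            result = n
            break
    else:
        # Runs only when the loop completed without hitting break.
        result = None
    return result

print(find_first_negative([3, -1, 4]))  # -1 (break taken, else skipped)
print(find_first_negative([3, 1, 4]))   # None (no break, else runs)
```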
The correct indentation is:
def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:
            counts[x] = 1 # it is run every iteration if x not in counts
    return counts
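Running both versions side by side on a short made-up list shows how the misplaced `else` changes the result: because `counts` starts empty, `x in counts` is never true inside the loop, and the `for`/`else` block then records only the last item, with a count of 1. The function name `get_counts_wrong` is hypothetical.

```python
def get_counts_wrong(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
    else:  # attached to the for loop, not to the if
        counts[x] = 1
    return counts

def get_counts(sequence):
    counts = {}
    for x in sequence:
        if x in counts:
            counts[x] += 1
        else:  # attached to the if
            counts[x] = 1
    return counts

tz = ['America/New_York', 'America/Denver', 'America/New_York']
print(get_counts_wrong(tz))                # {'America/New_York': 1}
print(get_counts(tz)['America/New_York'])  # 2
```

With this bug the dictionary holds only the last item of the sequence, so `counts['America/New_York']` returns 1 only if that last item is 'America/New_York'.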
Check whether you are mixing spaces and tabs for indentation: run your script with python -tt to find out.
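If running `python -tt` is not convenient (for example from inside an IDE), a rough check is to list the lines whose leading whitespace contains a tab. This is a minimal sketch, not a replacement for `-tt`; the function name `lines_with_tab_indent` is hypothetical:

```python
def lines_with_tab_indent(lines):
    """Return 1-based line numbers whose indentation contains a tab."""
    hits = []
    for number, line in enumerate(lines, 1):
        stripped = line.lstrip(' \t')
        indent = line[:len(line) - len(stripped)]
        if '\t' in indent:
            hits.append(number)
    return hits

# Hypothetical script contents; a real file object can be passed instead.
source = [
    'def get_counts(sequence):\n',
    '    counts = {}\n',
    '\tfor x in sequence:\n',   # tab-indented line
    '    return counts\n',
]
print(lines_with_tab_indent(source))  # [3]
```

For a real script, pass the open file, e.g. `with open('yourscript.py') as f: print(lines_with_tab_indent(f))`. The standard library's `tabnanny` module (`python -m tabnanny yourscript.py`) performs a more thorough check.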