I'm trying to search a data file, something like Yelp.json. It has businesses in LA, Boston, and DC.
I wrote this:
# Python 2
import json

# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)

# return every unique locality along with how often it occurs
locality = []
unique_locality = []

# Load items into lists
for item in data:
    locality.append(data["payload"]["locality"])
    if data["payload"]["locality"] not in unique_locality:
        print unique_locality.append(data["payload"]["locality"])

# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])
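But I get "Portsmouth 1" as the answer, which means it is not giving me all of the cities, and possibly not all of the counts either. My goal for this part is to search that JSON file and have it say "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses". Each payload is a set of information about a single business, and "locality" is just the city. So I want it to find how many unique cities there are, and then how many businesses are in each city. One payload could be a Starbucks in LA, another a Starbucks in DC, and another a Chipotle in LA.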
Example of the JSON file (JSONlite.com says it is valid):
"payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
    "latitude": "56.945972",
    "locality": "Stonehaven",
    "_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "The Lodge, Dunottar",
    "email": "dunnottarcastle@btconnect.com",
    "existence_ml": 0.5694238217658721,
    "domain_aggregate": "",
    "name": "Dunnottar Castle",
    "search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
    "admin_region": "Scotland",
    "existence": 1,
    "category_labels": [
        ["Landmarks", "Buildings and Structures"]
    ],
    "post_town": "Stonehaven",
    "region": "Kincardineshire",
    "review_count": "719",
    "geocode_level": "within_50m",
    "tel": "01569 762173",
    "placerank": 65,
    "longitude": "-2.197123",
    "placerank_ml": 37.27916073464469,
    "fax": "01330 860325",
    "category_ids_text_search": "",
    "website": "http://www.dunnottarcastle.co.uk",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "AB39 2TL",
    "category_ids": [108],
    "country": "gb",
    "_geocode_quality": "4",
    "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
},
"payload": {
    "existence_full": 1,
    "geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
    "latitude": "56.237480",
    "locality": "Inveraray",
    "_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
    "address": "Cherry Park",
    "email": "enquiries@inveraray-castle.com",
    "longitude": "-5.073578",
    "domain_aggregate": "",
    "name": "Inveraray Castle",
    "admin_region": "Scotland",
    "search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
    "existence": 1,
    "category_labels": [
        ["Social", "Food and Dining", "Restaurants"]
    ],
    "region": "Argyll",
    "review_count": "532",
    "geocode_level": "within_50m",
    "tel": "01499 302203",
    "placerank": 67,
    "post_town": "Inveraray",
    "placerank_ml": 41.19978087352266,
    "fax": "01499 302421",
    "category_ids_text_search": "",
    "website": "http://www.inveraray-castle.com",
    "status": "1",
    "geocode_confidence": "20",
    "postcode": "PA32 8XE",
    "category_ids": [347],
    "country": "gb",
    "_geocode_quality": "4",
    "existence_ml": 0.7914881102847783,
    "uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
},
Answer 0 (score: 0)
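If your "json" is structured like
{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}
it is a syntactically valid JSON string, but after json.loads parses it, it evaluates to
{"payload":{ CONTENT_LAST }}
That is why you end up with one city and a business count of one: Python's json module keeps only the last value when a key is repeated within an object.
You can verify this behavior with the online JSON parser at http://json.parser.online.fr/ by checking the JS eval field.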
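The same check can be done locally. Below is a minimal sketch, assuming a two-payload string shaped like the file in the question; the raw sample is made up for illustration and trimmed to a single field per payload.

```python
import json

# Hypothetical miniature of the file: one object that repeats the "payload" key.
raw = '{"payload": {"locality": "Stonehaven"}, "payload": {"locality": "Inveraray"}}'

data = json.loads(raw)

# Only one key survives, and it holds the last payload in the string.
print(len(data))                      # 1
print(data["payload"]["locality"])    # Inveraray
```

Iterating over data here visits a single key, which matches the single city the question reports.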
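In that case, one way to preprocess the JSON string is to strip the dummy "payload" keys and wrap the content dictionaries directly in a list. You will then have a JSON string in the following format:
[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]
Assume your JSON string is now a list of payload dictionaries and you have loaded it into data with json.loads(json_str).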
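If rewriting the file by hand is impractical, a different route (not the string preprocessing described above) is the object_pairs_hook argument of json.loads, which receives every key/value pair before duplicates are discarded. The following is only a sketch, assuming the top-level object contains nothing but repeated "payload" keys; keep_duplicates and the raw sample are hypothetical names made up for illustration.

```python
import json

def keep_duplicates(pairs):
    """Hypothetical hook: turn an object made only of "payload" keys into a list."""
    keys = [key for key, _ in pairs]
    if keys and all(key == "payload" for key in keys):
        # Outer object: keep every payload instead of only the last one.
        return [value for _, value in pairs]
    # Any other object (for example a payload's own content) stays a dict.
    return dict(pairs)

# Hypothetical sample shaped like the file in the question.
raw = (
    '{"payload": {"locality": "Stonehaven", "name": "Dunnottar Castle"},'
    ' "payload": {"locality": "Inveraray", "name": "Inveraray Castle"}}'
)

data = json.loads(raw, object_pairs_hook=keep_duplicates)
print(len(data))  # 2
```

The hook runs for every JSON object in the input, nested ones included, which is why it falls back to dict(pairs) for anything that is not the outer wrapper.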
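While iterating over the JSON payloads, build a lookup table along the way. This handles repeated cities automatically, because businesses in the same city are appended to the list stored under that city's key.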
city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)
Then you can easily present the result with
for city, business_list in city_business_map.items():
    print city, len(business_list)
If you want to count the unique businesses in each city, initialize each value to a set instead of a list.
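A sketch of the set-based variant, assuming data is already a list of payload dictionaries; the sample entries are hypothetical and reuse the Starbucks/Chipotle example from the question. The print calls take a single argument so the snippet also runs on Python 3.

```python
# Hypothetical sample payloads, already loaded as a list of dicts.
data = [
    {"locality": "LA", "name": "Starbucks"},
    {"locality": "DC", "name": "Starbucks"},
    {"locality": "LA", "name": "Chipotle"},
    {"locality": "LA", "name": "Starbucks"},  # second Starbucks in LA
]

city_business_map = {}
for payload in data:
    city = payload["locality"]
    if city not in city_business_map:
        city_business_map[city] = set()
    # A set ignores a name that is already present for this city.
    city_business_map[city].add(payload["name"])

for city in sorted(city_business_map):
    print("%s %d" % (city, len(city_business_map[city])))
```

With this sample, LA ends up with two unique names even though three of the payloads are located there.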
If that is overkill, then instead of initializing each value to a list or a set, just associate a counter with each key.
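One way to keep a counter per key is collections.Counter from the standard library. This is a minimal sketch, again assuming data is a list of payload dictionaries; the sample entries are hypothetical.

```python
from collections import Counter

# Hypothetical sample payloads, already loaded as a list of dicts.
data = [
    {"locality": "LA", "name": "Starbucks"},
    {"locality": "DC", "name": "Starbucks"},
    {"locality": "LA", "name": "Chipotle"},
]

# Count how many payloads mention each city.
city_counts = Counter(payload["locality"] for payload in data)

for city, count in sorted(city_counts.items()):
    print("%s: %d businesses" % (city, count))
```

len(city_counts) then gives the number of unique cities, and city_counts[city] the number of businesses in each one.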