Counting items from a JSON file in Python

Asked: 2016-03-10 03:21:22

Tags: python json

I am trying to search a data file, for example Yelp.json. It contains businesses in Los Angeles, Boston, and Washington, DC.

I wrote this:

# Python 2

# read json
with open('updated_data.json') as facts_data:
    data = json.load(facts_data)

# return every unique locality along with how often it occurs
locality = []
unique_locality = []
# Load items into lists
for item in data:
   locality.append(data["payload"]["locality"])
   if data["payload"]["locality"] not in unique_locality:
       print unique_locality.append(data["payload"]["locality"])
# Loops over unique_locality and count from locality
print "Unique Locality Count:", unique_locality, locality.count(data["payload"]["locality"])

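But the answer I get is "Portsmouth 1", which means it is not returning all the cities, and possibly not all the counts either. My goal for this part is to search that JSON file and have it report "DC: 10 businesses, LA: 20 businesses, Boston: 2 businesses". Each payload is a set of information about a single business, and "locality" is just the city. So I want it to find how many unique cities there are, and then how many businesses there are in each city. One payload could be a Starbucks in LA, another a Starbucks in DC, and another a Chipotle in LA.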

Sample of the JSON file (JSONlite.com says it is valid):

"payload": {
        "existence_full": 1,
        "geo_virtual": "[\"56.9459720|-2.1971226|20|within_50m|4\"]",
        "latitude": "56.945972",
        "locality": "Stonehaven",
        "_records_touched": "{\"crawl\":8,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
        "address": "The Lodge, Dunottar",
        "email": "dunnottarcastle@btconnect.com",
        "existence_ml": 0.5694238217658721,
        "domain_aggregate": "",
        "name": "Dunnottar Castle",
        "search_tags": ["Dunnottar Castle Aberdeenshire", "Dunotter Castle"],
        "admin_region": "Scotland",
        "existence": 1,
        "category_labels": [
            ["Landmarks", "Buildings and Structures"]
        ],
        "post_town": "Stonehaven",
        "region": "Kincardineshire",
        "review_count": "719",
        "geocode_level": "within_50m",
        "tel": "01569 762173",
        "placerank": 65,
        "longitude": "-2.197123",
        "placerank_ml": 37.27916073464469,
        "fax": "01330 860325",
        "category_ids_text_search": "",
        "website": "http://www.dunnottarcastle.co.uk",
        "status": "1",
        "geocode_confidence": "20",
        "postcode": "AB39 2TL",
        "category_ids": [108],
        "country": "gb",
        "_geocode_quality": "4",
        "uuid": "3867aaf3-12ab-434f-b12b-5d627b3359c3"
    },
    "payload": {
        "existence_full": 1,
        "geo_virtual": "[\"56.237480|-5.073578|20|within_50m|4\"]",
        "latitude": "56.237480",
        "locality": "Inveraray",
        "_records_touched": "{\"crawl\":11,\"lssi\":0,\"polygon_centroid\":0,\"geocoder\":0,\"user_submission\":0,\"tdc\":0,\"gov\":0}",
        "address": "Cherry Park",
        "email": "enquiries@inveraray-castle.com",
        "longitude": "-5.073578",
        "domain_aggregate": "",
        "name": "Inveraray Castle",
        "admin_region": "Scotland",
        "search_tags": ["Inveraray Castle Tea Room", "Inverary Castle"],
        "existence": 1,
        "category_labels": [
            ["Social", "Food and Dining", "Restaurants"]
        ],
        "region": "Argyll",
        "review_count": "532",
        "geocode_level": "within_50m",
        "tel": "01499 302203",
        "placerank": 67,
        "post_town": "Inveraray",
        "placerank_ml": 41.19978087352266,
        "fax": "01499 302421",
        "category_ids_text_search": "",
        "website": "http://www.inveraray-castle.com",
        "status": "1",
        "geocode_confidence": "20",
        "postcode": "PA32 8XE",
        "category_ids": [347],
        "country": "gb",
        "_geocode_quality": "4",
        "existence_ml": 0.7914881102847783,
        "uuid": "8278ab80-2cd1-4dbd-9685-0d0036b681eb"
    },

1 answer:

Answer 0 (score: 0)

If the structure of your "json" is something like

{"payload":{ CONTENT_A }, "payload":{ CONTENT_B }, ..., "payload":{ CONTENT_LAST }}

then it is a valid JSON string, but because every key is the same, passing the string through json.loads evaluates it to

{"payload":{ CONTENT_LAST }}

That is why you end up with a single city and a business count of one.
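The collapse is easy to reproduce. The sketch below uses a made-up two-entry string (the `json_str` name and its trimmed-down contents are assumptions for illustration, not the asker's real file) and is written so it runs on both Python 2.7 and Python 3:

```python
import json

# Hypothetical, heavily trimmed input with a repeated "payload" key.
json_str = (
    '{"payload": {"locality": "Stonehaven"}, '
    '"payload": {"locality": "Inveraray"}}'
)

data = json.loads(json_str)

# Each repeated key overwrites the previous one, so only the last
# "payload" object is left in the resulting dictionary.
print(data)
print(len(data))
```

Iterating over `data` here yields the single key "payload", which is consistent with seeing just one locality in the output.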

You can verify this behavior with the online JSON parser at http://json.parser.online.fr/ by checking the JS eval field.

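In this case, one way to preprocess the JSON string is to strip the dummy "payload" keys and wrap the content dictionaries directly in a list. You will then have a JSON string in the following format.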

[{CONTENT_A}, {CONTENT_B}, ..., {CONTENT_LAST}]

Assume your JSON string is now a list of payload dictionaries and that you have loaded it into data with json.loads(json_str).
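If editing the file by hand is impractical, a different route to the same list is to let the parser keep the duplicates. This is not the text preprocessing described above but a swapped-in technique: `json.loads` accepts an `object_pairs_hook` callable that receives every object as a list of (key, value) pairs before duplicates are dropped. The helper name `collect_payloads` and the sample string are assumptions for illustration, and the sketch assumes the real file wraps its "payload" entries in a single outer object.

```python
import json

def collect_payloads(pairs):
    """Hypothetical hook: turn an object made only of "payload" keys into a list."""
    keys = [key for key, _ in pairs]
    if keys and all(key == "payload" for key in keys):
        # Keep every payload instead of letting later ones overwrite earlier ones.
        return [value for _, value in pairs]
    # Any other object is decoded as a normal dictionary.
    return dict(pairs)

# Hypothetical, trimmed input with the same shape as the question's sample.
json_str = (
    '{"payload": {"locality": "Stonehaven", "name": "Dunnottar Castle"}, '
    '"payload": {"locality": "Inveraray", "name": "Inveraray Castle"}}'
)

data = json.loads(json_str, object_pairs_hook=collect_payloads)

for payload in data:
    print(payload["locality"])
```

With this hook `data` is already a list of payload dictionaries, so the loop that follows can be applied to it unchanged.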

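As you iterate over the JSON payloads, build a lookup table along the way. This handles repeated cities automatically, because businesses in the same city hash to the same key and end up in the same list.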

city_business_map = {}
for payload in data:
    city = payload['locality']
    business = payload['name']
    if city not in city_business_map:
        city_business_map[city] = []
    city_business_map[city].append(business)

You can then present the result easily with:

for city, business_list in city_business_map.items():
    print city, len(business_list)

If you want to count the unique businesses in each city, initialize each value as a set instead of a list.
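A minimal sketch of the set-based variant follows. The `sample_payloads` list is made up for illustration (the "Hypothetical Cafe" entry in particular is invented), and the code is written to run on both Python 2.7 and Python 3.

```python
# Hypothetical payload list; the second entry repeats the first business.
sample_payloads = [
    {"locality": "Stonehaven", "name": "Dunnottar Castle"},
    {"locality": "Stonehaven", "name": "Dunnottar Castle"},
    {"locality": "Stonehaven", "name": "Hypothetical Cafe"},
    {"locality": "Inveraray", "name": "Inveraray Castle"},
]

city_unique_business_map = {}
for payload in sample_payloads:
    city = payload["locality"]
    # setdefault creates an empty set the first time a city is seen.
    city_unique_business_map.setdefault(city, set()).add(payload["name"])

# A set ignores repeated names, so its size is the unique-business count.
unique_counts = {
    city: len(names) for city, names in city_unique_business_map.items()
}

for city in sorted(unique_counts):
    print("{}: {} unique businesses".format(city, unique_counts[city]))
```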

If that is overkill, then instead of initializing each value as a list or set, simply associate a counter with each key.
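One way to do the counter version is with `collections.Counter` from the standard library (available since Python 2.7). The `sample_payloads` list below is invented for illustration, and this is only a sketch of the idea, not necessarily what the answer had in mind.

```python
from collections import Counter

# Hypothetical payload list standing in for the loaded JSON data.
sample_payloads = [
    {"locality": "Stonehaven", "name": "Dunnottar Castle"},
    {"locality": "Inveraray", "name": "Inveraray Castle"},
    {"locality": "Stonehaven", "name": "Hypothetical Cafe"},
]

# Count how many payloads mention each locality.
city_counts = Counter(payload["locality"] for payload in sample_payloads)

# most_common() lists cities from the highest count to the lowest.
for city, count in city_counts.most_common():
    print("{}: {} businesses".format(city, count))
```

The number of unique cities is then simply `len(city_counts)`.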