Python JSON data selection

Time: 2015-06-01 11:10:52

Tags: python json

I'm trying to extract some restaurant information from a JSON dataset. Here are two samples: one is a restaurant and one isn't.

{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", "hours": {"Tuesday": {"close": "17:00", "open": "08:00"}, "Friday": {"close": "17:00", "open": "08:00"}, "Monday": {"close": "17:00", "open": "08:00"}, "Wednesday": {"close": "17:00", "open": "08:00"}, "Thursday": {"close": "17:00", "open": "08:00"}}, "open": true, "categories": ["Doctors", "Health & Medical"], "city": "Phoenix", "review_count": 9, "name": "Eric Goldberg, MD", "neighborhoods": [], "longitude": -111.98375799999999, "state": "AZ", "stars": 3.5, "latitude": 33.499313000000001, "attributes": {"By Appointment Only": true}, "type": "business"}
{"business_id": "mVHrayjG3uZ_RLHkLj-AMg", "full_address": "414 Hawkins Ave\nBraddock, PA 15104", "hours": {"Tuesday": {"close": "19:00", "open": "10:00"}, "Friday": {"close": "20:00", "open": "10:00"}, "Saturday": {"close": "16:00", "open": "10:00"}, "Thursday": {"close": "19:00", "open": "10:00"}, "Wednesday": {"close": "19:00", "open": "10:00"}}, "open": true, "categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"], "city": "Braddock", "review_count": 11, "name": "Emil's Lounge", "neighborhoods": [], "longitude": -79.866350699999998, "state": "PA", "stars": 4.5, "latitude": 40.408735, "attributes": {"Alcohol": "full_bar", "Noise Level": "average", "Has TV": true, "Attire": "casual", "Ambience": {"romantic": false, "intimate": false, "classy": false, "hipster": false, "divey": false, "touristy": false, "trendy": false, "upscale": false, "casual": false}, "Good for Kids": true, "Price Range": 1, "Good For Dancing": false, "Delivery": false, "Coat Check": false, "Smoking": "no", "Accepts Credit Cards": true, "Take-out": true, "Happy Hour": false, "Outdoor Seating": false, "Takes Reservations": false, "Waiter Service": true, "Wi-Fi": "no", "Caters": true, "Good For": {"dessert": false, "latenight": false, "lunch": false, "dinner": false, "breakfast": false, "brunch": false}, "Parking": {"garage": false, "street": false, "validated": false, "lot": false, "valet": false}, "Music": {"dj": false}, "Good For Groups": true}, "type": "business"}

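When I run the code below, the first record is printed as well, even though the category 'Restaurants' does not appear in it. Can someone explain why?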

for line in f:
    jd = json.loads(line)
    if jd['categories'] == 'Food' or 'Restaurants':
        print (jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])
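A minimal reproduction of the behaviour, assuming the file holds one JSON object per line as in the samples. `io.StringIO` stands in for the real file `f`, and the records are trimmed to the fields the loop reads.

```python
import io
import json

# Two sample records, trimmed to the fields the loop uses (assumption:
# one JSON object per line, like the dataset above).
data = io.StringIO(
    '{"name": "Eric Goldberg, MD", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", '
    '"latitude": 33.499313000000001, "longitude": -111.98375799999999, '
    '"categories": ["Doctors", "Health & Medical"]}\n'
    '{"name": "Emil\'s Lounge", "business_id": "mVHrayjG3uZ_RLHkLj-AMg", '
    '"latitude": 40.408735, "longitude": -79.866350699999998, '
    '"categories": ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"]}\n'
)

printed = []
for line in data:
    jd = json.loads(line)
    # Same condition as in the question; it is truthy for every record.
    if jd['categories'] == 'Food' or 'Restaurants':
        printed.append(jd['name'])
        print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])
```

Both records are printed, including the doctor's office.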

Here is the JSON data in a more readable format:

{
    "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", 
    "full_address": "4840 E Indian School Rd\nSte 101\nPhoenix, AZ 85018", 
    "hours": {
        "Thursday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Tuesday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Friday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Wednesday": {
            "close": "17:00", 
            "open": "08:00"
        }, 
        "Monday": {
            "close": "17:00", 
            "open": "08:00"
        }
    }, 
    "open": true, 
    "categories": [
        "Doctors", 
        "Health & Medical"
    ], 
    "city": "Phoenix", 
    "review_count": 9, 
    "name": "Eric Goldberg, MD", 
    "neighborhoods": [], 
    "longitude": -111.98375799999999, 
    "state": "AZ", 
    "stars": 3.5, 
    "latitude": 33.499313000000001, 
    "attributes": {
        "By Appointment Only": true
    }, 
    "type": "business"
}
{
    "business_id": "mVHrayjG3uZ_RLHkLj-AMg", 
    "full_address": "414 Hawkins Ave\nBraddock, PA 15104", 
    "hours": {
        "Tuesday": {
            "close": "19:00", 
            "open": "10:00"
        }, 
        "Friday": {
            "close": "20:00", 
            "open": "10:00"
        }, 
        "Saturday": {
            "close": "16:00", 
            "open": "10:00"
        }, 
        "Thursday": {
            "close": "19:00", 
            "open": "10:00"
        }, 
        "Wednesday": {
            "close": "19:00", 
            "open": "10:00"
        }
    }, 
    "open": true, 
    "categories": [
        "Bars", 
        "American (New)", 
        "Nightlife", 
        "Lounges", 
        "Restaurants"
    ], 
    "city": "Braddock", 
    "review_count": 11, 
    "name": "Emil's Lounge", 
    "neighborhoods": [], 
    "longitude": -79.866350699999998, 
    "state": "PA", 
    "stars": 4.5, 
    "latitude": 40.408735, 
    "attributes": {
        "Alcohol": "full_bar", 
        "Noise Level": "average", 
        "Music": {
            "dj": false
        }, 
        "Attire": "casual", 
        "Ambience": {
            "touristy": false, 
            "hipster": false, 
            "romantic": false, 
            "divey": false, 
            "intimate": false, 
            "trendy": false, 
            "upscale": false, 
            "classy": false, 
            "casual": false
        }, 
        "Good for Kids": true, 
        "Price Range": 1, 
        "Good For Dancing": false, 
        "Delivery": false, 
        "Coat Check": false, 
        "Smoking": "no", 
        "Accepts Credit Cards": true, 
        "Take-out": true, 
        "Happy Hour": false, 
        "Outdoor Seating": false, 
        "Takes Reservations": false, 
        "Waiter Service": true, 
        "Wi-Fi": "no", 
        "Caters": true, 
        "Good For": {
            "dessert": false, 
            "latenight": false, 
            "lunch": false, 
            "dinner": false, 
            "brunch": false, 
            "breakfast": false
        }, 
        "Parking": {
            "garage": false, 
            "street": false, 
            "validated": false, 
            "lot": false, 
            "valet": false
        }, 
        "Has TV": true, 
        "Good For Groups": true
    }, 
    "type": "business"
}
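For reference, an indented listing like the one above can be produced with the standard library's `json.dumps` and its `indent` argument. This is a sketch assuming that is roughly how it was made (the question doesn't say); the record is trimmed for brevity.

```python
import json

# One line of the dataset, trimmed to a few keys.
line = ('{"business_id": "vcNAWiLM4dR7D2nwwJ7nCA", "open": true, '
        '"categories": ["Doctors", "Health & Medical"], '
        '"attributes": {"By Appointment Only": true}}')

record = json.loads(line)
# indent=4 puts each key on its own line, nested four spaces per level.
pretty = json.dumps(record, indent=4)
print(pretty)
```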

3 answers:

Answer 0 (score: 6)

This:

if jd['categories'] == 'Food' or 'Restaurants':

is parsed as:

if (jd['categories'] == 'Food') or 'Restaurants':

Since 'Restaurants' is a non-empty string, it is always true in a boolean context, so your test really is:

if (jd['categories'] == 'Food') or True:

which is obviously always true (a tautology).
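A quick way to see the precedence at work, using the categories of the first (non-restaurant) sample record:

```python
# categories of the first sample record (the doctor's office).
categories = ["Doctors", "Health & Medical"]

# `or` binds more loosely than `==`, so this is
# (categories == 'Food') or 'Restaurants', i.e. False or 'Restaurants'.
result = categories == 'Food' or 'Restaurants'

print(repr(result))  # 'Restaurants'
print(bool(result))  # True
```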

What you want is:

if jd['categories'] == 'Food' or jd['categories'] == 'Restaurants':

or, more simply:

if jd['categories'] in ('Food', 'Restaurants'):
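The two forms are equivalent as long as the value being tested is a single string. A small check, using a few example category strings chosen for illustration:

```python
results = {}
for category in ('Food', 'Restaurants', 'Doctors'):
    explicit = category == 'Food' or category == 'Restaurants'
    membership = category in ('Food', 'Restaurants')
    # Both tests agree whenever the value is a plain string.
    assert explicit == membership
    results[category] = membership

print(results)  # {'Food': True, 'Restaurants': True, 'Doctors': False}
```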

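Now, in your case (BTW, next time please take the time to post a cleaned-up, simplified and formatted JSON snippet), jd['categories'] is a list, so you can't compare it to a string. Well, you can, but it will always evaluate to False, and the membership test above won't work either. You have to check whether 'Food' or 'Restaurants' is contained in jd['categories']: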

if 'Food' in jd['categories'] or 'Restaurants' in jd['categories']:
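To make the list behaviour concrete, here is the categories list of the second sample record run through each test. The `wanted` set at the end is only an illustrative name for generalizing to more categories.

```python
# categories of the second sample record (the restaurant).
categories = ["Bars", "American (New)", "Nightlife", "Lounges", "Restaurants"]

# A list never compares equal to a string, so both of these are False.
equals_string = categories == 'Restaurants'
in_tuple = categories in ('Food', 'Restaurants')

# What you actually want: look for the strings inside the list.
contains = 'Food' in categories or 'Restaurants' in categories

# Same idea for any number of categories.
wanted = {'Food', 'Restaurants'}
contains_any = any(c in wanted for c in categories)
overlaps = not wanted.isdisjoint(categories)

print(equals_string, in_tuple, contains, contains_any, overlaps)
# False False True True True
```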

Answer 1 (score: 1)

It's not easy to test this with the data in the OP, but you need to change your test to something like this:

# Get the category list from the current dict
cat = jd['categories']
if 'Food' in cat or 'Restaurants' in cat:
    print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])
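A slightly fuller sketch of the same test, wrapped in a reusable generator. The helper name `restaurant_rows`, its `wanted` parameter, and the handling of blank lines and missing `categories` keys are assumptions for illustration, not part of the answer above.

```python
import io
import json


def restaurant_rows(lines, wanted=('Food', 'Restaurants')):
    """Yield (name, business_id, latitude, longitude) for every record whose
    'categories' list shares at least one entry with `wanted`.

    Hypothetical helper: assumes one JSON object per line, as in the samples.
    """
    wanted = set(wanted)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines between records
        jd = json.loads(line)
        # Assumption: treat a missing or null 'categories' as an empty list.
        if wanted.intersection(jd.get('categories') or []):
            yield jd['name'], jd['business_id'], jd['latitude'], jd['longitude']


# Usage with two trimmed sample records; an open file object works the same way.
sample = io.StringIO(
    '{"name": "Eric Goldberg, MD", "business_id": "vcNAWiLM4dR7D2nwwJ7nCA", '
    '"latitude": 33.499313000000001, "longitude": -111.98375799999999, '
    '"categories": ["Doctors", "Health & Medical"]}\n'
    '\n'
    '{"name": "Emil\'s Lounge", "business_id": "mVHrayjG3uZ_RLHkLj-AMg", '
    '"latitude": 40.408735, "longitude": -79.866350699999998, '
    '"categories": ["Bars", "Nightlife", "Lounges", "Restaurants"]}\n'
)
rows = list(restaurant_rows(sample))
for row in rows:
    print(*row)
```

Using a set intersection keeps the check a single expression no matter how many categories you want to match.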

Answer 2 (score: 0)

Line 3 doesn't seem to be written correctly:

for line in f:
    jd = json.loads(line)
    # categories is a list, so test each of its entries against the tuple
    if any(c in ('Food', 'Restaurants') for c in jd['categories']):
        print(jd['name'], jd['business_id'], jd['latitude'], jd['longitude'])

You might also consider encoding or escaping the strings returned by json.loads(), since comparing strings that way would be more appropriate.