Question

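Hi, I'm new to working with JSON files. I have a sample JSON file like the one below (scraped Twitter data). Each [] contains several JSON objects, and I want to extract the text from those objects. The problem is that json.load can't handle a file made up of multiple JSON arrays ([][][]). In the sample below, the first [] contains three JSON objects and the second contains two.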

[{
            "created_at": "2014-12-07 02:17:18", 
            "favorite_count": 5, 
            "id_str": "541416129567019008", 
            "in_reply_to_screen_name": "panellington", 
            "retweet_count": 15, 
            "retweeted": false, 
            "text": "minutes ago. #berkeley #BlackLivesMatter #EricGarner #LAPD"
        },
        {
            "created_at": "2014-12-04 19:21:13", 
            "favorite_count": 13, 
            "id_str": "540586640595369984", 
            "in_reply_to_screen_name": null, 
            "retweet_count": 38, 
            "retweeted": false, 
            "text": "#BlackLivesMatter"
        },
        {
            "created_at": "2014-12-13 00:50:27", 
            "favorite_count": 1, 
            "id_str": "543568596299808768", 
            "in_reply_to_screen_name": null, 
            "retweet_count": 0, 
            "retweeted": false, 
            "text": "MLK Riot is language of unheard #Ferguson #ICantBreathe #BlackLivesMatter"
        }]

[{
            "created_at": "2015-04-28 13:21:35", 
            "favorite_count": 0, 
            "id_str": "593042377658519552", 
            "in_reply_to_screen_name": null, 
            "retweet_count": 5, 
            "retweeted": false, 
            "text": "RT @fsmith827: A lot of folks speaking against civil unrest have been willfully blind, willfully silent @ #BlackLivesMatter &amp; #PoliceBrutal\u2026"
        },
        {
            "created_at": "2014-12-07 03:17:27", 
            "favorite_count": 0, 
            "id_str": "541431264897937408", 
            "in_reply_to_screen_name": null, 
            "retweet_count": 456, 
            "retweeted": false, 
            "text": "RT @thecrisismag: #ICantBreathe  Protesters in Paris march in solidarity with #EricGarner and #MikeBrown  #BlackLivesMatter #GrandJury" 

}]

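I want to read this in as JSON objects and then work with them (for example: data[0]['text']).

The problem is that my file contains multiple JSON arrays: [some arbitrary number of JSON objects], [some arbitrary number of JSON objects], and so on.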

    import json

    with open('tweets.json') as json_data:
        d = json.load(json_data)

Because of this [][][]... structure, json.load does not work.

    ## error raised 
    Error: 
        raise JSONDecodeError("Extra data", s, end)

    JSONDecodeError: Extra data

Answer 1

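Maybe I'm not clear on what you're trying to do here, but it looks like you just want to iterate over your list of JSON strings. Something like this: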

In [1]: import json

In [2]: json_data = ['{ "created_at": "2014-12-07 02:17:18", "favorite_count": 5, "id_str": "541416129567019008", "in_reply_to_screen_name": "panellington", "retweet
   ...: _count": 15, "retweeted": false, "text": "minutes ago. #berkeley #BlackLivesMatter #EricGarner #LAPD" }', '{ "created_at": "2014-12-04 19:21:13", "favorite_c
   ...: ount": 13, "id_str": "540586640595369984", "in_reply_to_screen_name": null, "retweet_count": 38, "retweeted": false, "text": "#BlackLivesMatter" }', '{ "crea
   ...: ted_at": "2014-12-13 00:50:27", "favorite_count": 1, "id_str": "543568596299808768", "in_reply_to_screen_name": null, "retweet_count": 0, "retweeted": false,
   ...:  "text": "MLK Riot is language of unheard #Ferguson #ICantBreathe #BlackLivesMatter" }']

In [3]: for tweet in json_data:
   ...:     print(json.loads(tweet))
   ...:
{'created_at': '2014-12-07 02:17:18', 'favorite_count': 5, 'id_str': '541416129567019008', 'in_reply_to_screen_name': 'panellington', 'retweet_count': 15, 'retweeted': False, 'text': 'minutes ago. #berkeley #BlackLivesMatter #EricGarner #LAPD'}
{'created_at': '2014-12-04 19:21:13', 'favorite_count': 13, 'id_str': '540586640595369984', 'in_reply_to_screen_name': None, 'retweet_count': 38, 'retweeted': False, 'text': '#BlackLivesMatter'}
{'created_at': '2014-12-13 00:50:27', 'favorite_count': 1, 'id_str': '543568596299808768', 'in_reply_to_screen_name': None, 'retweet_count': 0, 'retweeted': False, 'text': 'MLK Riot is language of unheard #Ferguson #ICantBreathe #BlackLivesMatter'}
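
If the file really is several arrays written back to back ([][][]), as the question describes, one possible approach is to decode them one at a time with json.JSONDecoder.raw_decode, which returns the parsed value together with the index where it stopped. The following is only a sketch under assumptions: the helper name iter_json_values and the small sample string are made up for illustration, and every array is assumed to be valid JSON on its own (a trailing comma inside an object would still raise an error).

```python
import json


def iter_json_values(text):
    """Yield each top-level JSON value from a string of concatenated JSON documents.

    Hypothetical helper for illustration; not part of the json module.
    """
    decoder = json.JSONDecoder()
    pos = 0
    end = len(text)
    while pos < end:
        # raw_decode does not skip leading whitespace, so step over it manually.
        while pos < end and text[pos].isspace():
            pos += 1
        if pos >= end:
            break
        # raw_decode returns (parsed value, index just past that value).
        value, pos = decoder.raw_decode(text, pos)
        yield value


# Made-up sample with the same [][] shape as the file in the question.
sample = '[{"text": "a"}, {"text": "b"}]\n\n[{"text": "c"}]'

# Flatten every array into a single list of tweet dicts.
tweets = [tweet for array in iter_json_values(sample) for tweet in array]
texts = [tweet["text"] for tweet in tweets]
print(texts)
```

For a real file you would read the whole thing into a string first (for example with open('tweets.json') as f: text = f.read()) and pass that string in place of sample. After that, tweets[0]['text'] works the way the question asks for.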
