Extracting data from nested JSON objects in a specific format in Python

Time: 2017-07-17 10:23:43

Tags: python json csv nested

I have a dataset containing multiple nested JSON objects, like this:

{
"coordinates": null,
"acoustic_features": {
    "instrumentalness": "0.00479",
    "liveness": "0.18",
    "speechiness": "0.0294",
    "danceability": "0.634",
    "valence": "0.342",
    "loudness": "-8.345",
    "tempo": "125.044",
    "acousticness": "0.00035",
    "energy": "0.697",
    "mode": "1",
    "key": "6"
},
"artist_id": "b2980c722a1ace7a30303718ce5491d8",
"place": null,
"geo": null,
"tweet_lang": "en",
"source": "Share.Radionomy.com",
"track_title": "8eeZ",
"track_id": "cd52b3e5b51da29e5893dba82a418a4b",
"artist_name": "Dominion",
"entities": {
    "hashtags": [{
        "text": "nowplaying",
        "indices": [0, 11]
    }, {
        "text": "goth",
        "indices": [51, 56]
    }, {
        "text": "deathrock",
        "indices": [57, 67]
    }, {
        "text": "postpunk",
        "indices": [68, 77]
    }],
    "symbols": [],
    "user_mentions": [],
    "urls": [{
        "indices": [28, 50],
        "expanded_url": "cathedral13.com/blog13",
        "display_url": "cathedral13.com/blog13",
        "url": "t.co/Tatf4hEVkv"
    }]
},
"created_at": "2014-01-01 05:54:21",
"text": "#nowplaying Dominion - 8eeZ Tatf4hEVkv #goth #deathrock #postpunk",
"user": {
    "location": "middle of nowhere",
    "lang": "en",
    "time_zone": "Central Time (US & Canada)",
    "name": "Cathedral 13",
    "entities": null,
    "id": 81496937,
    "description": "I\u2019m a music junkie who is currently responsible for Cathedral 13 internet radio (goth, deathrock, post-punk)which has been online since 06/20/02."
},
"id": 418243774842929150
}

I want the output file to look like this:

user_id1 - track_id - hashtag1
user_id1 - track_id - hashtag2
user_id1 - track_id - hashtag3
user_id2 - track_id - hashtag1
user_id2 - track_id - hashtag2
....

For this example, the output should be:

81496937  cd52b3e5b51da29e5893dba82a418a4b  nowplaying
81496937  cd52b3e5b51da29e5893dba82a418a4b  goth
81496937  cd52b3e5b51da29e5893dba82a418a4b  deathrock
81496937  cd52b3e5b51da29e5893dba82a418a4b  postpunk

I have written the following code to do this:

import json
import csv
with open('final_dataset_json.json') as data_file:
        data = json.load(data_file)

uth = open('uth.csv','wb')

cvwriter = csv.writer(uth)

for entry in data:
    text_list = [hashtag['text'] for hashtag in entry['entities']['hashtags']]
    for line in text_list:
        csvwriter.writerow([entry['user']['id'],entry['track_id'],line.strip()+'\n')

uth.close()

How can I produce the output shown above?

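2 answers:

Answer 0: (score: 1)

With csv.writer, each call to writerow writes one new row, so you have to pass all of the column values for that row in a single list. The writer appends the line terminator itself, so there is no need to add '\n' to the last value.

I think it should be enough to change this one line: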

    csvwriter.writerow([entry['user']['id'],entry['track_id'],line.strip()])
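For context, here is a minimal Python 3 sketch of the whole loop with that change applied. It is an illustration under assumptions, not the asker's final script: the helper name `write_hashtag_rows` is hypothetical, the sample record is trimmed to the three fields that are actually read, and an in-memory `io.StringIO` buffer stands in for the output file so the result can be inspected.

```python
import csv
import io
import json


def write_hashtag_rows(records, out_file):
    """Write one (user id, track id, hashtag) row per hashtag to out_file."""
    writer = csv.writer(out_file)
    for entry in records:
        for hashtag in entry['entities']['hashtags']:
            # writerow takes the whole row as one list and appends the
            # line terminator itself, so no newline is added by hand.
            writer.writerow([entry['user']['id'], entry['track_id'], hashtag['text']])


# Stand-in for the dataset: a list with one record, trimmed to the fields used.
sample = json.loads('''[{
    "track_id": "cd52b3e5b51da29e5893dba82a418a4b",
    "entities": {"hashtags": [{"text": "nowplaying"}, {"text": "goth"}]},
    "user": {"id": 81496937}
}]''')

buffer = io.StringIO()
write_hashtag_rows(sample, buffer)
print(buffer.getvalue())
```

To write to a real file in Python 3, open it in text mode with `open('uth.csv', 'w', newline='')` and pass that file object instead of the buffer; the `'wb'` mode used in the question is the Python 2 form.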

Answer 1: (score: 1)

Simple dictionary lookups are enough (the json module does the parsing):

import json

# json_str is assumed to hold the JSON text of a single record like the one in the question
d = json.loads(json_str)
for ht in d['entities']['hashtags']:
    # The question asks for track_id in the second column
    print('{} - {} - {}'.format(d['user']['id'], d['track_id'], ht['text']))

Yields:

81496937 - cd52b3e5b51da29e5893dba82a418a4b - nowplaying
81496937 - cd52b3e5b51da29e5893dba82a418a4b - goth
81496937 - cd52b3e5b51da29e5893dba82a418a4b - deathrock
81496937 - cd52b3e5b51da29e5893dba82a418a4b - postpunk
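Since the question mentions a dataset with many records, the same lookup could be wrapped in a small generator, roughly as sketched below. The function name `hashtag_lines`, the `sep` parameter, and the second sample record are hypothetical. The sketch also assumes that some records may have a null `entities` field (as `user.entities` is in the sample) and simply skips those; it uses `track_id` for the middle column, as the question asks.

```python
def hashtag_lines(records, sep=' - '):
    """Yield one 'user_id<sep>track_id<sep>hashtag' string per hashtag."""
    for record in records:
        # Assumption: 'entities' can be null in some records, so fall back
        # to an empty dict and an empty hashtag list instead of failing.
        entities = record.get('entities') or {}
        for hashtag in entities.get('hashtags') or []:
            yield sep.join([str(record['user']['id']),
                            record['track_id'],
                            hashtag['text']])


records = [
    {'user': {'id': 81496937},
     'track_id': 'cd52b3e5b51da29e5893dba82a418a4b',
     'entities': {'hashtags': [{'text': 'nowplaying'}, {'text': 'goth'}]}},
    # Hypothetical record without hashtags; it contributes no lines.
    {'user': {'id': 12345}, 'track_id': 'hypothetical_track', 'entities': None},
]

lines = list(hashtag_lines(records))
for line in lines:
    print(line)
```

Passing `sep='  '` would give the two-space layout shown in the question's expected output.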