Python: Converting a dictionary to a pandas DataFrame

Time: 2018-03-08 12:21:28

Tags: json python-3.x pandas

I have pulled some data from the Pocket API, and the resulting JSON contains a key named list that holds further nested JSON. A sample is shown below:

{'complete': 1,
 'error': None,
 'list': {'1992211110': {'authors': {'8683682': {'author_id': '8683682',
     'item_id': '1992211110',
     'name': 'Robert Kuttner',
     'url': 'http://www.nybooks.com/contributors/robert-kuttner/'}},
   'excerpt': 'What a splendid era this was going to be, with one remaining superpower spreading capitalism and liberal democracy around the world. Instead, democracy and capitalism seem increasingly incompatible.',
   'favorite': '0',
   'given_title': '',
   'given_url': 'http://nyrevinc.cmail20.com/t/y-l-klpdut-jduhlyklkl-d/',
   'has_image': '0',
   'has_video': '0',
   'is_article': '1',
   'is_index': '0',
   'item_id': '1992211110',
   'resolved_id': '1977788178',
   'resolved_title': 'The Man from Red Vienna',
   'resolved_url': 'http://www.nybooks.com/articles/2017/12/21/karl-polanyi-man-from-red-vienna/',
   'sort_id': 6,
   'status': '0',
   'time_added': '1520132694',
   'time_favorited': '0',
   'time_read': '0',
   'time_updated': '1520140351',
   'word_count': '4009'},

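I have managed to load the whole result into a DataFrame, but one column, authors, holds what looks like a nested dictionary. I have managed to split it out into a dictionary keyed by index, but I can't figure out how to convert that into a DataFrame. A sample of the authors dictionary is shown below: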

{1: {'authors': {'8683682': {'author_id': '8683682',
    'item_id': '1992211110',
    'name': 'Robert Kuttner',
    'url': 'http://www.nybooks.com/contributors/robert-kuttner/'}}},
 2: {'authors': {'53525958': {'author_id': '53525958',
    'item_id': '2086463428',
    'name': 'Adam Tooze',
    'url': 'http://www.nybooks.com/contributors/adam-tooze/'}}},
 3: {'authors': {'3490600': {'author_id': '3490600',
    'item_id': '2090266893',
    'name': 'Adam Liaw',
    'url': ''}}},
 4: {'authors': {'75929933': {'author_id': '75929933',
    'item_id': '2091894678',
    'name': 'umair haque',
    'url': 'https://eand.co/@umairh'}}},
 5: {'authors': {'61177521': {'author_id': '61177521',
    'item_id': '2092663780',
    'name': 'Annalisa Merelli',
    'url': 'https://qz.com/author/amerelliqz/'}}},
 6: {'authors': {'52268529': {'author_id': '52268529',
    'item_id': '2092922221',
    'name': 'Aditya Chakrabortty',
    'url': 'https://www.theguardian.com/profile/adityachakrabortty'}}},
 7: {'authors': {'28083': {'author_id': '28083',
    'item_id': '2096294305',
    'name': 'Alana Semuels',
    'url': ''}}},
 8: {'authors': {'185472': {'author_id': '185472',
    'item_id': '2097100251',
    'name': 'TIM KREIDER',
    'url': ''}}},
 9: {'authors': {'2771923': {'author_id': '2771923',
    'item_id': '2098788948',
    'name': 'Richard Bernstein',
    'url': 'http://www.nybooks.com/contributors/richard-bernstein/'}}},
 10: {'authors': {'61111044': {'author_id': '61111044',
    'item_id': '2102383890',
    'name': 'Ephrat Livni',
    'url': 'https://qz.com/author/livniqz/'}}}}

Any help is much appreciated; I am new to both Python and pandas.

1 answer:

Answer 0: (score: 0)

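Here is one suggestion. You need to filter the nested dictionary down to the author records themselves so that they can be loaded into a DataFrame.

input is your second dictionary (note that this name shadows Python's built-in input function, so you may prefer a different name in your own code).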

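import pandas as pd
# zip(*...) lines up the author records of every entry; [0] keeps the first author of each entry
authors_filtered = [v for v in zip(*[dict(item).values() for item in [input[i]['authors'] for i in input]])][0]
output = pd.DataFrame.from_dict(list(authors_filtered))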
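
As a possible alternative, the same flattening can be written as a nested list comprehension, which may be easier to read. This is only a sketch, not the answerer's method: the name authors_by_index is hypothetical, the sample is a trimmed copy of the question's data, and entry 3 (an item with no 'authors' key) is an assumption added to show how missing authors could be skipped. Unlike the one-liner above, it keeps every author of an entry rather than only the first.

```python
import pandas as pd

# Trimmed sample shaped like the second dictionary in the question.
# `authors_by_index` is a hypothetical name; entry 3 is an assumed edge case.
authors_by_index = {
    1: {'authors': {'8683682': {'author_id': '8683682',
                                'item_id': '1992211110',
                                'name': 'Robert Kuttner',
                                'url': 'http://www.nybooks.com/contributors/robert-kuttner/'}}},
    2: {'authors': {'53525958': {'author_id': '53525958',
                                 'item_id': '2086463428',
                                 'name': 'Adam Tooze',
                                 'url': 'http://www.nybooks.com/contributors/adam-tooze/'}}},
    3: {},  # no 'authors' key at all
}

# Walk every entry, then every author record inside it.
# .get('authors', {}) makes entries without authors contribute no rows.
rows = [
    author
    for entry in authors_by_index.values()
    for author in entry.get('authors', {}).values()
]

# A list of flat dicts becomes one row per dict, one column per key.
authors_df = pd.DataFrame(rows)
print(authors_df)
```

Because each row keeps its item_id, the resulting frame could later be joined back to the main table of items on that column.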