How do I store a list of dictionaries returned by an API request in a pandas DataFrame?

Time: 2019-11-30 19:08:03

Tags: python json api dataframe dictionary

I called the API using the code sample provided in its documentation and received output.

The output is a list of more than 100 dictionaries, which looks like this:

[{'author': {'avatar_url': None, 'id': 1345069, 'name': 'Alaa Elassar'},
  'body': 'text',
  'categories': [{'confident': True,
                  'id': 'IAB2',
                  'level': 1,
                  'links': {'_self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB2',
                            'parent': None},
                  'score': 0.1,
                  'taxonomy': 'iab-qag'},
                 {'confident': False,}]

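I need to store the data in a pandas DataFrame, and because the number of columns is very large, I cannot specify each name manually. For the example above, the columns I want are "author", "body", and "categories".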

So far I have tried:

from pandas.io.json import json_normalize

tesla = json_normalize(tesla.stories)
print (df)

The output is:

"story" object has no attribute "values"

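I managed to convert it into a single-column pandas DataFrame with one dictionary per row, but when I tried to split that column into several columns I got an error.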
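
For the single-column case described above, one common approach is to rebuild the frame from the column's values. This is a minimal sketch, assuming each cell really holds a plain Python dict; the column name `story` and the second sample row are made up for illustration and do not come from the API.

```python
import pandas as pd

# Illustrative single-column frame: one dict per row (sample values are invented)
single = pd.DataFrame({'story': [
    {'author': {'id': 1345069, 'name': 'Alaa Elassar'}, 'body': 'text'},
    {'author': {'id': 2, 'name': 'Someone Else'}, 'body': 'more text'},
]})

# Turn the column into a list of dicts and let pandas infer the column names
# from the top-level keys; reuse the original index so rows stay aligned
wide = pd.DataFrame(single['story'].tolist(), index=single.index)

print(wide.columns.tolist())
```

Nested values such as `author` stay as dictionaries inside their cells with this approach; only the top-level keys become columns.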

This is the code I used to call the API.

from __future__ import print_function
import aylien_news_api
from aylien_news_api.rest import ApiException
from pprint import pprint
import time  # needed for time.sleep() in fetch_new_stories
configuration = aylien_news_api.Configuration()

# Configure API key authorization: app_id
configuration.api_key['X-AYLIEN-NewsAPI-Application-ID'] = 'KEY'

# Configure API key authorization: app_key
configuration.api_key['X-AYLIEN-NewsAPI-Application-Key'] = 'KEY'
configuration.host = "https://api.aylien.com/news"

# Create an instance of the API class
api_instance = aylien_news_api.DefaultApi(aylien_news_api.ApiClient(configuration))

def fetch_new_stories(params={}):
  fetched_stories = []
  stories = None

  while stories is None or len(stories) > 0:
    try:
      response = api_instance.list_stories(**params)
    except ApiException as e:
      if ( e.status == 429 ):
        print('Usage limit exceeded. Waiting for 60 seconds...')
        time.sleep(60)
        continue
      # Re-raise any other API error; otherwise `response` would be undefined below
      raise

    stories = response.stories
    params['cursor'] = response.next_page_cursor

    fetched_stories += stories
    print("Fetched %d stories. Total story count so far: %d" %
      (len(stories), len(fetched_stories)))

  return fetched_stories

params = {
  'title': 'cybertruck',
  'published_at_start': '2019-11-20T12:00:00Z',
  'published_at_end': 'NOW',
  'cursor': '*',
  'per_page': 100
}

stories = fetch_new_stories(params)

print('************')
print("Fetched %d stories which are about Cybertruck and were published between %s and %s" %
(len(stories), params['published_at_start'], params['published_at_end']))
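
The `no attribute "values"` error is consistent with `json_normalize` receiving model objects rather than plain dictionaries. The sketch below converts each object to a dict first and then builds the DataFrame. It assumes the story objects expose a `to_dict()` method, as clients generated by Swagger tooling typically do; that has not been verified against `aylien_news_api` here. `FakeStory` is a hypothetical stand-in so the example runs without an API key, and the second sample record is invented. `pd.json_normalize` is available from pandas 1.0; older versions use the `pandas.io.json` import shown in the question.

```python
import pandas as pd

class FakeStory:
    """Hypothetical stand-in for an API model object (not part of aylien_news_api)."""
    def __init__(self, **fields):
        self._fields = fields

    def to_dict(self):
        # Assumed to mirror what a generated model's to_dict() would return
        return dict(self._fields)

# Illustrative data shaped like the output in the question
stories = [
    FakeStory(author={'avatar_url': None, 'id': 1345069, 'name': 'Alaa Elassar'},
              body='text',
              categories=[{'confident': True, 'id': 'IAB2', 'level': 1,
                           'score': 0.1, 'taxonomy': 'iab-qag'}]),
    FakeStory(author={'avatar_url': None, 'id': 2, 'name': 'Someone Else'},
              body='more text',
              categories=[]),
]

# Convert every model object to a plain dict before handing it to pandas
records = [story.to_dict() for story in stories]

# One column per top-level key; nested dicts and lists stay inside their cells
df = pd.DataFrame(records)

# Alternative: flatten nested dicts into dotted column names such as 'author.name'
flat = pd.json_normalize(records)

print(df.columns.tolist())
print(sorted(flat.columns))
```

`pd.DataFrame(records)` gives the "author", "body", "categories" layout asked for, with no column names typed by hand. `json_normalize` is the better fit only if the nested fields should become columns of their own.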

0 answers:

No answers