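After calling the API with the code sample from the documentation, I received output.
The output is a list of more than 100 dictionaries, like this: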
[{'author': {'avatar_url': None, 'id': 1345069, 'name': 'Alaa Elassar'},
'body': 'text',
'categories': [{'confident': True,
'id': 'IAB2',
'level': 1,
'links': {'_self': 'https://api.aylien.com/api/v1/classify/taxonomy/iab-qag/IAB2',
'parent': None},
'score': 0.1,
'taxonomy': 'iab-qag'},
{'confident': False,}]
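I need to store the data in a pandas DataFrame, and because the number of columns is very large, I cannot specify each name by hand. For the example above, the columns I want are 'author', 'body', and 'categories'.
So far I have tried: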
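For reference, here is a minimal sketch of the target shape, assuming the records are already plain Python dicts. The sample values below are placeholders modeled on the output above, not real API data. Under that assumption, the DataFrame constructor takes the column names from the top-level keys, so nothing has to be listed by hand:

```python
import pandas as pd

# Hypothetical sample records shaped like the output above (placeholder values).
records = [
    {
        "author": {"avatar_url": None, "id": 1345069, "name": "Alaa Elassar"},
        "body": "text",
        "categories": [{"confident": True, "id": "IAB2", "level": 1, "score": 0.1}],
    },
    {
        "author": {"avatar_url": None, "id": 2, "name": "Another Author"},
        "body": "more text",
        "categories": [],
    },
]

# A list of dicts becomes one row per dict and one column per top-level key.
df = pd.DataFrame(records)
print(df.columns.tolist())
```

Nested values such as 'author' stay as dicts inside their cells in this form; only the top-level keys become columns.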
from pandas.io.json import json_normalize
tesla = json_normalize(tesla.stories)
print (df)
The output is:
"story" object has no attribute "values"
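I managed to convert it into a single-column pandas DataFrame with one dictionary per row, but when I tried to split that into several columns I got an error.
This is the code I use to call the API: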
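The error message suggests that the list elements are model objects rather than plain dicts, which json_normalize cannot read. The sketch below shows one possible way around that, under a stated assumption: that each story object exposes a to_dict() method, as generated API clients often do. This has not been confirmed for aylien_news_api here. FakeStory is a hypothetical stand-in class, and pd.json_normalize requires pandas 1.0 or later:

```python
import pandas as pd


class FakeStory:
    """Hypothetical stand-in for an API model object (not the real SDK class)."""

    def __init__(self, author, body):
        self.author = author
        self.body = body

    def to_dict(self):
        # Assumption: the real model object offers an equivalent method.
        return {"author": self.author, "body": self.body}


stories = [
    FakeStory({"id": 1345069, "name": "Alaa Elassar"}, "text"),
    FakeStory({"id": 2, "name": "Another Author"}, "more text"),
]

# Convert every object to a plain dict first, then flatten nested dicts
# into dotted column names such as "author.id".
records = [story.to_dict() for story in stories]
flat = pd.json_normalize(records)
print(sorted(flat.columns))
```

If the nested 'author' dict should stay in a single column instead, pd.DataFrame(records) on the same list keeps only the top-level keys as columns.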
from __future__ import print_function
import time  # needed for time.sleep() in fetch_new_stories
import aylien_news_api
from aylien_news_api.rest import ApiException
from pprint import pprint
configuration = aylien_news_api.Configuration()
# Configure API key authorization: app_id
configuration.api_key['X-AYLIEN-NewsAPI-Application-ID'] = 'KEY'
# Configure API key authorization: app_key
configuration.api_key['X-AYLIEN-NewsAPI-Application-Key'] = 'KEY'
configuration.host = "https://api.aylien.com/news"
# Create an instance of the API class
api_instance = aylien_news_api.DefaultApi(aylien_news_api.ApiClient(configuration))
def fetch_new_stories(params=None):
    # Avoid sharing one mutable default dict between calls.
    if params is None:
        params = {}
    fetched_stories = []
    stories = None
    # Keep paging until the API returns an empty page.
    while stories is None or len(stories) > 0:
        try:
            response = api_instance.list_stories(**params)
        except ApiException as e:
            if e.status == 429:
                print('Usage limit exceeded. Waiting for 60 seconds...')
                time.sleep(60)
                continue
            # Any other API error would leave `response` unset, so re-raise it.
            raise
        stories = response.stories
        params['cursor'] = response.next_page_cursor
        fetched_stories += stories
        print("Fetched %d stories. Total story count so far: %d" %
              (len(stories), len(fetched_stories)))
    return fetched_stories
params = {
    'title': 'cybertruck',
    'published_at_start': '2019-11-20T12:00:00Z',
    'published_at_end': 'NOW',
    'cursor': '*',
    'per_page': 100
}
stories = fetch_new_stories(params)
print('************')
print("Fetched %d stories which are about Cybertruck and were published between %s and %s" %
      (len(stories), params['published_at_start'], params['published_at_end']))