Question

I am using the yanytapi (https://pypi.org/project/yanytapi/) Python wrapper for the New York Times API. I managed to run a search and get the data in JSON format by running the following code:

obama = api.search("Obama", 
                          fq={"headline": "Obama", 
                              "source": ["Reuters", 
                                         "AP", 
                                         "The New York Times"]}, 
                          begin_date="20190701", # this can also be an int
                          facet_field=["source", "day_of_week"], 
                          facet_filter=True)
for item in obama:
    print(item)

The output looks like this:

{"_id": "nyt://article/2c48c662-6053-562e-8187-88c954f5983f", "blog": {}, "byline": {"original": "By Arit John", "person": [{"firstname": "Arit", "middlename": null, "lastname": "John", "qualifier": null, "title": null, "role": "reported", "organization": "", "rank": 1}], "organization": null}, "document_type": "article", "headline": {"main": "Obama Shares His Summer Reading List", "kicker": null, "content_kicker": null, "print_headline": "Barack Obama Shares His Reading List", "name": null, "seo": null, "sub": null}, "keywords": [{"name": "subject", "value": "Writing and Writers", "rank": 1, "major": "N"}, {"name": "subject", "value": "Books and Literature", "rank": 2, "major": "N"}, {"name": "persons", "value": "Obama, Barack", "rank": 3, "major": "N"}] ....

I tried to extract the data and put it into a df by running the following:

users_locs = [[article['_id'], article["document_type"]] for article in obama]
df = pd.DataFrame(data=users_locs, columns=['ID', 'type'])
df

But my DataFrame is empty. Why? How can I extract the data?

Answer 1

According to the documentation, articles are Doc objects, and to access their fields you should use the .<field_name> syntax, for example:

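import pandas as pd
from yanytapi import SearchAPI

# Assumes the SearchAPI client shown in the yanytapi README; substitute your own key.
api = SearchAPI("YOUR_API_KEY")

obama = api.search("Obama", 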
                          fq={"headline": "Obama", 
                              "source": ["Reuters", 
                                         "AP", 
                                         "The New York Times"]}, 
                          begin_date="20190821", # this can also be an int
                          facet_field=["source", "day_of_week"], 
                          facet_filter=True)

users_locs = [[article._id, article.document_type] for article in obama]
df = pd.DataFrame(data=users_locs, columns=['ID', 'type'])
df

Here is my result:

    ID  type 
 0  nyt://article/5722feb7-c751-50dd-ac84-85526e11...   article
 1  nyt://article/3577d507-ba57-5b9c-bcee-b1542650...   article
 2  nyt://article/9c2f0502-8264-5645-af44-d8656d5d...   article
 3  nyt://article/b55ca58d-dc0f-5f5f-a01c-178d2fc7...   article
 4  nyt://article/f3596774-562f-5c74-b62f-2c60f2d2...   article
 5  nyt://article/d783f1e3-26b3-561d-9455-5f2e035b...   article
 6  nyt://article/aa503b22-66ab-5796-a923-e3c99c79...   article
 7  nyt://article/41e68733-a47e-58bc-bbc8-f93397f2...   article
 8  nyt://article/98bc5831-3639-5abc-a339-3e1d74fc...   article
 9  nyt://article/ff30c8ef-bf58-5ce8-9d92-4b25a464...   article
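
As a side note, here is a small sketch of one more thing that may explain the empty frame in the question. I am assuming (not verified against the yanytapi source) that api.search returns a one-shot iterator, so the print loop in the question would consume every result before the list comprehension runs. fake_search below is a hypothetical stand-in that yields Doc-like objects, so the snippet runs without an API key:

```python
from types import SimpleNamespace


def fake_search():
    """Hypothetical stand-in for api.search(): lazily yields Doc-like objects."""
    yield SimpleNamespace(_id="nyt://article/1", document_type="article")
    yield SimpleNamespace(_id="nyt://article/2", document_type="article")


results = fake_search()

# First pass: attribute access works on Doc-like objects.
rows = [[doc._id, doc.document_type] for doc in results]

# Second pass over the same generator: it is already exhausted.
rows_again = [[doc._id, doc.document_type] for doc in results]

# Materialising the results once lets you loop over them repeatedly.
docs = list(fake_search())
first = [doc._id for doc in docs]
second = [doc._id for doc in docs]

print(rows)
print(rows_again)
print(first == second)
```

If that assumption holds for the real client, wrapping the search in list(...) before printing, and then building the rows from that list, should avoid the empty result. The rows list can be passed to pd.DataFrame exactly as in the answer above.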

Extracting data from an iterator of Doc objects
