Question

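I am trying to scrape article data from the Scopus API. I have an API key and can receive the fields of the STANDARD view.

Here is an example:

First, the initialization (API endpoint, search query and headers):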

import json
import requests

api_resource = "https://api.elsevier.com/content/search/scopus?"
search_param = 'query=title-abs-key(big data)'  # for example

# placeholder: put your own Scopus API key here
api_key = 'YOUR_API_KEY'

# headers
headers = dict()
headers['X-ELS-APIKey'] = api_key
headers['X-ELS-ResourceVersion'] = 'XOCS'
headers['Accept'] = 'application/json'

Now I can receive the article JSON (for example, the first article on the first page):

# request with first searching page
page_request = requests.get(api_resource + search_param, headers=headers)
# response to json
page = json.loads(page_request.content.decode("utf-8"))
# List of articles from this page
articles_list = page['search-results']['entry']

article = articles_list[0]

I can easily get some of the main fields from the standard view:

title = article['dc:title']
cit_count = article['citedby-count']
authors = article['dc:creator']
date = article['prism:coverDate']

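However, I also need the keywords and citations of the article. I solved the keywords problem with an additional request for the article: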

article_url = article['prism:url']
# something like this:
# 'http://api.elsevier.com/content/abstract/scopus_id/84909993848'

with field=authkeywords:

article_request = requests.get(article_url + "?field=authkeywords", headers=headers)
article_keywords = json.loads(article_request.content.decode("utf-8"))
keywords = [keyword['$'] for keyword in article_keywords['abstracts-retrieval-response']['authkeywords']['author-keyword']]

This method works, but sometimes the keywords are missing. In addition, a Scopus API key has a request limit (10000 per week), so this approach is not optimal.
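For the missing-keywords case, a minimal defensive sketch could look like the following. The helper name extract_keywords is hypothetical, and the idea that a single keyword may come back as one dict instead of a list is an assumption, not something confirmed by the API documentation.

```python
def extract_keywords(abstract_json):
    """Return the author keywords as a list of strings, or [] if none are present."""
    response = abstract_json.get('abstracts-retrieval-response') or {}
    authkeywords = response.get('authkeywords') or {}
    if not isinstance(authkeywords, dict):
        # Unexpected shape: treat it as "no keywords" instead of raising
        return []
    items = authkeywords.get('author-keyword') or []
    # Assumption: a lone keyword may be returned as a dict rather than a list
    if isinstance(items, dict):
        items = [items]
    return [item['$'] for item in items if isinstance(item, dict) and '$' in item]
```

It would be called as keywords = extract_keywords(article_keywords) in place of the list comprehension above, so that an article without author keywords yields an empty list rather than a KeyError or TypeError.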

Is there an easier way to do this?

The next question is about citations. To find the citations of an article, I send one more request using the article['eid'] field:

citations_response = requests.get(api_resource + 'query=refeid(' + str(article['eid']) + ')', headers=headers)
citations_result = json.loads(citations_response.content.decode("utf-8"))
citations = citations_result['search-results']['entry']  # list of citations

So, can I get the citations without an additional request?

Answer 1

Citations can be obtained in a single query only by using the COMPLETE view (subscribers only).
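As a rough sketch of what requesting the COMPLETE view might look like, the snippet below adds a view parameter to the search URL from the question. The helper names build_search_url and split_keywords are hypothetical, and the assumption that each COMPLETE-view entry carries an authkeywords string separated by "|" should be checked against an actual response.

```python
from urllib.parse import urlencode

API_RESOURCE = "https://api.elsevier.com/content/search/scopus"

def build_search_url(query, view="COMPLETE"):
    """Build the search URL with the requested view (COMPLETE needs a subscription)."""
    return API_RESOURCE + "?" + urlencode({"query": query, "view": view})

def split_keywords(entry, separator="|"):
    """Split the 'authkeywords' string of a search entry into a list.

    Assumption: the COMPLETE view returns keywords as one separator-delimited string.
    """
    raw = entry.get("authkeywords") or ""
    return [kw.strip() for kw in raw.split(separator) if kw.strip()]
```

The resulting URL can be passed to requests.get together with the same headers as in the question, and split_keywords(article) can then be applied to each item of page['search-results']['entry'], which avoids one extra request per article.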
