NeuroMorpho.org - Fetching results from multiple API pages

Time: 2020-06-18 06:55:01

Tags: python json api python-requests python-requests-html

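Apologies in advance: this is my first post, and I am completely new to coding in Python. I want to use the NeuroMorpho API (http://neuromorpho.org/apiReference.html) to find and fetch information about certain neurons (with filters added to the query line).

I used the following code: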

import requests
import json
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

response = requests.get("http://neuromorpho.org/api")
response

query = (
    "http://neuromorpho.org/api/neuron/select?q=species:rat&fq=brain_region:hippocampus, CA1&fq=experiment_condition:Control&fq=cell_type:Pyramidal, principal cell"
)

response = requests.get(query)
json_data = response.json()
rat_data = json_data
rat_data

I got a large amount of data back, and the output always ends like this:

'page':{'size':50,'totalElements':1115,'totalPages':23, 'number':0}}

Then I wanted to build a dictionary from that data, and used the following code:

df_dict = {}
df_dict['NeuronID'] = []
df_dict['Archive'] = []
df_dict['Strain'] = []
df_dict['Cell'] = []
df_dict['Region'] = []
for i in rat_data['_embedded']['neuronResources']:
    df_dict['NeuronID'].append(str(i['neuron_id']))
    df_dict['Archive'].append(str(i['archive']))
    df_dict['Strain'].append(str(i['strain']))
    df_dict['Cell'].append(str(i['cell_type']))
    df_dict['Region'].append(str(i['brain_region']))

rat_df = DataFrame(df_dict)
print(rat_df)

Finally, when I checked the length of the result:

len(rat_df)

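The output is 50.

So I eventually worked out that the program only pulls the first 50 neurons, from the first page (page 0). According to the output shown earlier, there are 23 pages in total. How can I put all of these results into a single dictionary or a single class, i.e. is there a way to walk through all of these pages? I tried several loop variants, but without success.

Sorry if this is a simple question or if I have made some mistake; I have been trying everything I can think of for the past few days without getting anywhere.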

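1 answer:

Answer 0: (score: 0)

Disclaimer: I am not an expert on HTTP or the Requests library, and I have not used neuromorpho.org before, so take this with a grain of salt.

You can query the number of pages with a first request and then loop over the individual pages. Inside the loop, you have to pass the requested page as a parameter of the HTTP GET method, e.g. ?page=42&..., like this: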

import pandas as pd
import requests

url = 'http://neuromorpho.org/api/neuron/select'
params = {
        'page' : 0,
        'q' : 'species:rat',
        'fq' : [
            'brain_region:hippocampus,CA1',
            'experiment_condition:Control',
            'cell_type:Pyramidal,principal cell' ] }

totalPages = requests.get(url, params).json()['page']['totalPages']

df_dict = {
        'NeuronID' : list(),
        'Archive' : list(),
        'Strain' :  list(),
        'Cell' : list(),
        'Region' : list() }

for pageNum in range(totalPages):
    params['page'] = pageNum
    response = requests.get(url, params)
    print('Querying page {} -> status code: {}'.format(
        pageNum, response.status_code))
    if (response.status_code == 200):    #only parse successful requests
        data = response.json()
        for row in data['_embedded']['neuronResources']:
            df_dict['NeuronID'].append(str(row['neuron_id']))
            df_dict['Archive'].append(str(row['archive']))
            df_dict['Strain'].append(str(row['strain']))
            df_dict['Cell'].append(str(row['cell_type']))
            df_dict['Region'].append(str(row['brain_region']))

rat_df = pd.DataFrame(df_dict)
print(rat_df)

In the console output you can see how the requested page number changes, followed by the resulting DataFrame:

Querying page 0 -> status code: 200
Querying page 1 -> status code: 200
Querying page 2 -> status code: 200
Querying page 3 -> status code: 200
Querying page 4 -> status code: 200
Querying page 5 -> status code: 200
Querying page 6 -> status code: 200
Querying page 7 -> status code: 200
Querying page 8 -> status code: 200
Querying page 9 -> status code: 200
Querying page 10 -> status code: 200
Querying page 11 -> status code: 200
Querying page 12 -> status code: 200
Querying page 13 -> status code: 200
Querying page 14 -> status code: 200
Querying page 15 -> status code: 200
Querying page 16 -> status code: 200
Querying page 17 -> status code: 200
Querying page 18 -> status code: 200
Querying page 19 -> status code: 200
Querying page 20 -> status code: 200
Querying page 21 -> status code: 200
Querying page 22 -> status code: 200
     NeuronID    Archive          Strain                             Cell                          Region
0         100     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
1         101     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
2        1016     Ascoli  Sprague-Dawley  ['pyramidal', 'principal cell']                 ['hippocampus']
3        1019     Ascoli  Sprague-Dawley  ['pyramidal', 'principal cell']                 ['hippocampus']
4         102     Turner     Fischer 344  ['pyramidal', 'principal cell']          ['hippocampus', 'CA1']
...       ...        ...             ...                              ...                             ...
1110    99614  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1111    99615  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1112    99616  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1113    99617  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']
1114    99618  Guizzetti  Sprague-Dawley  ['principal cell', 'pyramidal']  ['hippocampus', 'CA1', 'left']

[1115 rows x 5 columns]
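
The same paging pattern can be sketched so that page 0 is fetched only once and the HTTP call is swappable. This is a minimal sketch, not the answer's original code: `fetch_all_rows`, `fetch_page` and `fake_fetch_page` are hypothetical names introduced here, and the fake pages only imitate the `_embedded` / `page` layout seen in the responses above.

```python
from typing import Any, Callable, Dict, List

Page = Dict[str, Any]


def fetch_all_rows(fetch_page: Callable[[int], Page]) -> List[Dict[str, Any]]:
    """Collect the neuron records from every page.

    fetch_page is a hypothetical callable: it takes a zero-based page
    number and returns the parsed JSON of that page.
    """
    first = fetch_page(0)
    total_pages = first["page"]["totalPages"]
    rows = list(first["_embedded"]["neuronResources"])
    # Page 0 is already in hand, so continue from page 1.
    for page_num in range(1, total_pages):
        page = fetch_page(page_num)
        rows.extend(page["_embedded"]["neuronResources"])
    return rows


def fake_fetch_page(page_num: int) -> Page:
    """Stand-in for the HTTP call: 3 pages, the last one only partly filled."""
    records = [
        [{"neuron_id": 1}, {"neuron_id": 2}],
        [{"neuron_id": 3}, {"neuron_id": 4}],
        [{"neuron_id": 5}],
    ]
    return {
        "_embedded": {"neuronResources": records[page_num]},
        "page": {"size": 2, "totalElements": 5,
                 "totalPages": 3, "number": page_num},
    }


rows = fetch_all_rows(fake_fetch_page)
print([row["neuron_id"] for row in rows])
```

Against the real API, `fetch_page` could be something like `lambda n: requests.get(url, {**params, 'page': n}).json()`, reusing `url` and `params` from the code above.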

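Update #1:

I changed the posted code by adding a modified version that parses the response inside the loop. I think there is a small bug in the neuromorpho.org API: for the last page (number 22) it responds with size: 50, although the JSON response contains only 15 objects (indices 0-14). You can avoid the problem by iterating over the JSON objects and ignoring the reported size.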
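
The figure of 15 objects on the last page follows from the page metadata shown in the question. The sketch below only does that arithmetic; `expected_page_lengths` is a hypothetical helper introduced here. One possible reading, which is an assumption and not something confirmed in this thread, is that `size` reports the page size that was requested rather than the number of objects actually returned.

```python
# Page metadata as printed in the question.
page_info = {"size": 50, "totalElements": 1115, "totalPages": 23, "number": 0}


def expected_page_lengths(info):
    """Return how many records each page should hold, given the metadata."""
    size = info["size"]
    total = info["totalElements"]
    full_pages, remainder = divmod(total, size)
    lengths = [size] * full_pages
    if remainder:
        # The last page holds whatever is left over.
        lengths.append(remainder)
    return lengths


lengths = expected_page_lengths(page_info)
print(len(lengths), lengths[-1])
```

Either way, counting the records with `len(data['_embedded']['neuronResources'])` avoids relying on the reported `size`.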

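Update #2:

I realized that the GET parameters do not have to be encoded into the URL by hand; Requests already does that for us when we pass them as a dict (code updated).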
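
To get a feel for what such an encoded query string looks like, here is a rough sketch that uses only the standard library's `urllib.parse.urlencode`. It is meant as an approximation of the encoding step, not as a statement of exactly what Requests sends; the `url` and `params` values are copied from the code above.

```python
from urllib.parse import urlencode

url = "http://neuromorpho.org/api/neuron/select"
params = {
    "page": 0,
    "q": "species:rat",
    "fq": [
        "brain_region:hippocampus,CA1",
        "experiment_condition:Control",
        "cell_type:Pyramidal,principal cell",
    ],
}

# doseq=True expands the list under "fq" into one fq=... pair per item.
query_string = urlencode(params, doseq=True)
full_url = url + "?" + query_string
print(full_url)
```

Note how the list under `fq` becomes three separate `fq=` pairs, which is why a list is a convenient way to pass several filters.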

I hope this helps!