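Sorry, this is my first post and I'm completely new to Python coding. I want to use the NeuroMorpho API (http://neuromorpho.org/apiReference.html) to find and retrieve information about certain neurons, with filters added to the query string.
I used the following code: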
import requests
import json
import csv
import pandas as pd
from pandas import DataFrame
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
response = requests.get("http://neuromorpho.org/api")
response
query = (
"http://neuromorpho.org/api/neuron/select?q=species:rat&fq=brain_region:hippocampus, CA1&fq=experiment_condition:Control&fq=cell_type:Pyramidal, principal cell"
)
response = requests.get(query)
json_data = response.json()
rat_data = json_data
rat_data
I got a large amount of data, ending with:
'page':{'size':50,'totalElements':1115,'totalPages':23, 'number':0}}
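The `page` block at the end of the response describes the pagination. Assuming `number` is the zero-based index of the current page (the first response reports `'number': 0`), the valid page indices run from 0 to `totalPages - 1`, and `totalPages` should equal `totalElements / size` rounded up. A minimal sketch using the numbers shown above; `page_indices` is a hypothetical helper name:

```python
import math

# 'page' metadata copied from the response shown above
page_meta = {'size': 50, 'totalElements': 1115, 'totalPages': 23, 'number': 0}

def page_indices(meta):
    """Return the zero-based page indices implied by a 'page' block."""
    # Assumption: totalPages is totalElements / size, rounded up
    expected = math.ceil(meta['totalElements'] / meta['size'])
    if expected != meta['totalPages']:
        raise ValueError('inconsistent page metadata')
    return range(meta['totalPages'])

pages = page_indices(page_meta)
print(pages[0], pages[-1], len(pages))  # 0 22 23
```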
Then I wanted to build a dictionary from that data, using the following code:
df_dict = {}
df_dict['NeuronID'] = []
df_dict['Archive'] = []
df_dict['Strain'] = []
df_dict['Cell'] = []
df_dict['Region'] = []
for i in rat_data['_embedded']['neuronResources']:
    df_dict['NeuronID'].append(str(i['neuron_id']))
    df_dict['Archive'].append(str(i['archive']))
    df_dict['Strain'].append(str(i['strain']))
    df_dict['Cell'].append(str(i['cell_type']))
    df_dict['Region'].append(str(i['brain_region']))
rat_df = DataFrame(df_dict)
print(rat_df)
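The loop above relies on the response shape `rat_data['_embedded']['neuronResources']`, a list with one dict per neuron. Moving the extraction into a function makes it reusable for every page. A minimal sketch, assuming that shape; `extract_rows` is a hypothetical helper name, and the sample payload is made up for illustration (only its keys come from the code above):

```python
# Map DataFrame column names to keys in each neuron record (keys from the code above)
FIELDS = {
    'NeuronID': 'neuron_id',
    'Archive': 'archive',
    'Strain': 'strain',
    'Cell': 'cell_type',
    'Region': 'brain_region',
}

def extract_rows(page_json, columns):
    """Append one value per field for every neuron in a single page of results."""
    for neuron in page_json['_embedded']['neuronResources']:
        for column, key in FIELDS.items():
            columns[column].append(str(neuron[key]))
    return columns

# Made-up payload with the same nesting as the real response (illustration only)
sample = {'_embedded': {'neuronResources': [
    {'neuron_id': 1, 'archive': 'A', 'strain': 'S',
     'cell_type': ['pyramidal'], 'brain_region': ['hippocampus']},
]}}
columns = {name: [] for name in FIELDS}
extract_rows(sample, columns)
print(columns['NeuronID'])  # ['1']
```

Calling `extract_rows` once per page with the same `columns` dict accumulates the rows of all pages; `pd.DataFrame(columns)` then builds the table.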
Finally, when I checked the length of the resulting DataFrame:
len(rat_df)
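The output was 50.
So I eventually realized that the program only pulled the first 50 neurons, i.e. the first page (page 0). According to the output at the start, there are 23 pages in total. How can I get all of these results into one dictionary or one class, i.e., is there a way to go through all of these pages? I have tried several loop options without success.
Sorry if this is a simple question or if I have made some mistakes, but I have been trying everything for the past few days without getting any results.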
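A common pattern is to fetch the first page, read `totalPages` from its `page` block, and then request the remaining pages by index. A minimal sketch of that loop, written against a hypothetical `fetch_page(n)` function that returns the parsed JSON for page `n`, so the paging logic can be checked without network access. In real use `fetch_page` might wrap `requests.get(url, params=...).json()` with a `page` parameter; that wiring is an assumption, not shown here:

```python
def fetch_all(fetch_page):
    """Collect neuron records from every page, reusing the first response."""
    first = fetch_page(0)
    records = list(first['_embedded']['neuronResources'])
    # Pages are zero-based, so the remaining ones are 1 .. totalPages - 1
    for n in range(1, first['page']['totalPages']):
        records.extend(fetch_page(n)['_embedded']['neuronResources'])
    return records

# Hypothetical stand-in for a real HTTP call: 1115 fake records in pages of 50
def fake_fetch(n, total=1115, size=50):
    start = n * size
    items = [{'neuron_id': i} for i in range(start, min(start + size, total))]
    return {'_embedded': {'neuronResources': items},
            'page': {'size': size, 'totalElements': total,
                     'totalPages': -(-total // size), 'number': n}}

records = fetch_all(fake_fetch)
print(len(records))  # 1115
```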
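Answer 0 (score: 0)
Disclaimer: I'm not an expert on HTTP or the Requests library, and I haven't used neuromorpho.org before, so take this with a grain of salt.
You can query the number of pages with a first request and then loop through the pages. Inside the loop, pass the requested page as a parameter of the HTTP GET request, e.g. ?page=42&...
like this: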
import requests
import pandas as pd

url = 'http://neuromorpho.org/api/neuron/select'
params = {
'page' : 0,
'q' : 'species:rat',
'fq' : [
'brain_region:hippocampus,CA1',
'experiment_condition:Control',
'cell_type:Pyramidal,principal cell' ] }
totalPages = requests.get(url, params).json()['page']['totalPages']
df_dict = {
'NeuronID' : list(),
'Archive' : list(),
'Strain' : list(),
'Cell' : list(),
'Region' : list() }
for pageNum in range(totalPages):
    params['page'] = pageNum
    response = requests.get(url, params)
    print('Querying page {} -> status code: {}'.format(
        pageNum, response.status_code))
    if (response.status_code == 200): #only parse successful requests
        data = response.json()
        for row in data['_embedded']['neuronResources']:
            df_dict['NeuronID'].append(str(row['neuron_id']))
            df_dict['Archive'].append(str(row['archive']))
            df_dict['Strain'].append(str(row['strain']))
            df_dict['Cell'].append(str(row['cell_type']))
            df_dict['Region'].append(str(row['brain_region']))
rat_df = pd.DataFrame(df_dict)
print(rat_df)
The console output shows how the requested page number changes, followed by the resulting DataFrame:
Querying page 0 -> status code: 200
Querying page 1 -> status code: 200
Querying page 2 -> status code: 200
Querying page 3 -> status code: 200
Querying page 4 -> status code: 200
Querying page 5 -> status code: 200
Querying page 6 -> status code: 200
Querying page 7 -> status code: 200
Querying page 8 -> status code: 200
Querying page 9 -> status code: 200
Querying page 10 -> status code: 200
Querying page 11 -> status code: 200
Querying page 12 -> status code: 200
Querying page 13 -> status code: 200
Querying page 14 -> status code: 200
Querying page 15 -> status code: 200
Querying page 16 -> status code: 200
Querying page 17 -> status code: 200
Querying page 18 -> status code: 200
Querying page 19 -> status code: 200
Querying page 20 -> status code: 200
Querying page 21 -> status code: 200
Querying page 22 -> status code: 200
NeuronID Archive Strain Cell Region
0 100 Turner Fischer 344 ['pyramidal', 'principal cell'] ['hippocampus', 'CA1']
1 101 Turner Fischer 344 ['pyramidal', 'principal cell'] ['hippocampus', 'CA1']
2 1016 Ascoli Sprague-Dawley ['pyramidal', 'principal cell'] ['hippocampus']
3 1019 Ascoli Sprague-Dawley ['pyramidal', 'principal cell'] ['hippocampus']
4 102 Turner Fischer 344 ['pyramidal', 'principal cell'] ['hippocampus', 'CA1']
... ... ... ... ... ...
1110 99614 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1111 99615 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1112 99616 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1113 99617 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
1114 99618 Guizzetti Sprague-Dawley ['principal cell', 'pyramidal'] ['hippocampus', 'CA1', 'left']
[1115 rows x 5 columns]
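I changed the posted code by adding a modified version of your code that parses the responses inside the loop. I think there is a small bug in the neuromorpho.org API: on the last page (number 22) it responds with size: 50, although the JSON response contains only 15 objects (indices 0-14). You can avoid the problem by iterating over the JSON objects themselves and ignoring the reported size.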
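The numbers are consistent with a plain remainder: 22 full pages of 50 account for 1100 records, leaving 1115 - 1100 = 15 on page 22. A small sketch that computes the expected item count of each page from the `page` metadata (`items_on_page` is a hypothetical helper), which illustrates why looping over the returned list, rather than over `range(size)`, is the safe choice:

```python
def items_on_page(total_elements, size, number):
    """Expected number of records on zero-based page `number`."""
    remaining = total_elements - number * size
    # Full pages hold `size` items; the last one holds the remainder
    return max(0, min(size, remaining))

print(items_on_page(1115, 50, 0))   # 50
print(items_on_page(1115, 50, 22))  # 15
```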
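Note that the GET parameters don't have to be encoded into the URL by hand: Requests already does that when we pass them as a dict
(I have updated the code accordingly).
I hope this helps!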
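To see what kind of query string a parameter dict turns into, the standard library's `urllib.parse.urlencode` with `doseq=True` expands a list value such as `fq` into repeated keys. A minimal sketch using the parameters from the code above; it uses `urllib` rather than Requests itself, so treat it as an approximation of what Requests sends, not a guarantee:

```python
from urllib.parse import urlencode

# Parameters copied from the answer's code; 'fq' holds several filters
params = {
    'page': 0,
    'q': 'species:rat',
    'fq': ['brain_region:hippocampus,CA1',
           'experiment_condition:Control',
           'cell_type:Pyramidal,principal cell'],
}
# doseq=True turns each list element into its own fq=... pair
query = urlencode(params, doseq=True)
print(query)
```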