If I have data of the following type, a list of dictionaries, how can I extract only some of the keys from it?
comps = [
{
"name":'Test1',
"p_value":0.02,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test2',
"p_value":0.05,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test3',
"p_value":0.03,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test4',
"p_value":0.07,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test5',
"p_value":0.03,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test6',
"p_value":0.02,
"group0_null": 0.0,
"group1_null": 0.0,
},{
"name":'Test7',
"p_value":0.01,
"group0_null": 0.0,
"group1_null": 0.0,
}]
Result
Given the data above, suppose I only want name and p_value. How can I get this result?
[{
"name":'Test1',
"p_value":0.02,
},{
"name":'Test2',
"p_value":0.05,
},{
"name":'Test3',
"p_value":0.03,
},{
"name":'Test4',
"p_value":0.07,
},{
"name":'Test5',
"p_value":0.03,
},{
"name":'Test6',
"p_value":0.02,
},{
"name":'Test7',
"p_value":0.01,
}]
This shows everything:
[c for c in comps]
This shows only the names:
[c['name'] for c in comps]
But if I do this:
[c['name','p_value'] for c in comps ]
I get an error:
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-94-b29459f7b089> in <module>
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
2
3 # cont_comps = []
4
5 # for c in comps['continuous_explainers']:
<ipython-input-94-b29459f7b089> in <listcomp>(.0)
----> 1 [c['name','p_value'] for c in comps['continuous_explainers'] ]
2
3 # cont_comps = []
4
5 # for c in comps['continuous_explainers']:
KeyError: ('name', 'p_value')
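The real dictionaries are much larger than this. I want to do it this way so that I can list just the keys I need.
Update
Since someone pointed out that the data structure I showed differs from what I receive from the server, here is the code I use to pull the data.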
# get all comparisons
comps = source.get_comparison(name='Pr1 vs. Rest')
# only take the continuous explainers
comps['continuous_explainers'][1:5]
Data
[{'name': 'Gender',
'column_index': 2,
'ks_score': 0.0022329709328575142,
'p_value': 1.0,
'quartiles': [[0.0, 0.0, 1.0, 1.0, 2.0], [0.0, 0.0, 1.0, 1.0, 2.0]],
't_test_p_value': 0.8341377317414621,
'diff_means': 0.0014959875249118681,
'primary_group_mean': 0.6312769010043023,
'secondary_group_mean': 0.6297809134793905,
'ks_sign': '+',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0},
{'name': 'Gender_Missing_color',
'column_index': 3,
'ks_score': 2.220446049250313e-16,
'p_value': 1.0,
'quartiles': [[1.0, 1.0, 1.0, 1.0, 1.0], [1.0, 1.0, 1.0, 1.0, 1.0]],
't_test_p_value': 1.0,
'diff_means': 0.0,
'primary_group_mean': 1.0,
'secondary_group_mean': 1.0,
'ks_sign': '0',
'group0_percent_null': 0.9966523194643712,
'group1_percent_null': 0.9959153360564427},
{'name': 'Gender_Missing',
'column_index': 4,
'ks_score': 0.0007369834078797544,
'p_value': 1.0,
'quartiles': [[0.0, 0.0, 0.0, 0.0, 1.0], [0.0, 0.0, 0.0, 0.0, 1.0]],
't_test_p_value': 0.40301091478187256,
'diff_means': -0.0007369834079284866,
'primary_group_mean': 0.0033476805356288893,
'secondary_group_mean': 0.004084663943557376,
'ks_sign': '-',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0},
{'name': 'Male',
'column_index': 5,
'ks_score': 0.0029699543407862294,
'p_value': 0.9999999999915384,
'quartiles': [[0.0, 0.0, 1.0, 1.0, 1.0], [0.0, 0.0, 1.0, 1.0, 1.0]],
't_test_p_value': 0.6740956861786738,
'diff_means': 0.0029699543407684104,
'primary_group_mean': 0.6245815399330444,
'secondary_group_mean': 0.621611585592276,
'ks_sign': '+',
'group0_percent_null': 0.0,
'group1_percent_null': 0.0}]
This is the output I get. As stated above, I only need some of the data from this list of dictionaries.
Answer 0 (score: 1)
You can create a new dict for each object in comparisons (your list of dicts) and initialize it with only the name and p_value keys.
ex = [{'name': d['name'], 'p_value': d['p_value']} for d in comparisons]
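Since the question asks for a way to list the needed keys, the same idea can be sketched with the key list pulled out into a variable. The sample data below is a shortened, hypothetical stand-in for the server response, and it assumes the list of dicts sits under the 'continuous_explainers' key, as the traceback in the question suggests. The name wanted is only an illustration.

```python
# Hypothetical, shortened stand-in for the server response. It assumes the
# list of dicts is stored under 'continuous_explainers', as in the traceback.
comps = {
    "continuous_explainers": [
        {"name": "Gender", "column_index": 2, "p_value": 1.0, "ks_sign": "+"},
        {"name": "Male", "column_index": 5, "p_value": 0.9999999999915384, "ks_sign": "+"},
    ]
}

# List the keys you need once; extend this list for the larger real dicts.
wanted = ["name", "p_value"]

# Build one new, smaller dict per original dict.
ex = [{key: d[key] for key in wanted} for d in comps["continuous_explainers"]]
print(ex)
```

If your comps is already the plain list, iterate over comps directly instead of comps["continuous_explainers"].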
Answer 1 (score: 1)
I am still not sure how to make the answer above work for me. However, I came up with another way to do it:
import pandas as pd
test = [(c['name'], c['p_value'], c['group0_percent_null']) for c in comps]
pd.DataFrame(test)
                      0             1         2
0                    ID  5.374590e-13  0.000000
1                Gender  1.000000e+00  0.000000
2  Gender_Missing_color  1.000000e+00  0.996652
3        Gender_Missing  1.000000e+00  0.000000
4                  Male  1.000000e+00  0.000000
...                 ...           ...       ...
It gave me the result I wanted.
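A possible variation on the tuple approach, as a sketch only: operator.itemgetter from the standard library can pull several keys at once, so the column list is written in one place. The two sample rows are copied from the question's data and trimmed; the names columns, pick, and rows are made up for this example.

```python
from operator import itemgetter

# Two rows trimmed from the data shown in the question.
comps = [
    {"name": "Gender", "p_value": 1.0, "group0_percent_null": 0.0},
    {"name": "Gender_Missing_color", "p_value": 1.0,
     "group0_percent_null": 0.9966523194643712},
]

# The keys to extract, in the order the tuple fields should appear.
columns = ["name", "p_value", "group0_percent_null"]

# With two or more keys, itemgetter returns a tuple of the looked-up values.
pick = itemgetter(*columns)
rows = [pick(c) for c in comps]
print(rows)
```

Note that itemgetter with a single key returns the bare value rather than a one-element tuple. If you then build a DataFrame, passing columns=columns to pd.DataFrame should label the columns instead of leaving them as 0, 1, 2.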
Answer 2 (score: -1)
Try
[{'name':c['name'], 'p_value':c['p_value']} for c in comps]
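For context on why the original attempt failed, here is a small sketch. Writing c['name','p_value'] performs a single lookup with the tuple ('name', 'p_value') as the key, which is exactly the key reported in the KeyError. The sample dict is the first entry from the question; ks_score is included in wanted only to show one way of tolerating keys that some dicts may lack, which is an assumption about the real data rather than something the question states.

```python
c = {"name": "Test1", "p_value": 0.02, "group0_null": 0.0, "group1_null": 0.0}

# c['name', 'p_value'] is the same as c[('name', 'p_value')]: one lookup
# using a tuple as the key, and this dict has no such key.
try:
    c["name", "p_value"]
    error_key = None
except KeyError as exc:
    error_key = exc.args[0]
print(error_key)

# Selecting keys one by one works. The `if k in c` filter skips any wanted
# key that is absent from this particular dict instead of raising KeyError.
wanted = ["name", "p_value", "ks_score"]
subset = {k: c[k] for k in wanted if k in c}
print(subset)
```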