I'm still confused about how a MultiIndex works in pandas. I created a MultiIndex as follows:
import pandas as pd
import numpy as np
arrays = [np.array(['pearson', 'pearson', 'pearson', 'pearson', 'spearman', 'spearman',
                    'spearman', 'spearman', 'kendall', 'kendall', 'kendall', 'kendall']),
          np.array(['PROFESSIONAL', 'PROFESSIONAL', 'STUDENT', 'STUDENT',
                    'PROFESSIONAL', 'PROFESSIONAL', 'STUDENT', 'STUDENT',
                    'PROFESSIONAL', 'PROFESSIONAL', 'STUDENT', 'STUDENT']),
          np.array(['r', 'p', 'r', 'p', 'rho', 'p', 'rho', 'p', 'tau', 'p', 'tau', 'p'])]
tuples = list(zip(*arrays))
index = pd.MultiIndex.from_tuples(tuples, names=['correlator', 'expertise', 'coeff-p'])
Then I created an empty DataFrame from it and named the columns axis 'pair':
results_df = pd.DataFrame(index=index)
results_df.columns.names = ['pair']
Filled with some toy data (results_df['attr1-attr2'] = [1,2,3,4,5,6,7,8,9,10,11,12]), it looks like this:
pair                             attr1-attr2
correlator expertise    coeff-p
pearson    PROFESSIONAL r                  1
                        p                  2
           STUDENT      r                  3
                        p                  4
spearman   PROFESSIONAL rho                5
                        p                  6
           STUDENT      rho                7
                        p                  8
kendall    PROFESSIONAL tau                9
                        p                 10
           STUDENT      tau               11
                        p                 12
However, I want to add values from a dictionary rather than dummy values. For each attr-attr pair, the dictionary entry looks like this:
'attr-attr': {
    'pearson': {
        'STUDENT': {
            'r': VALUE,
            'p': VALUE
        },
        'PROFESSIONAL': {
            'r': VALUE,
            'p': VALUE
        }
    },
    'spearman': {
        'STUDENT': {
            'r': VALUE,
            'p': VALUE
        },
        'PROFESSIONAL': {
            'r': VALUE,
            'p': VALUE
        }
    },
    'kendall': {
        'STUDENT': {
            'r': VALUE,
            'p': VALUE
        },
        'PROFESSIONAL': {
            'r': VALUE,
            'p': VALUE
        }
    }
}
Here is actual example data for you to use:
correlations = {'NormNedit-NormEC_TOT': {'pearson': {'PROFESSIONAL': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}}, 'spearman': {'STUDENT': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}, 'PROFESSIONAL': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}, 'kendall': {'STUDENT': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}}, 'NormLiteral-NormEC_TOT': {'pearson': {'PROFESSIONAL': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}, 'STUDENT': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}, 'spearman': {'STUDENT': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}, 'PROFESSIONAL': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}}, 'kendall': {'STUDENT': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}}, 'NormHTra-NormEC_TOT': {'pearson': {'STUDENT': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}}, 'spearman': {'STUDENT': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}, 'PROFESSIONAL': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}}, 'kendall': {'STUDENT': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}}, 'NormScatter-NormEC_TOT': {'pearson': {'STUDENT': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}}, 'spearman': {'STUDENT': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}, 'PROFESSIONAL': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}, 'kendall': {'PROFESSIONAL': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}}, 'NormCrossS-NormEC_TOT': {'pearson': {'STUDENT': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}, 'PROFESSIONAL': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}}, 'spearman': {'STUDENT': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}, 'PROFESSIONAL': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}}, 'kendall': {'PROFESSIONAL': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}}, 'NormPdur-NormEC_TOT': {'pearson': {'STUDENT': {'r': 0.13615071018351657, 'p': 0.0002409555504769095}, 'PROFESSIONAL': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}}, 'spearman': {'STUDENT': {'rho': 0.10867061294616957, 'p': 0.003437711066527592}}, 'kendall': {'PROFESSIONAL': {'tau': 0.08185775947238913, 'p': 0.003435247172206748}}}}
So for each attr-attr pair (the top-level key), used as the column name, I want to add its values to the corresponding rows of the MultiIndex. However, I can't find an efficient way to do this. Missing values should be np.nan. I tried looping over the dictionary and using query() and [], but that didn't work.
I know the data is fairly complex, so please let me know if anything is unclear.
Answer 0 (score: 1)
You can adapt Wouter Overmeire's answer to this question to create a multi-indexed DataFrame from the nested dictionary:
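d = correlations
# Keys are (pair, correlator, expertise) tuples; each value is the innermost
# dict of coefficients, so stack() moves the coefficient names into the index.
df = pd.DataFrame.from_dict({(i, j, k): d[i][j][k]
                             for i in d.keys()
                             for j in d[i].keys()
                             for k in d[i][j].keys()
                             }, orient='index').stack()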
Then, if you want the columns to come from the top level of the nested dictionary (the attr-attr level), you can unstack the result:
df = df.unstack(level=0)
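The question also asks for missing combinations to show up as np.nan in the rows of the predefined index. One way to get there is sketched below. Note that this is a variation, not the exact code above: it flattens the dictionary down to scalars and builds a Series first, then unstacks and reindexes. The names toy, flat, wide, full_index and results, and all the values, are made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not the asker's values); several entries are missing.
toy = {
    'A-B': {
        'pearson': {'STUDENT': {'r': 0.1, 'p': 0.01},
                    'PROFESSIONAL': {'r': 0.2, 'p': 0.02}},
        'spearman': {'STUDENT': {'rho': 0.3, 'p': 0.03}},
    },
    'B-C': {
        'pearson': {'STUDENT': {'r': 0.4, 'p': 0.04}},
    },
}

# Flatten to one scalar per (correlator, expertise, coeff-p, pair) key.
flat = {
    (corr, exp, coeff, pair): value
    for pair, by_corr in toy.items()
    for corr, by_exp in by_corr.items()
    for exp, coeffs in by_exp.items()
    for coeff, value in coeffs.items()
}

# Tuple keys become a four-level MultiIndex on the Series.
s = pd.Series(flat)
s.index.names = ['correlator', 'expertise', 'coeff-p', 'pair']

# Move the 'pair' level into the columns.
wide = s.unstack('pair')

# The target row index (shortened here to two correlators).
full_index = pd.MultiIndex.from_tuples(
    [('pearson', 'PROFESSIONAL', 'r'), ('pearson', 'PROFESSIONAL', 'p'),
     ('pearson', 'STUDENT', 'r'), ('pearson', 'STUDENT', 'p'),
     ('spearman', 'PROFESSIONAL', 'rho'), ('spearman', 'PROFESSIONAL', 'p'),
     ('spearman', 'STUDENT', 'rho'), ('spearman', 'STUDENT', 'p')],
    names=['correlator', 'expertise', 'coeff-p'])

# Align to the target rows; combinations absent from the dict become NaN.
results = wide.reindex(full_index)
print(results)
```

Reindexing against the index you built up front also discards any row whose key is not part of that index, which can help surface mislabelled entries in the source dictionary.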
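Note: I think there is an error in the sample data, at 'PROFESSIONAL': {'STUDENT': ... . If that is not an error and I have simply misunderstood something, please let me know.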