I have a pandas DataFrame with a MultiIndex. I used the groupby method followed by the describe method, like this:
grouped= self.HK_data.groupby(level=[0,1])
summary= grouped.describe()
which gives:
Antibody         Time
Customer_Col1A2  0     count    3.000000
                       mean     0.757589
                       std      0.188750
                       min      0.639933
                       25%      0.648732
                       50%      0.657532
                       75%      0.816417
                       max      0.975302
                 10    count    3.000000
                       mean     0.716279
                       std      0.061939
                       min      0.665601
                       25%      0.681757
                       50%      0.697913
                       75%      0.741618
                       max      0.785324
...                             .........
I computed the SEM with
SEM=grouped.mean()/(numpy.sqrt(grouped.count()))
which gives:
Antibody         Time
Customer_Col1A2  0       0.437394
                 10      0.413544
                 120     0.553361
                 180     0.502792
                 20      0.512797
                 240     0.514609
                 30      0.505618
                 300     0.481021
                 45      0.534658
                 5       0.425800
                 60      0.430633
                 90      0.525115
...                      .........
How can I concat these two frames so that SEM becomes one more entry in the summary statistics?
Something like:
Antibody         Time
Customer_Col1A2  0     count    3.000000
                       mean     0.757589
                       std      0.188750
                       min      0.639933
                       25%      0.648732
                       50%      0.657532
                       75%      0.816417
                       max      0.975302
                       SEM      0.437394
                 10    count    3.000000
                       mean     0.716279
                       std      0.061939
                       min      0.665601
                       25%      0.681757
                       50%      0.697913
                       75%      0.741618
                       max      0.785324
                       SEM      0.413544
I have already tried pandas.concat, but it did not give me what I want.
Thanks!
Answer 0 (score: 2)
I think you first need to add a third level to the MultiIndex of SEM, assigning the new index with MultiIndex.from_tuples, and then use concat followed by sort_index:
import numpy as np
import pandas as pd

HK_data = pd.DataFrame({'Antibody':['Customer_Col1A2','Customer_Col1A2','Customer_Col1A2'],
                        'Time':[0,10,10],
                        'Col':[7,8,9]})
HK_data = HK_data.set_index(['Antibody','Time'])
print (HK_data)
                      Col
Antibody        Time
Customer_Col1A2 0       7
                10      8
                10      9
grouped= HK_data.groupby(level=[0,1])
summary= grouped.describe()
print (summary)
                                 Col
Antibody        Time
Customer_Col1A2 0    count  1.000000
                     mean   7.000000
                     std         NaN
                     min    7.000000
                     25%    7.000000
                     50%    7.000000
                     75%    7.000000
                     max    7.000000
                10   count  2.000000
                     mean   8.500000
                     std    0.707107
                     min    8.000000
                     25%    8.250000
                     50%    8.500000
                     75%    8.750000
                     max    9.000000
SEM=grouped.mean()/(np.sqrt(grouped.count()))
# add a third index level labelled 'SEM' so the rows line up with the describe() output
new_index = list(zip(SEM.index.get_level_values('Antibody'),
                     SEM.index.get_level_values('Time'),
                     ['SEM'] * len(SEM.index)))
SEM.index = pd.MultiIndex.from_tuples(new_index, names=('Antibody','Time', None))
print (SEM)
                               Col
Antibody        Time
Customer_Col1A2 0    SEM  7.000000
                10   SEM  6.010408
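A side note on the formula: the expression above divides the group mean by sqrt(count), following the question. The standard error of the mean is conventionally the sample standard deviation divided by sqrt(n), and pandas groupby objects also offer a sem() method for this. The sketch below compares the two on the same toy data. The names sem_manual and sem_builtin are only illustrative, and whether the conventional definition is what the question intended is an assumption.

```python
import numpy as np
import pandas as pd

# Same toy data as in this answer.
HK_data = pd.DataFrame({'Antibody': ['Customer_Col1A2'] * 3,
                        'Time': [0, 10, 10],
                        'Col': [7, 8, 9]}).set_index(['Antibody', 'Time'])
grouped = HK_data.groupby(level=[0, 1])

# Conventional SEM: sample standard deviation divided by sqrt(n).
sem_manual = grouped.std() / np.sqrt(grouped.count())

# Built-in equivalent; like std(), it uses ddof=1 by default.
sem_builtin = grouped.sem()

print(sem_manual)
print(sem_builtin)
```

Groups with a single observation come out as NaN under this definition, because the sample standard deviation is undefined for n = 1.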
df = pd.concat([summary, SEM]).sort_index()
print (df)
                                 Col
Antibody        Time
Customer_Col1A2 0    25%    7.000000
                     50%    7.000000
                     75%    7.000000
                     SEM    7.000000
                     count  1.000000
                     max    7.000000
                     mean   7.000000
                     min    7.000000
                     std         NaN
                10   25%    8.250000
                     50%    8.500000
                     75%    8.750000
                     SEM    6.010408
                     count  2.000000
                     max    9.000000
                     mean   8.500000
                     min    8.000000
                     std    0.707107
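Note that sort_index orders the third level alphabetically, so SEM ends up between 75% and count rather than after max as the question asked. A possible alternative, sketched below, assumes a pandas version in which describe() on a groupby returns one row per group and one column per statistic (the stacked layout printed above comes from older releases). In that layout SEM can simply be added as a new column, which keeps the usual statistic order. The formula is the one from the question, and selecting the 'Col' column is specific to this toy data.

```python
import numpy as np
import pandas as pd

# Same toy data as in this answer.
HK_data = pd.DataFrame({'Antibody': ['Customer_Col1A2'] * 3,
                        'Time': [0, 10, 10],
                        'Col': [7, 8, 9]}).set_index(['Antibody', 'Time'])
grouped = HK_data.groupby(level=[0, 1])['Col']

# Assumed wide layout: index = (Antibody, Time), columns = count, mean, std, ...
summary = grouped.describe()

# Same formula as in the question (mean / sqrt(count)), appended as a last column.
summary['SEM'] = grouped.mean() / np.sqrt(grouped.count())

print(summary)
```

If the long layout is needed, calling summary.stack() should turn the statistic names back into a third index level, although how NaN entries are treated there differs between pandas versions.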