Indexing pandas dataframes inside pandas dataframes using Python

Date: 2015-03-30 00:06:08

Tags: python python-2.7 indexing pandas

I have a series of dataframes stored inside a dataframe.

The top-level dataframe is structured like this:

    24hr   48hr   72hr
D1  x      x      x
D2  x      x      x 
D3  x      x      x

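In each case, x is a dataframe created using pandas.read_excel().

One column in each x dataframe has the header 'Average Vessels Length', and that column has three entries (i.e. rows, indices).

What I want to return is the mean of the 'Average Vessels Length' column. I am also interested in how to return a specific cell in that column. I know pandas dataframes have a .mean method, but I can't figure out the indexing syntax to use it.

Here is an example: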

import pandas as pd

a = {'Image name' : ['Image 1', 'Image 2', 'Image 3'], 'threshold' : [20, 25, 30], 'Average Vessels Length' : [14.2, 22.6, 15.7] }
b = pd.DataFrame(a, columns=['Image name', 'threshold', 'Average Vessels Length'])

c = pd.DataFrame(index=['D1','D2','D3'], columns=['24hr','48hr','72hr'])
c['24hr']['D1'] = a
c['48hr']['D1'] = a
c['72hr']['D1'] = a
c['24hr']['D2'] = a
c['48hr']['D2'] = a
c['72hr']['D2'] = a
c['24hr']['D3'] = a
c['48hr']['D3'] = a
c['72hr']['D3'] = a

This returns the mean of the values in the 'Average Vessels Length' column:

print b['Average Vessels Length'].mean()

This returns all the values in 24hr, D1, 'Average Vessels Length':

print c['24hr']['D1']['Average Vessels Length']

This does not work:

print c['24hr']['D1']['Average Vessels Length'].mean()

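I can't figure out how to access any specific value in c['24hr']['D1']['Average Vessels Length'].

Ultimately, I want to take each Dx['Average Vessels Length'].mean() and divide it by the corresponding D1['Average Vessels Length'].mean().

Any help is much appreciated.

1 answer:

Answer 0: (score: 0)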

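I'm assuming you mean that every element of your big dataframe is itself a dataframe, in which case your example data should store b (the DataFrame) rather than a (the dict):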

import pandas as pd

a = {'Image name' : ['Image 1', 'Image 2', 'Image 3'], 'threshold' : [20, 25, 30], 'Average Vessels Length' : [14.2, 22.6, 15.7] }
b = pd.DataFrame(a, columns=['Image name', 'threshold', 'Average Vessels Length'])

c = pd.DataFrame(index=['D1','D2','D3'], columns=['24hr','48hr','72hr'])
c['24hr']['D1'] = b
c['48hr']['D1'] = b
c['72hr']['D1'] = b
c['24hr']['D2'] = b
c['48hr']['D2'] = b
c['72hr']['D2'] = b
c['24hr']['D3'] = b
c['48hr']['D3'] = b
c['72hr']['D3'] = b
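
For context, the question's version stored the dict a in each cell instead of the DataFrame b, which is presumably why .mean() failed there: indexing a dict by the column name gives back a plain Python list, and lists have no mean method. A small check of that difference, written with print() so it should run on both Python 2.7 and Python 3 (the two variable names are only for illustration):

```python
import pandas as pd

a = {'Image name': ['Image 1', 'Image 2', 'Image 3'],
     'threshold': [20, 25, 30],
     'Average Vessels Length': [14.2, 22.6, 15.7]}
b = pd.DataFrame(a, columns=['Image name', 'threshold', 'Average Vessels Length'])

# Indexing the dict returns the raw list that was put into it.
lengths_from_dict = a['Average Vessels Length']
# Indexing the DataFrame returns a pandas Series.
lengths_from_frame = b['Average Vessels Length']

print(type(lengths_from_dict).__name__)     # list
print(hasattr(lengths_from_dict, 'mean'))   # False
print(hasattr(lengths_from_frame, 'mean'))  # True
print(lengths_from_frame.mean())
```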

To get the mean for each cell, you can use applymap, which maps a function over every cell of a DataFrame:

cell_means = c.applymap(lambda e: e['Average Vessels Length'].mean())
cell_means
Out[14]: 
    24hr  48hr  72hr
D1  17.5  17.5  17.5
D2  17.5  17.5  17.5
D3  17.5  17.5  17.5
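
The question also asked how to reach one specific value. Here is a minimal sketch, assuming each cell really holds a DataFrame: pull the inner frame out first, then index it like any other DataFrame. The outer frame is built from a NumPy object array rather than with chained assignment such as c['24hr']['D1'] = b, which newer pandas versions discourage; the names rows, cols, cells and inner are just for illustration.

```python
import numpy as np
import pandas as pd

a = {'Image name': ['Image 1', 'Image 2', 'Image 3'],
     'threshold': [20, 25, 30],
     'Average Vessels Length': [14.2, 22.6, 15.7]}
b = pd.DataFrame(a, columns=['Image name', 'threshold', 'Average Vessels Length'])

rows = ['D1', 'D2', 'D3']
cols = ['24hr', '48hr', '72hr']

# Fill a 2-D object array cell by cell so each cell holds the DataFrame itself.
cells = np.empty((len(rows), len(cols)), dtype=object)
for i in range(len(rows)):
    for j in range(len(cols)):
        cells[i, j] = b
c = pd.DataFrame(cells, index=rows, columns=cols)

# Step 1: pull the inner DataFrame out of the outer one (row label, column label).
inner = c.loc['D1', '24hr']

# Step 2: index the inner DataFrame as usual.
lengths = inner['Average Vessels Length']
second_by_position = lengths.iloc[1]                       # second entry, by position
second_by_label = inner.loc[1, 'Average Vessels Length']   # same cell, by row label

print(second_by_position)
print(lengths.mean())
```

Doing the outer lookup in a single .loc call keeps it clear which brackets belong to the outer frame and which to the inner one.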

Once you have those, you can get the column means and so on, and go on to normalize by the mean:

col_means = cell_means.mean(axis=0)
col_means
Out[11]: 
24hr    17.5
48hr    17.5
72hr    17.5
dtype: float64
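
The normalization itself is not shown above. One possible reading of the question is that every Dx mean should be divided by the D1 mean from the same time column; under that assumption it could look like the sketch below. The numbers in cell_means are made up purely for illustration (the example data above would give 1.0 everywhere), and baseline and normalized are illustrative names.

```python
import pandas as pd

# Hypothetical per-cell means, made up for illustration only.
cell_means = pd.DataFrame(
    {'24hr': [17.5, 35.0, 8.75],
     '48hr': [17.5, 17.5, 35.0],
     '72hr': [17.5, 8.75, 17.5]},
    index=['D1', 'D2', 'D3'])

# The D1 row is the baseline: one value per time column.
baseline = cell_means.loc['D1']

# Divide every row by the baseline, matching values up by column label.
normalized = cell_means.div(baseline, axis='columns')

print(normalized)
```

After this the D1 row is all ones and every other row is expressed relative to D1.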