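Pandas DataFrame: computing a metric between specific index elements/positions

Time: 2017-11-03 15:09:22

Tags: python pandas

I have a DataFrame of length N and some indices/positions n_i at arbitrary distances from each other. Now I want to compute a metric between each pair of consecutive index elements n_i and n_(i+1).

Example: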

import numpy as np
import pandas as pd


df = pd.DataFrame(np.random.randn(10, 4), columns=list('ABCD'))
df['id'] = ['W', 'W', 'W', 'Z', 'Z', 'Y', 'Y', 'Y', 'Z', 'Z']

print(df)

          A         B         C         D id
0  0.347501 -1.152416  1.441144 -0.144545  w
1  0.775828 -1.176764  0.203049 -0.305332  w
2  1.036246 -0.467927  0.088138 -0.438207  w
3 -0.737092 -0.231706  0.268403  0.464026  x
4 -1.857346 -1.420284 -0.515517 -0.231774  x
5 -0.970731  0.217890  0.193814 -0.078838  y
6 -0.318314 -0.244348  0.162103  1.204386  y
7  0.340199  1.074977  1.201068 -0.431473  y
8  0.202050  0.790434  0.643458 -0.068620  z
9 -0.882865  0.687325 -0.008771 -0.066912  z

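Now let's say n1=0, n2=4, n3=5, n4=9, and I want to compute the arithmetic mean of columns A and B between those positions, e.g. mean(n1, n2), mean(n2, n3), mean(n3, n4), mean({{1 }}, n4).

The expected output would be a DataFrame with 4 rows (the means) and two columns (A and B).

Any hints are welcome!

Thanks in advance!

3 Answers: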

Answer 0: (score: 2)

Are you looking for pd.concat with a list comprehension, i.e.

l = [n1,n2,n3,n4]
newl = list(zip(l,l[1:]))
# [(0, 4), (4, 5), (5, 9)]
pd.concat([df.loc[i[0]:i[1],['A','B']].mean() for i in newl])

Output:

A   -0.044437
B    0.295627
A   -0.884344
B   -0.005827
A    0.451703
B    0.077761
dtype: float64

To get your expected output, we can concatenate the per-segment means along axis=1 and then transpose the resulting DataFrame:

ndf = pd.concat([df.loc[i[0]:i[1],['A','B']].mean() for i in newl], axis=1).T
print(ndf)
          A         B
0 -0.044437  0.295627
1 -0.884344 -0.005827
2  0.451703  0.077761
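
For reference, here is a self-contained sketch of the same idea. It assumes seeded random data (so the numbers will not match the output above), and the names `bounds`, `pairs`, and `segment_means` are made up for illustration. With four boundaries there are three consecutive pairs, so the result has three rows.

```python
import numpy as np
import pandas as pd

# Seeded random data so the sketch is reproducible; values differ from the post.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((10, 4)), columns=list("ABCD"))

# Boundary labels from the question (n1..n4).
bounds = [0, 4, 5, 9]

# Pair each boundary with the next one: (0, 4), (4, 5), (5, 9).
pairs = list(zip(bounds, bounds[1:]))

# .loc slices by label and includes both endpoints of each pair.
segment_means = pd.concat(
    [df.loc[start:stop, ["A", "B"]].mean() for start, stop in pairs],
    axis=1,
).T

# Label each row with the segment it came from (hypothetical naming scheme).
segment_means.index = [f"{start}-{stop}" for start, stop in pairs]

print(segment_means)
```

Because `.loc` is inclusive, the boundary rows 4 and 5 each contribute to two neighboring segments.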

Answer 1: (score: 1)

Use .loc slicing:

In [11]: n1=0; n2=4; n3=5; n4=9

In [12]: df.loc[n1:n2, "A"]
Out[12]:
0    0.347501
1    0.775828
2    1.036246
3   -0.737092
4   -1.857346
Name: A, dtype: float64

In [13]: df.loc[n3:n4, "B"]
Out[13]:
5    0.217890
6   -0.244348
7    1.074977
8    0.790434
9    0.687325
Name: B, dtype: float64

In [14]: df.loc[n1:n2, "A"].mean()
Out[14]: -0.086972599999999956

In [15]: df.loc[n3:n4, "B"].mean()
Out[15]: 0.50525560000000003
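
The slicing above could be wrapped in a small helper. This is only a sketch: `segment_mean` is a hypothetical name, and the A and B values are copied from the DataFrame printed in the question so that the results can be compared with Out[14] and Out[15].

```python
import pandas as pd

# Columns A and B copied from the DataFrame printed in the question.
df = pd.DataFrame({
    "A": [0.347501, 0.775828, 1.036246, -0.737092, -1.857346,
          -0.970731, -0.318314, 0.340199, 0.202050, -0.882865],
    "B": [-1.152416, -1.176764, -0.467927, -0.231706, -1.420284,
          0.217890, -0.244348, 1.074977, 0.790434, 0.687325],
})


def segment_mean(frame, start, stop, columns):
    """Mean of `columns` over the rows labeled start..stop, both included."""
    return frame.loc[start:stop, columns].mean()


first = segment_mean(df, 0, 4, ["A", "B"])  # rows 0, 1, 2, 3, 4
last = segment_mean(df, 5, 9, ["A", "B"])   # rows 5, 6, 7, 8, 9

print(first["A"])  # about -0.0869726, matching Out[14]
print(last["B"])   # about 0.5052556, matching Out[15]
```

Note that label-based slicing with `.loc` includes the stop label, so each of these segments covers five rows.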

Answer 2: (score: 0)

Use .iloc:

n1=0
n2=4
n3=5
n4=9

df
Out[22]: 
          A         B         C         D id
0 -0.238283  0.109911  0.351710  0.048457  W
1 -0.325829  0.017999 -0.965771 -0.860846  W
2 -1.095183 -0.448895  1.690735  0.140668  W
3 -0.016087  1.025236  1.634730  0.755837  Z
4 -1.394894  0.343395 -0.522272  0.308791  Z
5  0.308004 -2.243848  0.359605 -0.806157  Y
6 -0.149900  0.305214 -2.250844  0.385339  Y
7 -0.562943 -0.651464  1.241993 -0.963086  Y
8 -0.465702  1.429940 -0.146888  0.436931  Z
9 -0.766442  0.899470  0.210917 -0.751582  Z

df.iloc[n1:n2]
Out[23]: 
          A         B         C         D id
0 -0.238283  0.109911  0.351710  0.048457  W
1 -0.325829  0.017999 -0.965771 -0.860846  W
2 -1.095183 -0.448895  1.690735  0.140668  W
3 -0.016087  1.025236  1.634730  0.755837  Z


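# The mean of each column within your index range
# (numeric_only=True skips the non-numeric 'id' column; pandas >= 2.0 raises a TypeError without it)
df.iloc[n1:n2].mean(numeric_only=True)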
Out[24]: 
A   -0.418846
B    0.176063
C    0.677851
D    0.021029
dtype: float64

# The mean of each row within your index range (numeric columns only)
df.iloc[n1:n2].mean(axis=1, numeric_only=True)
Out[25]: 
0    0.067949
1   -0.533612
2    0.071831
3    0.849929
dtype: float64

#To get the mean for a specific Column
df["A"].iloc[n1:n2].mean()
Out[31]: -0.4188455553382261
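
One thing worth checking when choosing between the two accessors: `.iloc[n1:n2]` stops before position n2, while `.loc[n1:n2]` includes the label n2, which is why Out[23] shows only rows 0 to 3. A minimal sketch, using the first five A values from the DataFrame in this answer (the variable names are illustrative only):

```python
import pandas as pd

# First five values of column A from the DataFrame shown in this answer.
df = pd.DataFrame({"A": [-0.238283, -0.325829, -1.095183, -0.016087, -1.394894]})

n1, n2 = 0, 4

by_position = df.iloc[n1:n2]                # positions 0..3, end excluded
by_label = df.loc[n1:n2]                    # labels 0..4, end included
by_position_inclusive = df.iloc[n1:n2 + 1]  # add 1 to include position n2

print(len(by_position))                        # 4
print(len(by_label))                           # 5
print(by_position_inclusive.equals(by_label))  # True
```

With the default integer index the two only differ at the end point, but the means they produce are different, so it matters whether row n2 is meant to belong to the segment.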

I hope the above answers your question.