Python (pandas): combining groupby statements

Time: 2016-07-07 05:40:38

Tags: python pandas

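I have some data representing results from many different sites. I want to find the quartile breakdown of my results, as well as the maximum and minimum dates for each site.

Finding each of these on its own is easy: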

# quartiles
q = df.groupby(['site_id', 'datum']).quantile([0.25,0.5,0.75])
# max and min values
d_max = df.groupby(['site_id', 'datum']).max()
d_min = df.groupby(['site_id', 'datum']).min()

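Each result is a multi-indexed DataFrame. How can I combine them so that I get all three values for each combination of site_id and datum?

Some sample data: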

from io import StringIO
import pandas as pd

TESTDATA=StringIO(u'''date  site_id datum   result
1968-01-10  RN004481    SWL     61.23
1977-06-07  RN004481    SWL     60.16
1979-12-12  RN004481    SWL     58.76
1971-04-24  RN004482    SWL     79.93
1971-09-29  RN004482    SWL     79.97
1995-09-19  RN004482    SWL     92.91
1996-02-08  RN004482    SWL     93.15
1964-10-29  RN00448411  SWL     67.87
1965-03-04  RN004687    SWL     74.90
1993-03-16  RN02528611  SWL     7.50
2011-10-24  RN029429    SWL     2.59
2011-11-05  RN029429    SWL     2.68
1992-06-24  RN004464    SWL     52.24
1986-08-11  RN004482    SWL     86.84
1998-01-29  RN004482    SWL     94.33
1966-11-24  RN004687    DTW     75.16
1978-08-30  RN004687    SWL     78.24
1983-02-22  RN004687    DTW     81.00
1984-07-24  RN004687    SWL     81.26
1993-07-07  RN004687    SWL     87.18
1994-04-08  RN004687    DTW     87.53
1994-08-11  RN004687    SWL     87.41
2001-01-10  RN004687    SWL     92.04
2010-11-15  RN004687    SWL     97.06
1964-10-01  RN004693    SWL     59.56
1965-06-03  RN004693    SWL     59.74
1967-05-19  RN004693    SWL     59.58
1967-06-23  RN004693    RSWL    59.61
1967-09-22  RN004693    RSWL    59.69
1970-12-16  RN004693    DTW     59.54
''')

# sep=r'\s+' splits on any run of whitespace (delim_whitespace is deprecated in recent pandas)
df = pd.read_csv(TESTDATA, sep=r'\s+')

1 Answer:

Answer 0 (score: 2)

Here is one way:

pd.concat([d_max, d_min, q.unstack().result], axis=1, keys=['max', 'min', 'quantiles'])
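For reference, here is a self-contained sketch of the same idea that can be run end to end. It is not the answerer's exact code: it uses a small subset of the question's sample data, parses the `date` column with `parse_dates` (an assumption, so that max and min compare real dates rather than strings), and selects the `result` column explicitly before calling `quantile`, since newer pandas versions may refuse to compute quantiles on non-numeric columns.

```python
from io import StringIO

import pandas as pd

# Subset of the question's sample data (trimmed for illustration only).
data = StringIO("""date site_id datum result
1968-01-10 RN004481 SWL 61.23
1977-06-07 RN004481 SWL 60.16
1979-12-12 RN004481 SWL 58.76
1966-11-24 RN004687 DTW 75.16
1983-02-22 RN004687 DTW 81.00
1994-04-08 RN004687 DTW 87.53
""")

# Assumption: dates are parsed so that min/max operate on datetimes.
df = pd.read_csv(data, sep=r"\s+", parse_dates=["date"])

grouped = df.groupby(["site_id", "datum"])

# Quartiles of the numeric column only; unstack() moves the quantile
# level (0.25, 0.5, 0.75) from the index into the columns.
q = grouped["result"].quantile([0.25, 0.5, 0.75]).unstack()

# Per-group maximum and minimum of both the date and the result.
d_max = grouped[["date", "result"]].max()
d_min = grouped[["date", "result"]].min()

# Align the three frames on (site_id, datum) and label each block of columns.
summary = pd.concat([d_max, d_min, q], axis=1, keys=["max", "min", "quantiles"])

print(summary)
```

The resulting frame has one row per (site_id, datum) pair and two-level columns, so a single value can be read with a full column key such as `summary[("quantiles", 0.5)]` or `summary[("max", "date")]`.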
