如何从多索引pandas数据帧中选择连续的行?

时间:2016-08-01 06:01:56

标签: python pandas dataframe

我有一个多索引pandas数据帧(groupbydataframe),如下所示:

shipLivesLabel.text = "\(shipLives)"

最后一栏是“销售”。

  1. 如果SELECT feature, count(distinct id) AS Total, count(distinct if (result='pass', id, null)) AS passed FROM X.results WHERE ver = '4.2' GROUP BY 1 ORDER BY 1; Store Year Month 1 2013 1 4922.0 2 5315.5 3 5603.0 4 4625.0 5 4995.0 6 4407.0 7 4421.0 8 4494.0 9 4212.0 10 4148.5 11 5033.5 12 6111.0 2014 1 4547.5 2 4726.5 3 4455.0 4 4577.5 5 4909.0 6 4788.0 7 4507.0 8 4124.0 9 3970.5 10 4431.0 11 5220.0 12 6466.0 2015 1 4690.0 2 4467.5 3 4294.5 4 4256.0 5 4529.0 6 4102.0 ... 1112 2013 2 10991.0 3 12322.0 4 10705.0 5 12096.0 6 11029.0 7 10419.0 8 8994.0 9 9448.0 10 9019.0 11 9294.0 12 11163.0 2014 1 8156.5 2 8862.5 3 8606.5 4 9662.0 5 10375.0 6 8755.0 7 8262.0 8 8454.0 9 7621.5 10 8779.0 11 9638.0 12 9592.0 2015 1 9345.0 2 8466.5 3 8457.0 4 8715.0 5 9125.0 6 8528.0 7 7728.0 如何为FindYear=2014提取FindMonth=012013-12的销售额?
  2. 如果2014-02Store=1是包含FindYear的列表。如何(1)不单独遍历所有商店?
  3. 或者我是以错误的方式看待这个?我需要FindMonth并从那里开始吗?

1 个答案:

答案 0 :(得分:1)

您可以先转换Multiindex to_period的第二和第三级:

s = pd.Series({(1, 2013, 2): 5315.5, (1, 2014, 8): 4124.0, (1, 2014, 9): 3970.5, (1112, 2013, 8): 8994.0, (1, 2013, 12): 6111.0, (1, 2015, 3): 4294.5, (1112, 2014, 10): 8779.0, (1, 2014, 10): 4431.0, (1112, 2013, 9): 9448.0, (1112, 2014, 12): 9592.0, (1, 2015, 2): 4467.5, (1, 2014, 11): 5220.0, (1112, 2013, 10): 9019.0, (1, 2015, 1): 4690.0, (1112, 2013, 11): 9294.0, (1112, 2015, 2): 8466.5, (1, 2013, 9): 4212.0, (1112, 2015, 1): 9345.0, (1112, 2013, 4): 10705.0, (1, 2013, 8): 4494.0, (1112, 2014, 9): 7621.5, (1112, 2013, 5): 12096.0, (1, 2014, 4): 4577.5, (1, 2013, 11): 5033.5, (1, 2015, 6): 4102.0, (1112, 2013, 6): 11029.0, (1, 2014, 5): 4909.0, (1, 2013, 10): 4148.5, (1, 2015, 5): 4529.0, (1112, 2014, 8): 8454.0, (1112, 2013, 7): 10419.0, (1, 2014, 6): 4788.0, (1, 2015, 4): 4256.0, (1, 2014, 7): 4507.0, (1112, 2014, 5): 10375.0, (1112, 2015, 3): 8457.0, (1112, 2014, 4): 9662.0, (1, 2013, 5): 4995.0, (1112, 2013, 2): 10991.0, (1, 2014, 1): 4547.5, (1112, 2014, 7): 8262.0, (1, 2013, 4): 4625.0, (1112, 2013, 3): 12322.0, (1, 2014, 2): 4726.5, (1112, 2014, 6): 8755.0, (1, 2013, 7): 4421.0, (1, 2014, 3): 4455.0, (1112, 2014, 1): 8156.5, (1, 2013, 6): 4407.0, (1112, 2014, 11): 9638.0, (1, 2014, 12): 6466.0, (1, 2013, 1): 4922.0, (1112, 2015, 7): 7728.0, (1112, 2013, 12): 11163.0, (1112, 2014, 3): 8606.5, (1112, 2015, 4): 8715.0, (1112, 2014, 2): 8862.5, (1, 2013, 3): 5603.0, (1112, 2015, 6): 8528.0, (1112, 2015, 5): 9125.0})
per = pd.DatetimeIndex(s.index.map(lambda x: pd.to_datetime(str(x[1]) 
                                                          + str(x[2]), format='%Y%m')))
        .to_period('m')

print (per)
PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
             '2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12',
             '2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06',
             '2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12',
             '2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06',
             '2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07',
             '2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01',
             '2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07',
             '2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01',
             '2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07'],
            dtype='int64', freq='M')

然后分配新的Multiindex,其中第二级为PeriodIndex

new_index = list(zip(df.index.get_level_values('Store'), per))
s.index = pd.MultiIndex.from_tuples(new_index, names = ('Store','Period'))
print (s)
Store  Period 
1      2013-01     4922.0
       2013-02     5315.5
       2013-03     5603.0
       2013-04     4625.0
       2013-05     4995.0
       2013-06     4407.0
       2013-07     4421.0
       2013-08     4494.0
       2013-09     4212.0
       2013-10     4148.5
       2013-11     5033.5
       2013-12     6111.0
       2014-01     4547.5
       2014-02     4726.5
       2014-03     4455.0
...
...
...

选择loc

print (s.loc[1,pd.Period('2014-01')])
4547.5

print (s.loc[1,pd.Period('2013-12'):pd.Period('2014-02')])
Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
Name: Sales, dtype: float64

per1 = pd.Period('2014-01')
per2 = pd.Period('2015-01')
per = [per1, per2]

print (s.loc[1, per1 - 1: per1 + 1])
Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
Name: Sales, dtype: float64

按列表理解进行多次选择:

print ([s.loc[1, x - 1: x + 1] for x in per])
[Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
Name: Sales, dtype: float64, Store  Period 
1      2014-12    6466.0
       2015-01    4690.0
       2015-02    4467.5
Name: Sales, dtype: float64]

可以结论:

print (pd.concat([s.loc[1, x - 1: x + 1] for x in per]))
Store  Period 
1      2013-12    6111.0
       2014-01    4547.5
       2014-02    4726.5
       2014-12    6466.0
       2015-01    4690.0
       2015-02    4467.5
Name: Sales, dtype: float64