我有一个多索引pandas数据帧(groupbydataframe),如下所示:
shipLivesLabel.text = "\(shipLives)"
最后一栏是“销售”。
SELECT feature, count(distinct id) AS Total,
count(distinct if (result='pass', id, null)) AS passed
FROM X.results
WHERE ver = '4.2'
GROUP BY 1 ORDER BY 1;
和Store Year Month
1 2013 1 4922.0
2 5315.5
3 5603.0
4 4625.0
5 4995.0
6 4407.0
7 4421.0
8 4494.0
9 4212.0
10 4148.5
11 5033.5
12 6111.0
2014 1 4547.5
2 4726.5
3 4455.0
4 4577.5
5 4909.0
6 4788.0
7 4507.0
8 4124.0
9 3970.5
10 4431.0
11 5220.0
12 6466.0
2015 1 4690.0
2 4467.5
3 4294.5
4 4256.0
5 4529.0
6 4102.0
...
1112 2013 2 10991.0
3 12322.0
4 10705.0
5 12096.0
6 11029.0
7 10419.0
8 8994.0
9 9448.0
10 9019.0
11 9294.0
12 11163.0
2014 1 8156.5
2 8862.5
3 8606.5
4 9662.0
5 10375.0
6 8755.0
7 8262.0
8 8454.0
9 7621.5
10 8779.0
11 9638.0
12 9592.0
2015 1 9345.0
2 8466.5
3 8457.0
4 8715.0
5 9125.0
6 8528.0
7 7728.0
如何为FindYear=2014
提取FindMonth=01
到2013-12
的销售额?2014-02
和Store=1
是包含FindYear
的列表。如何(1)不单独遍历所有商店?FindMonth
并从那里开始吗?答案 0 :(得分:1)
您可以先转换Multiindex
to_period
的第二和第三级:
s = pd.Series({(1, 2013, 2): 5315.5, (1, 2014, 8): 4124.0, (1, 2014, 9): 3970.5, (1112, 2013, 8): 8994.0, (1, 2013, 12): 6111.0, (1, 2015, 3): 4294.5, (1112, 2014, 10): 8779.0, (1, 2014, 10): 4431.0, (1112, 2013, 9): 9448.0, (1112, 2014, 12): 9592.0, (1, 2015, 2): 4467.5, (1, 2014, 11): 5220.0, (1112, 2013, 10): 9019.0, (1, 2015, 1): 4690.0, (1112, 2013, 11): 9294.0, (1112, 2015, 2): 8466.5, (1, 2013, 9): 4212.0, (1112, 2015, 1): 9345.0, (1112, 2013, 4): 10705.0, (1, 2013, 8): 4494.0, (1112, 2014, 9): 7621.5, (1112, 2013, 5): 12096.0, (1, 2014, 4): 4577.5, (1, 2013, 11): 5033.5, (1, 2015, 6): 4102.0, (1112, 2013, 6): 11029.0, (1, 2014, 5): 4909.0, (1, 2013, 10): 4148.5, (1, 2015, 5): 4529.0, (1112, 2014, 8): 8454.0, (1112, 2013, 7): 10419.0, (1, 2014, 6): 4788.0, (1, 2015, 4): 4256.0, (1, 2014, 7): 4507.0, (1112, 2014, 5): 10375.0, (1112, 2015, 3): 8457.0, (1112, 2014, 4): 9662.0, (1, 2013, 5): 4995.0, (1112, 2013, 2): 10991.0, (1, 2014, 1): 4547.5, (1112, 2014, 7): 8262.0, (1, 2013, 4): 4625.0, (1112, 2013, 3): 12322.0, (1, 2014, 2): 4726.5, (1112, 2014, 6): 8755.0, (1, 2013, 7): 4421.0, (1, 2014, 3): 4455.0, (1112, 2014, 1): 8156.5, (1, 2013, 6): 4407.0, (1112, 2014, 11): 9638.0, (1, 2014, 12): 6466.0, (1, 2013, 1): 4922.0, (1112, 2015, 7): 7728.0, (1112, 2013, 12): 11163.0, (1112, 2014, 3): 8606.5, (1112, 2015, 4): 8715.0, (1112, 2014, 2): 8862.5, (1, 2013, 3): 5603.0, (1112, 2015, 6): 8528.0, (1112, 2015, 5): 9125.0})
per = pd.DatetimeIndex(s.index.map(lambda x: pd.to_datetime(str(x[1])
+ str(x[2]), format='%Y%m')))
.to_period('m')
print (per)
PeriodIndex(['2013-01', '2013-02', '2013-03', '2013-04', '2013-05', '2013-06',
'2013-07', '2013-08', '2013-09', '2013-10', '2013-11', '2013-12',
'2014-01', '2014-02', '2014-03', '2014-04', '2014-05', '2014-06',
'2014-07', '2014-08', '2014-09', '2014-10', '2014-11', '2014-12',
'2015-01', '2015-02', '2015-03', '2015-04', '2015-05', '2015-06',
'2013-02', '2013-03', '2013-04', '2013-05', '2013-06', '2013-07',
'2013-08', '2013-09', '2013-10', '2013-11', '2013-12', '2014-01',
'2014-02', '2014-03', '2014-04', '2014-05', '2014-06', '2014-07',
'2014-08', '2014-09', '2014-10', '2014-11', '2014-12', '2015-01',
'2015-02', '2015-03', '2015-04', '2015-05', '2015-06', '2015-07'],
dtype='int64', freq='M')
然后分配新的Multiindex
,其中第二级为PeriodIndex
:
new_index = list(zip(df.index.get_level_values('Store'), per))
s.index = pd.MultiIndex.from_tuples(new_index, names = ('Store','Period'))
print (s)
Store Period
1 2013-01 4922.0
2013-02 5315.5
2013-03 5603.0
2013-04 4625.0
2013-05 4995.0
2013-06 4407.0
2013-07 4421.0
2013-08 4494.0
2013-09 4212.0
2013-10 4148.5
2013-11 5033.5
2013-12 6111.0
2014-01 4547.5
2014-02 4726.5
2014-03 4455.0
...
...
...
选择loc
:
print (s.loc[1,pd.Period('2014-01')])
4547.5
print (s.loc[1,pd.Period('2013-12'):pd.Period('2014-02')])
Store Period
1 2013-12 6111.0
2014-01 4547.5
2014-02 4726.5
Name: Sales, dtype: float64
per1 = pd.Period('2014-01')
per2 = pd.Period('2015-01')
per = [per1, per2]
print (s.loc[1, per1 - 1: per1 + 1])
Store Period
1 2013-12 6111.0
2014-01 4547.5
2014-02 4726.5
Name: Sales, dtype: float64
按列表理解进行多次选择:
print ([s.loc[1, x - 1: x + 1] for x in per])
[Store Period
1 2013-12 6111.0
2014-01 4547.5
2014-02 4726.5
Name: Sales, dtype: float64, Store Period
1 2014-12 6466.0
2015-01 4690.0
2015-02 4467.5
Name: Sales, dtype: float64]
可以结论:
print (pd.concat([s.loc[1, x - 1: x + 1] for x in per]))
Store Period
1 2013-12 6111.0
2014-01 4547.5
2014-02 4726.5
2014-12 6466.0
2015-01 4690.0
2015-02 4467.5
Name: Sales, dtype: float64