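I'm trying to use the (not really) new slicing operators in pandas, but there's something I'm not getting. Suppose I generate the following hierarchical DataFrame: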
import numpy as np
import pandas as pd
from pandas import DataFrame

# Generate container to hold component DFs
df_list = []
# Generate names for third dimension positions
third_names = ['front', 'middle', 'back']
# For three positions in the third dimension...
for lab in third_names:
    # ...generate the corresponding section of raw data...
    d = DataFrame(np.random.uniform(size=20).reshape(4, 5), columns='a b c d e'.split(' '))
    # ...name the columns dimension...
    d.columns.name = 'dim1'
    # ...generate second and third dims (to go in index)...
    d['dim2'] = ['one', 'two', 'three', 'four']
    d['dim3'] = lab
    # ...set index...
    d.set_index(['dim3', 'dim2'], inplace=True)
    # ...and throw the DF in the container
    df_list.append(d)
# Concatenate component DFs together
d3 = pd.concat(df_list)
d3_long = d3.stack().sortlevel(0)
print d3_long
Yields:
dim3    dim2   dim1
back    four   a    0.501184
               b    0.627202
               c    0.329643
               d    0.484261
               e    0.884803
        one    a    0.834231
               b    0.918897
               c    0.196537
               d    0.242109
               e    0.860124
        three  a    0.782651
               b    0.998361
               c    0.849685
               d    0.210377
               e    0.866776
        two    a    0.908422
               b    0.737073
               c    0.064402
               d    0.240718
               e    0.044409
front   four   a    0.100877
               b    0.963870
               c    0.254075
               d    0.126556
               e    0.033631
        one    a    0.243552
               b    0.999168
               c    0.752251
               d    0.684718
               e    0.353013
        three  a    0.938928
               b    0.112993
               c    0.615178
               d    0.430318
               e    0.330437
        two    a    0.301921
               b    0.645425
               c    0.464172
               d    0.824765
               e    0.606823
middle  four   a    0.814888
               b    0.228860
               c    0.333184
               d    0.622176
               e    0.151248
        one    a    0.547780
               b    0.592404
               c    0.684111
               d    0.885605
               e    0.601560
        three  a    0.340951
               b    0.839149
               c    0.800098
               d    0.663753
               e    0.215224
        two    a    0.138430
               b    0.917627
               c    0.342968
               d    0.406744
               e    0.822957
dtype: float64
I can get the behavior I expect for the first two dimensions...
print d3_long.loc[(slice('front','middle'),slice('two','four')),:]
Yields:
dim3    dim2   dim1
front   four   a    0.100877
               b    0.963870
               c    0.254075
               d    0.126556
               e    0.033631
        one    a    0.243552
               b    0.999168
               c    0.752251
               d    0.684718
               e    0.353013
        three  a    0.938928
               b    0.112993
               c    0.615178
               d    0.430318
               e    0.330437
        two    a    0.301921
               b    0.645425
               c    0.464172
               d    0.824765
               e    0.606823
middle  four   a    0.814888
               b    0.228860
               c    0.333184
               d    0.622176
               e    0.151248
        one    a    0.547780
               b    0.592404
               c    0.684111
               d    0.885605
               e    0.601560
        three  a    0.340951
               b    0.839149
               c    0.800098
               d    0.663753
               e    0.215224
        two    a    0.138430
               b    0.917627
               c    0.342968
               d    0.406744
               e    0.822957
dtype: float64
However, the following call produces exactly the same result.
d3_long.loc[(slice('front','middle'),slice('two','four'),slice('b','d')),:]
It ignores the third level of the MultiIndex. When I try to use a list construct to pick out specific positions...
d3_long.loc[(slice('front','middle'),slice('two','four'),['b','d']),:]
...it raises a TypeError. Any ideas?
Answer 0 (score: 0)
d3_long is actually a Series, so you don't need the trailing : in the slicers. Note that your second-level slice('two','four') doesn't select anything (it's the equivalent of [-1:1]).
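To illustrate why the order of the slice endpoints matters, here is a minimal sketch. It assumes a flat, sorted Index rather than the MultiIndex from the question, and the names s, forward, and backward are hypothetical. Label slices follow the index's sort order, and the dim2 labels sort alphabetically as four, one, three, two. How a reversed slice behaves inside a MultiIndex slicer may depend on the pandas version.

```python
import pandas as pd

labels = ['one', 'two', 'three', 'four']
# Hypothetical flat Series, used only to illustrate label ordering.
s = pd.Series(range(4), index=labels).sort_index()

sorted_labels = list(s.index)    # labels in alphabetical order
forward = s.loc['four':'two']    # start label sorts before stop label
backward = s.loc['two':'four']   # start label sorts after stop label

print(sorted_labels)
print(len(forward), len(backward))
```

On a sorted index the forward slice spans every label, while the backward slice comes out empty.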
But if you reverse the order, it should give you what you expect.
In [82]: d3_long.loc[slice('front','middle'),slice('four','two'), ['b','d']]
Out[82]:
dim3    dim2   dim1
front   four   b    0.301573
               d    0.478005
        one    b    0.306292
               d    0.281984
        three  b    0.108174
               d    0.776523
        two    b    0.028694
               d    0.527417
middle  four   b    0.285103
               d    0.647165
        one    b    0.807411
               d    0.309446
        three  b    0.277752
               d    0.939555
        two    b    0.470019
               d    0.447640
dtype: float64
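As a rough sketch of the same selection on a current pandas under Python 3, the slicer can also be spelled with pd.IndexSlice, which some find easier to read than nested slice() calls. The stand-in Series below is an assumption for illustration: it is built with MultiIndex.from_product and filled with placeholder values, not the random data from the question.

```python
import numpy as np
import pandas as pd

# Stand-in for d3_long: same three index levels, placeholder values.
index = pd.MultiIndex.from_product(
    [['front', 'middle', 'back'], ['one', 'two', 'three', 'four'], list('abcde')],
    names=['dim3', 'dim2', 'dim1'],
)
s = pd.Series(np.arange(len(index), dtype=float), index=index).sort_index()

# Explicit slice objects, one entry per index level, no trailing ":".
by_tuple = s.loc[(slice('front', 'middle'), slice('four', 'two'), ['b', 'd'])]

# The same selection written with pd.IndexSlice.
idx = pd.IndexSlice
by_idx = s.loc[idx['front':'middle', 'four':'two', ['b', 'd']]]

print(by_idx)
```

Both spellings should return the same rows. Sorting the index first matters, because label slicing on a MultiIndex generally requires a lexsorted index.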