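pandas MultiIndex slicing and indexing

Date: 2018-07-21 16:59:09

Tags: pandas multi-index

I've just started using multi-indexed dataframes, but I'm having some trouble with the fairly sparse documentation and online examples for slicing and indexing.

Consider the following multi-indexed frame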

import pandas as pd
import numpy as np
levels={
'produce_source':['Vendor A', 'Vendor B'],
'day':['mon','wed','fri'],
'chiller_temp':['low','mid'],
'fruit':['apples','pears','nanas']
}

index = pd.MultiIndex.from_product(levels.values(), names = list(levels.keys()))
df = pd.DataFrame(index=index)
df = df.assign(deliveries=np.random.rand(len(df)))


                                        deliveries
produce_source day chiller_temp fruit             
Vendor A       mon low          apples    0.748376
                                pears     0.639824
                                nanas     0.604342
                   mid          apples    0.160837
                                pears     0.970412
                                nanas     0.301815
               wed low          apples    0.572627
                                pears     0.254242
                                nanas     0.590702
                   mid          apples    0.153772
                                pears     0.180117
                                nanas     0.858085
               fri low          apples    0.535358
                                pears     0.576359
                                nanas     0.893993
                   mid          apples    0.334602
                                pears     0.053892
                                nanas     0.778767
Vendor B       mon low          apples    0.565761
                                pears     0.437994
                                nanas     0.090994
                   mid          apples    0.261041
                                pears     0.028795
                                nanas     0.057612
               wed low          apples    0.808108
                                pears     0.914724
                                nanas     0.020663
                   mid          apples    0.055319
                                pears     0.888612
                                nanas     0.623370
               fri low          apples    0.419422
                                pears     0.938593
                                nanas     0.358441
                   mid          apples    0.534191
                                pears     0.590103
                                nanas     0.753034

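What is the most pythonic way to achieve the following?

1) View all the 'wed' data as a slice

1a) Stretch goal: I'd rather not care that 'day' is index.names[1], and instead index by the level name 'day'

2) Write iterable data to just that 'wed' slice

3) Add a chiller_temp for all vendors, days and fruits

I've seen some slicing going on using idx = pd.IndexSlice.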

idx = pd.IndexSlice
df_wip = df.loc[idx[:,'wed'], ] #1)  
#would love to write to df_wip sliced df here but get slice copy warning with df_wip['deliveries'] = list(range(0,100*len(df_wip),100)) 
df = df.loc[idx[:,'wed'],'deliveries'] = list(range(0,100*len(df_wip),100)) #2)

This raises the error AttributeError: 'list' object has no attribute 'loc'

df = df.loc[idx[:,'wed'],'deliveries'] = pd.Series(range(0,100*len(df_wip),100)) #2)

Raises TypeError: unhashable type: 'slice'

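1 answer:

Answer 0: (score: 1)

1) View all the 'wed' data as a slice

To view data in a multi-index, it's much easier to use .xs (cross-section), which lets you specify a value for one particular index level instead of typing out all the levels, as .loc with a slice would have you do: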

df.xs('wed', level='day')

Out:
                                        deliveries
produce_source  chiller_temp    fruit   
Vendor A        low             apples  0.521861
                                pears   0.741856
                                nanas   0.245843
                mid             apples  0.471135
                                pears   0.191322
                                nanas   0.153920
Vendor B        low             apples  0.711457
                                pears   0.211794
                                nanas   0.599071
                mid             apples  0.303910
                                pears   0.657348
                                nanas   0.111750
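
For stretch goal 1a, the selection can be driven by the level name rather than its position. A minimal sketch, assuming the same frame as in the question (rebuilt here so the block runs on its own); the variable names wed_xs, wed_mask, wed_rows and wed_query are made up for illustration:

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Cross-section by level name; drop_level=False keeps 'day' in the result
wed_xs = df.xs('wed', level='day', drop_level=False)

# Boolean mask built from the named level, usable with .loc
wed_mask = df.index.get_level_values('day') == 'wed'
wed_rows = df.loc[wed_mask]

# query() can also refer to index levels by name
wed_query = df.query("day == 'wed'")
```

All three select the same rows; only .xs drops the 'day' level by default.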
  

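2) Write iterable data to just that 'wed' slice

If I understand this correctly, you're trying to replace the values in the 'deliveries' column with a specific iterable (e.g. a list) wherever the day is 'wed'. Unfortunately, .loc-style replacement doesn't work in this case. As far as I know, pandas only has simple syntax for replacing the value of a single cell, using .at or .loc (see this SO answer). However, we can accomplish this with iteration: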

idx = pd.IndexSlice

# If we don't change the column's type, which was float, this will error
df['deliveries'] = df['deliveries'].astype(object)

# Loop through rows, replacing single values
# Only necessary if the new assigned value is mutable
for index, row in df.loc[idx[:,'wed'], 'deliveries':'deliveries'].iterrows():
    df.at[index, 'deliveries'] = ["We", "changed", "this"]

df.head(10)

Out:
                                            deliveries
produce_source  day  chiller_temp   fruit   
Vendor A        mon  low            apples  0.0287606
                                    pears   0.264512
                                    nanas   0.238089
                     mid            apples  0.814985
                                    pears   0.590967
                                    nanas   0.919351
                wed  low            apples  [We, changed, this]
                                    pears   [We, changed, this]
                                    nanas   [We, changed, this]
                     mid            apples  [We, changed, this]

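As far as I know the loop is necessary, but in my opinion using df.xs followed by df.update is easier to understand than .loc. For example, the following code does the same thing as the .loc code above: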

df['deliveries'] = df['deliveries'].astype(object)

# Create a temporary copy of our cross section
# (.copy() makes the copy explicit, so writing to df2 does not trigger a copy warning)
df2 = df.xs('wed', level='day', drop_level=False).copy()

# The same loop as before
for index, row in df2.iterrows():
    df2.at[index, 'deliveries'] = ["We", "changed", "this"]

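# Update the original df with the values we want from df2
# (join="left" and overwrite=True are the defaults, spelled out for clarity;
#  the raise_conflict keyword is no longer accepted by newer pandas releases)
df.update(df2, join="left", overwrite=True)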
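
If the goal is instead to give each 'wed' row its own scalar taken from an iterable (as in the question's list(range(...)) attempt), a loop may not be needed. A minimal sketch, assuming the iterable has exactly one value per selected row; the frame is rebuilt so the block runs on its own, and the names wed_mask and new_values are illustrative only:

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Select the 'wed' rows by level name instead of by position
wed_mask = df.index.get_level_values('day') == 'wed'

# One new value per selected row, assigned in row order
new_values = list(range(0, 100 * int(wed_mask.sum()), 100))
df.loc[wed_mask, 'deliveries'] = new_values
```

Writing through df.loc on the original frame, rather than on a sliced intermediate such as df_wip, is also what avoids the slice copy warning mentioned in the question.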
  

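3) Add a chiller_temp for all vendors, days and fruits

Replacing values in an existing level of a multi-index requires replacing the entire level. This can be done with df.index.set_levels (easier, IMO) or with pd.MultiIndex.from_arrays. Depending on the exact use case, mapping and/or replacement may be needed. See this SO answer for other examples.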

df.index = df.index.set_levels(['high' for v in df.index.get_level_values('chiller_temp')], level='chiller_temp')
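
Note that set_levels expects the new level values to be unique in the pandas versions I am aware of, so a list of repeated 'high' strings may raise a ValueError there. Below is a sketch of the pd.MultiIndex.from_arrays route mentioned above, assuming the intent is to overwrite every chiller_temp value with 'high'. The frame is rebuilt so the block runs on its own, and the name arrays is illustrative:

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

# Keep every level as-is except 'chiller_temp', which becomes 'high' everywhere
arrays = [
    ['high'] * len(df) if name == 'chiller_temp' else df.index.get_level_values(name)
    for name in df.index.names
]
df.index = pd.MultiIndex.from_arrays(arrays, names=df.index.names)
```

Because 'low' and 'mid' both collapse to 'high', the resulting index contains duplicate entries, which may or may not be acceptable for later lookups.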
  

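4) I've seen some slicing going on using idx = pd.IndexSlice ... This raises the error AttributeError: 'list' object has no attribute 'loc' ... Raises TypeError: unhashable type: 'slice'

As for the AttributeError: 'list' object has no attribute 'loc' and TypeError: unhashable type: 'slice' errors, you simply have two assignments in each of those lines (df = df.loc[...] = ...), so df is rebound to the right-hand value before .loc is looked up on it.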
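
The effect of the two assignments can be reproduced on a much smaller frame. A minimal sketch with a made-up three-row DataFrame; the names new_values, error_message and rebound_type are illustrative only:

```python
import pandas as pd

df = pd.DataFrame({'deliveries': [0.1, 0.2, 0.3]})
new_values = [0, 100, 200]

try:
    # Targets are assigned left to right: df becomes the list first,
    # then df.loc is looked up on that list and fails
    df = df.loc[:, 'deliveries'] = new_values
except AttributeError as exc:
    error_message = str(exc)

# df is no longer a DataFrame at this point
rebound_type = type(df).__name__
```

Dropping the leading `df =` leaves a single assignment through .loc, which is the form used in the working examples.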

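Your .loc syntax looks correct, except that you can't assign a pd.Series this way without the cell values becoming NaN (see the answer to 2) for the correct syntax). This works: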

idx = pd.IndexSlice
df.loc[idx[:,'wed'], 'deliveries':'deliveries'] = "We changed this"
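
The same .loc pattern should also accept a list or array with one value per selected row. A sketch under that assumption, with the frame rebuilt so the block runs on its own; converting a Series with .to_numpy() first sidesteps index alignment, which is the likely source of the NaN values mentioned above. The names n_wed and new_values are illustrative:

```python
import numpy as np
import pandas as pd

levels = {
    'produce_source': ['Vendor A', 'Vendor B'],
    'day': ['mon', 'wed', 'fri'],
    'chiller_temp': ['low', 'mid'],
    'fruit': ['apples', 'pears', 'nanas'],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({'deliveries': np.random.rand(len(index))}, index=index)

idx = pd.IndexSlice

# Number of rows in the 'wed' slice
n_wed = len(df.loc[idx[:, 'wed'], :])

# A Series has its own index; pass the raw values so they are assigned by position
new_values = pd.Series(range(0, 100 * n_wed, 100))
df.loc[idx[:, 'wed'], 'deliveries'] = new_values.to_numpy()
```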