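I'm just getting started with MultiIndex DataFrames, but I'm having trouble with the fairly sparse documentation and online examples for slicing and indexing.
Consider the following MultiIndex DataFrame: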
import pandas as pd
import numpy as np
levels={
'produce_source':['Vendor A', 'Vendor B'],
'day':['mon','wed','fri'],
'chiller_temp':['low','mid'],
'fruit':['apples','pears','nanas']
}
index = pd.MultiIndex.from_product(levels.values(), names = list(levels.keys()))
df = pd.DataFrame(index=index)
df = df.assign(deliveries=np.random.rand(len(df)))
                                        deliveries
produce_source day chiller_temp fruit
Vendor A       mon low          apples    0.748376
                                pears     0.639824
                                nanas     0.604342
                   mid          apples    0.160837
                                pears     0.970412
                                nanas     0.301815
               wed low          apples    0.572627
                                pears     0.254242
                                nanas     0.590702
                   mid          apples    0.153772
                                pears     0.180117
                                nanas     0.858085
               fri low          apples    0.535358
                                pears     0.576359
                                nanas     0.893993
                   mid          apples    0.334602
                                pears     0.053892
                                nanas     0.778767
Vendor B       mon low          apples    0.565761
                                pears     0.437994
                                nanas     0.090994
                   mid          apples    0.261041
                                pears     0.028795
                                nanas     0.057612
               wed low          apples    0.808108
                                pears     0.914724
                                nanas     0.020663
                   mid          apples    0.055319
                                pears     0.888612
                                nanas     0.623370
               fri low          apples    0.419422
                                pears     0.938593
                                nanas     0.358441
                   mid          apples    0.534191
                                pears     0.590103
                                nanas     0.753034
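What is the most Pythonic way to accomplish the following?
1) View all of the 'wed' data as a slice
1a) Stretch goal: not having to care that 'day' is index.names[1], and instead indexing by the level name 'day'
2) Write data from an iterable to that 'wed' slice only
3) Add a chiller_temp value for all vendors, days, and fruits
I have seen slicing done with idx = pd.IndexSlice: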
idx = pd.IndexSlice
df_wip = df.loc[idx[:,'wed'], ] #1)
#would love to write to df_wip sliced df here but get slice copy warning with df_wip['deliveries'] = list(range(0,100*len(df_wip),100))
df = df.loc[idx[:,'wed'],'deliveries'] = list(range(0,100*len(df_wip),100)) #2)
This raises AttributeError: 'list' object has no attribute 'loc'
df = df.loc[idx[:,'wed'],'deliveries'] = pd.Series(range(0,100*len(df_wip),100)) #2)
This raises TypeError: unhashable type: 'slice'
Answer 0 (score: 1)
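1) View all of the 'wed' data as a slice
To view data in a MultiIndex, it is much easier to use .xs (cross-section). It lets you specify a value for one particular index level, instead of spelling out every level the way .loc with a slice makes you do: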
df.xs('wed', level='day')
Out:
                                    deliveries
produce_source chiller_temp fruit
Vendor A       low          apples    0.521861
                            pears     0.741856
                            nanas     0.245843
               mid          apples    0.471135
                            pears     0.191322
                            nanas     0.153920
Vendor B       low          apples    0.711457
                            pears     0.211794
                            nanas     0.599071
               mid          apples    0.303910
                            pears     0.657348
                            nanas     0.111750
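For the stretch goal 1a), here is a minimal sketch of a few ways to select the 'wed' rows, comparing the positional IndexSlice form with forms that refer to the level by name. The frame is rebuilt with a fixed seed purely for illustration, and the variable names (by_position, by_xs, by_query, by_mask) are not from the question.

```python
import numpy as np
import pandas as pd

# Rebuild a frame shaped like the one in the question (seeded, illustrative values)
levels = {
    "produce_source": ["Vendor A", "Vendor B"],
    "day": ["mon", "wed", "fri"],
    "chiller_temp": ["low", "mid"],
    "fruit": ["apples", "pears", "nanas"],
}
index = pd.MultiIndex.from_product(list(levels.values()), names=list(levels.keys()))
df = pd.DataFrame({"deliveries": np.random.default_rng(0).random(len(index))}, index=index)

idx = pd.IndexSlice

# Positional: relies on 'day' being the second index level
by_position = df.loc[idx[:, "wed"], :]

# By name: .xs drops the selected level unless drop_level=False is passed
by_xs = df.xs("wed", level="day", drop_level=False)

# By name: query() can refer to index level names as if they were columns
by_query = df.query("day == 'wed'")

# By name: boolean mask built from the values of the 'day' level
by_mask = df[df.index.get_level_values("day") == "wed"]
```

All four selections should return the same twelve rows. The name-based forms keep working if the order of the index levels changes.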
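2) Write data from an iterable to that 'wed' slice only
If I understand this correctly, you are trying to replace the values in the 'deliveries' column with a specific iterable (e.g. a list) wherever the day is 'wed'. Unfortunately, .loc-style replacement does not work in this case, because .loc treats a list as one value per selected row rather than as a single cell value. As far as I know, pandas only has simple syntax for replacing the value of a single cell, using .at or .loc (see this SO answer). However, we can use iteration to accomplish this: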
idx = pd.IndexSlice
# If we don't change the column's type, which was float, this will error
df['deliveries'] = df['deliveries'].astype(object)
# Loop through rows, replacing single values
# Only necessary if the new assigned value is mutable
for index, row in df.loc[idx[:,'wed'], 'deliveries':'deliveries'].iterrows():
    df.at[index, 'deliveries'] = ["We", "changed", "this"]
df.head(10)
Out:
                                                 deliveries
produce_source day chiller_temp fruit
Vendor A       mon low          apples            0.0287606
                                pears              0.264512
                                nanas              0.238089
                   mid          apples             0.814985
                                pears              0.590967
                                nanas              0.919351
               wed low          apples  [We, changed, this]
                                pears   [We, changed, this]
                                nanas   [We, changed, this]
                   mid          apples  [We, changed, this]
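As far as I know the loop is necessary, but in my opinion using df.xs followed by df.update is easier to understand than .loc. For example, the following code does the same thing as the .loc code above: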
df['deliveries'] = df['deliveries'].astype(object)
# Create a temporary copy of our cross section
df2 = df.xs('wed', level='day', drop_level=False)
# The same loop as before
for index, row in df2.iterrows():
    df2.at[index, 'deliveries'] = ["We", "changed", "this"]
# Update the original df for the values we want from df2
# (older pandas spelled the last argument raise_conflict=False; current versions use errors="ignore")
df.update(df2, join="left", overwrite=True, filter_func=None, errors="ignore")
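If the goal is instead one new value per 'wed' row (as the range(...) list in the question suggests) rather than the same list object in every cell, a sketch like the following may be enough. It assumes the list has exactly as many items as there are selected rows. The smaller three-level frame and the names n_wed, new_values and mask are made up for illustration.

```python
import numpy as np
import pandas as pd

# Smaller stand-in frame: 2 vendors x 3 days x 2 chiller temps
index = pd.MultiIndex.from_product(
    [["Vendor A", "Vendor B"], ["mon", "wed", "fri"], ["low", "mid"]],
    names=["produce_source", "day", "chiller_temp"],
)
df = pd.DataFrame({"deliveries": np.zeros(len(index))}, index=index)

idx = pd.IndexSlice

# One new value per selected row; floats keep the column dtype unchanged
n_wed = len(df.loc[idx[:, "wed"], :])
new_values = [float(v) for v in range(0, 100 * n_wed, 100)]

# Assign directly on df (not on a separately sliced frame) in a single .loc call
df.loc[idx[:, "wed"], "deliveries"] = new_values

# The same write addressed by level name instead of level position
mask = df.index.get_level_values("day") == "wed"
df.loc[mask, "deliveries"] = new_values
```

Writing through a single .loc call on the original frame also sidesteps the slice-copy warning mentioned in the question, since no intermediate df_wip object is modified.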
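3) Add a chiller_temp value for all vendors, days, and fruits
Replacing values in an existing level of a MultiIndex requires replacing the entire level. This can be done with df.index.set_levels (easier, IMO) or pd.MultiIndex.from_arrays. Depending on the exact use case, mapping and/or replacing may be needed. See this SO answer for other examples.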
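# set_levels expects one entry per distinct label and rejects duplicates, so rebuild the index with every chiller_temp set to 'high'
df.index = pd.MultiIndex.from_arrays([['high'] * len(df) if name == 'chiller_temp' else df.index.get_level_values(name) for name in df.index.names], names=df.index.names)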
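As a sketch of what set_levels operates on: a level stores each distinct label once, so set_levels takes one new label per distinct old label rather than one per row. The small two-level frame and the 'cold'/'cool' labels below are hypothetical and chosen only for illustration.

```python
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Vendor A", "Vendor B"], ["low", "mid"]],
    names=["produce_source", "chiller_temp"],
)
df = pd.DataFrame({"deliveries": [1.0, 2.0, 3.0, 4.0]}, index=index)

# Position of the level we want to relabel, looked up by name
level_pos = list(df.index.names).index("chiller_temp")

# The level holds each distinct label once, not one entry per row
old_labels = list(df.index.levels[level_pos])

# Hypothetical relabelling: one new label per distinct old label
relabel = {"low": "cold", "mid": "cool"}
new_labels = [relabel[label] for label in old_labels]

# set_levels returns a new MultiIndex; assign it back to the frame
df.index = df.index.set_levels(new_labels, level="chiller_temp")
```

Every row that used to carry 'low' now shows 'cold', and every 'mid' row shows 'cool', without touching the other levels.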
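4) I have seen slicing done with idx = pd.IndexSlice ... this raises AttributeError: 'list' object has no attribute 'loc' ... raises TypeError: unhashable type: 'slice'
The AttributeError: 'list' object has no attribute 'loc' and TypeError: unhashable type: 'slice' errors occur because each of those lines contains two assignments (df = df.loc[...] = ...). Python assigns to the targets from left to right, so df is rebound to the list or Series before df.loc is evaluated.
Your .loc syntax looks correct, except that you cannot assign a pd.Series this way without the cell values becoming NaN (see the answer to 2 for the correct syntax). This works: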
idx = pd.IndexSlice
df.loc[idx[:,'wed'], 'deliveries':'deliveries'] = "We changed this"
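To illustrate the double assignment, here is a small sketch that reproduces the AttributeError on a made-up two-level frame and then repeats the write with the leading df = removed. The names frame, error_message and rebound_type are introduced only for this example.

```python
import numpy as np
import pandas as pd

index = pd.MultiIndex.from_product(
    [["Vendor A", "Vendor B"], ["mon", "wed"]],
    names=["produce_source", "day"],
)
df = pd.DataFrame({"deliveries": np.zeros(len(index))}, index=index)
idx = pd.IndexSlice
frame = df  # second reference, so the DataFrame survives the rebinding below

error_message = None
try:
    # Targets are assigned left to right: df becomes the list first,
    # then Python evaluates df.loc[...] on that list
    df = df.loc[idx[:, "wed"], "deliveries"] = [100.0, 200.0]
except AttributeError as exc:
    error_message = str(exc)

rebound_type = type(df).__name__  # df no longer refers to the DataFrame

# Restore the name, then write with a single assignment target
df = frame
df.loc[idx[:, "wed"], "deliveries"] = [100.0, 200.0]
```

With a single target, the list is written into the two 'wed' rows and df stays a DataFrame.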