Updating a DataFrame with MultiIndex (multiple rows) slices in Pandas

Time: 2016-04-28 14:43:32

Tags: python pandas multi-index

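I have a DataFrame with a MultiIndex of I, M, and I want to update all M rows for one i \in I at once.

Here is my data frame:

>>> result.head(n=10)
Out[9]: 
         FINLWT21
i INCAGG         
0 1           NaN
  7           NaN
  9           NaN
  5           NaN
  3           NaN
1 1           NaN
  7           NaN
  9           NaN
  5           NaN
  3           NaN

Here is what I would like to fill in:

>>> sample.groupby(field).sum()
            FINLWT21
INCAGG              
1        8800809.719
3        9951002.611
5        9747721.721
7        7683066.990
9       11091861.692

I thought the correct command was result.loc[i] = sample.groupby(field).sum(). However, result is not filled in as intended after running it.

How can I update all the "inner indices" at the same time?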

3 Answers:

Answer 0: (score: 1)

You want to use pd.IndexSlice. It returns an object that can be passed to loc to slice a MultiIndex.

Solution

result.sort_index(inplace=True)
slc = pd.IndexSlice[i, :]
result.loc[slc, :] = sample.groupby(field).sum()

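Explanation

result.sort_index(inplace=True) -> slicing a MultiIndex with loc generally requires the index to be sorted. Note that sort_index() returns a new frame unless inplace=True is passed or the result is assigned back.

slc = pd.IndexSlice[i, :] -> syntax for building a generic slicer that selects the group labelled i in the first level of a pd.MultiIndex with 2 levels.

result.loc[slc, :] = ... -> assigns through the slice.

Demonstration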
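To make the slicer less mysterious, here is a minimal sketch showing that pd.IndexSlice only builds the indexer that loc receives. The frame and its values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical two-level frame, already in sorted order
idx = pd.MultiIndex.from_product([[0, 1], [1, 3, 5]], names=["i", "INCAGG"])
df = pd.DataFrame({"FINLWT21": range(6)}, index=idx)

# pd.IndexSlice[0, :] is a convenient way to write the tuple (0, slice(None))
slc = pd.IndexSlice[0, :]
same = (0, slice(None))

# Both spellings select every inner row of outer group 0
picked = df.loc[slc, :]
print(picked)
```

The IndexSlice form is mostly a readability aid: it lets you write ":" instead of slice(None) inside the indexer.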

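import pandas as pd
import numpy as np


# Empty frame with a two-level index; the inner level is not in sorted order
result = pd.DataFrame([], columns=['FINLWT21'],
                      index=pd.MultiIndex.from_product([[0, 1], [1, 7, 9, 5, 3]]))

# Sort in place so the MultiIndex can be sliced
result.sort_index(inplace=True)
slc = pd.IndexSlice[0, :]

# Fill every inner row of outer group 0 at once
result.loc[slc, :] = [1, 2, 3, 4, 5]

print(result)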

    FINLWT21
0 1        1
  3        2
  5        3
  7        4
  9        5
1 1      NaN
  3      NaN
  5      NaN
  7      NaN
  9      NaN
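To tie the demonstration back to the question, here is a minimal sketch that fills one outer group from a groupby sum. The contents of sample are hypothetical stand-ins, and the reindex step is my own assumption about how to keep the values lined up with the inner labels; it is not part of the answer above.

```python
import numpy as np
import pandas as pd

field = "INCAGG"
i = 0

# Hypothetical stand-in for the asker's sample data
sample = pd.DataFrame({
    field: [1, 3, 3, 5, 5, 5],
    "FINLWT21": [10.0, 1.0, 2.0, 0.5, 0.5, 1.0],
})

# Frame to fill, sorted so that it can be sliced
result = pd.DataFrame(
    np.nan,
    columns=["FINLWT21"],
    index=pd.MultiIndex.from_product([[0, 1], [1, 5, 3]], names=["i", field]),
).sort_index()

# One total per INCAGG value, as a Series indexed by INCAGG
totals = sample.groupby(field)["FINLWT21"].sum()

slc = pd.IndexSlice[i, :]
# Order the totals by the inner labels of the target rows, then assign by position
inner = result.loc[slc, :].index.get_level_values(field)
result.loc[slc, "FINLWT21"] = totals.reindex(inner).to_numpy()

print(result)
```

Assigning a plain array sidesteps index alignment between the one-level totals and the two-level target, which is why the values are reordered explicitly first.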

Answer 1: (score: 0)

This may be the kind of function I was looking for:

from warnings import warn

import pandas as pd


def _assign_multi_index(dest, k, v, inplace=True, bool_nan=False):
    """
    assigns v to dest[k] inplace, doing a "sensible" multi-index alignment, raising
    a ValueError if no alignment is achieved.
    I'm not sure if there's a better way to do this, or a reason not to do it
    the way it's currently written.
    """
    if not inplace:
        raise NotImplementedError()
    if k in dest:
        warn("key '{}' already exists, continue with caution!".format(k))
    v_names = v.index.names
    dest_names = dest.index.names
    if all(n in dest_names for n in v_names):
        if len(v_names) < len(dest_names):
            # if need be, drop some index levels temporarily in dest
            dropped_names = [n for n in dest_names if n not in v_names]
            dest.reset_index(dropped_names, inplace=True)
        if isinstance(v.index, pd.MultiIndex):
            # just to be safe: put v's levels in the same order as dest's
            v.index = v.index.reorder_levels([n for n in dest_names if n in v_names])
    else:
        raise ValueError("index levels do not match dest.")

    # column assignment aligns v with dest on the remaining index level(s)
    dest[k] = v

    # restore the original index levels if need be
    if dest.index.names != dest_names:
        dest.reset_index(inplace=True)
        dest.set_index(dest_names, inplace=True)

    # pd.isnull is used here because `bool_nan != np.nan` is always True
    if not pd.isnull(bool_nan) and v.dtype.name == 'bool' and dest[k].dtype.name != 'bool':
        # this happens when nans had to be inserted, let's convert nans
        dest_k = dest[k].copy()
        dest_k[pd.isnull(dest_k)] = bool_nan
        dest[k] = dest_k.astype(bool)
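The core trick in that function can be sketched on its own: temporarily move the extra index level into a column, assign the new column so that pandas aligns on the remaining level, then restore the index. The data and the column name total below are hypothetical, chosen only to illustrate the idea.

```python
import numpy as np
import pandas as pd

# Hypothetical destination frame with two index levels
dest = pd.DataFrame(
    {"FINLWT21": np.nan},
    index=pd.MultiIndex.from_product([[0, 1], [1, 3, 5]], names=["i", "INCAGG"]),
)

# Hypothetical values indexed by the inner level only
v = pd.Series([10.0, 30.0, 50.0], index=pd.Index([1, 3, 5], name="INCAGG"))

original_names = list(dest.index.names)

# Drop the level that v does not have; the index is now INCAGG alone
flat = dest.reset_index("i")

# Column assignment aligns v on INCAGG, so each value repeats for every i
flat["total"] = v

# Restore the original two-level index
out = flat.reset_index().set_index(original_names)
print(out)
```

Unlike the slice assignment in the first answer, this broadcasts the values to every outer group rather than to a single one.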

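Answer 2: (score: 0)

It turns out the best approach is to add the missing index level to the data being assigned, so that it has the same index structure as result. The following works as expected: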

    data = sample.groupby(field).sum()
    data['index'] = i

    result.loc[i] = data.reset_index().set_index(['index', field])
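For a self-contained variant of the same idea, here is a minimal sketch that builds the two-level data and then writes it with DataFrame.update instead of result.loc[i] = .... That swap is my own choice, made because update aligns on the full index by label; the contents of sample are hypothetical.

```python
import numpy as np
import pandas as pd

field = "INCAGG"
i = 0

# Hypothetical stand-in for the asker's sample data
sample = pd.DataFrame({
    field: [1, 3, 3, 5],
    "FINLWT21": [10.0, 1.0, 2.0, 4.0],
})

# Frame to fill; the inner level is deliberately not in sorted order
result = pd.DataFrame(
    np.nan,
    columns=["FINLWT21"],
    index=pd.MultiIndex.from_product([[0, 1], [1, 5, 3]], names=["i", field]),
)

# Totals indexed by INCAGG only
data = sample.groupby(field).sum()

# Add the outer level so the index matches result's (i, INCAGG) structure
keyed = data.assign(i=i).reset_index().set_index(["i", field])

# update() aligns on index labels and overwrites the matching cells in place
result.update(keyed)
print(result)
```

Because the alignment is by label, result does not need to be sorted first, and rows belonging to other values of i are left untouched.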