Conditionally copying rows / transforming with a function that conditionally returns rows

Date: 2017-08-03 15:42:22

Tags: python-2.7 pandas apply pandas-groupby

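I want to create a new column 'derby_one' by taking the value in column 'one' from each country's 'Derby' row and spreading it across all rows of that country (the expected values are in the test data set built below). The data looks like this: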

number              one  two  three
country town
AU      Newcastle     0    1      2
        Derby         3    4      5
        Sydney        6    7      8
UK      Derby         9   10     11
        Kensington   12   13     14
        Newcastle    15   16     17
USA     Derby        18   19     20

Using transform achieves a similar result:
data.groupby(['country']).one.transform(max)
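To see why this is only similar, here is a minimal sketch of what that call returns on the sample data. The frame is rebuilt with pd.MultiIndex.from_arrays, which is assumed to give the same index as the constructor in the question, and the string 'max' is passed instead of the builtin because newer pandas versions may warn about the builtin. The result is each country's largest value of 'one', not the value from its Derby row:

```python
import numpy as np
import pandas as pd

# Sample frame from the question: (country, town) MultiIndex, columns one/two/three
index = pd.MultiIndex.from_arrays(
    [['AU', 'AU', 'AU', 'UK', 'UK', 'UK', 'USA'],
     ['Newcastle', 'Derby', 'Sydney', 'Derby', 'Kensington', 'Newcastle', 'Derby']],
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Per-country maximum of 'one', repeated on every row of that country
country_max = data.groupby(['country'])['one'].transform('max')
print(country_max.tolist())  # [6, 6, 6, 15, 15, 15, 18]
```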

However, I'm not sure how to modify it so that it takes the index (the 'Derby' rows) into account.
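One way to make the same transform respect the 'Derby' rows, sketched here as a suggestion rather than taken from the thread, is to blank out every non-Derby row first, so the only value left in each country group is the Derby one. This assumes each country has exactly one Derby row: with several, max keeps the largest, and a country with none gets NaN.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt with from_arrays)
index = pd.MultiIndex.from_arrays(
    [['AU', 'AU', 'AU', 'UK', 'UK', 'UK', 'USA'],
     ['Newcastle', 'Derby', 'Sydney', 'Derby', 'Kensington', 'Newcastle', 'Derby']],
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Boolean mask: True only on rows whose 'town' index level is 'Derby'
is_derby = data.index.get_level_values('town') == 'Derby'
# Keep 'one' on Derby rows and turn every other row into NaN
derby_only = data['one'].where(is_derby)
# max() skips NaN, so each country's result is its Derby value
data['derby_one'] = derby_only.groupby(level='country').transform('max')
print(data['derby_one'].tolist())  # [3.0, 3.0, 3.0, 9.0, 9.0, 9.0, 18.0]
```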

Sample data

import pandas as pd
import numpy as np
# same rows as the table above; from_arrays also works where MultiIndex(labels=...) is no longer accepted
index = pd.MultiIndex.from_arrays(
    [['AU', 'AU', 'AU', 'UK', 'UK', 'UK', 'USA'],
     ['Newcastle', 'Derby', 'Sydney', 'Derby', 'Kensington', 'Newcastle', 'Derby']],
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# create test data set
test = data.copy()
derby_one = pd.Series(np.array([0,0,0,9,9,9,18]), index=data.index)
test['derby_one'] = derby_one
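To check which value each country should receive, DataFrame.xs can pull the Derby rows out by the 'town' level. A minimal sketch on the sample data; note that in data as constructed, AU's Derby row is the second row, so its 'one' value is 3:

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt with from_arrays)
index = pd.MultiIndex.from_arrays(
    [['AU', 'AU', 'AU', 'UK', 'UK', 'UK', 'USA'],
     ['Newcastle', 'Derby', 'Sydney', 'Derby', 'Kensington', 'Newcastle', 'Derby']],
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Cross-section at town == 'Derby'; the 'town' level is dropped, leaving 'country'
derby_values = data.xs('Derby', level='town')['one']
print(derby_values.index.tolist())  # ['AU', 'UK', 'USA']
print(derby_values.tolist())        # [3, 9, 18]
```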

Caveat

I don't want to use join/merge, because my real data set is very large, so an option like the following is not feasible:

derby_one = data.loc[pd.IndexSlice[:, 'Derby'], ['one']].reset_index()
derby_one = derby_one[['country', 'one']].rename(columns={'one':'derby_one'})
pd.merge(
    data.reset_index(),
    derby_one,
    left_on=['country'],
    right_on=['country']
).set_index(['country', 'town'])
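If the goal is just to avoid building a merged copy of the whole frame, a lighter alternative (a suggestion, not benchmarked on a large data set) is to extract one value per country and look it up through the 'country' level with reindex. This assumes at most one Derby row per country: reindex raises on duplicate labels in the lookup Series, and a country without a Derby row would get NaN.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt with from_arrays)
index = pd.MultiIndex.from_arrays(
    [['AU', 'AU', 'AU', 'UK', 'UK', 'UK', 'USA'],
     ['Newcastle', 'Derby', 'Sydney', 'Derby', 'Kensington', 'Newcastle', 'Derby']],
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# One 'one' value per country, taken from its Derby row (index: country)
derby = data.xs('Derby', level='town')['one']
# Repeat each country's value for every row of that country, in data's row order
countries = data.index.get_level_values('country')
data['derby_one'] = derby.reindex(countries).values
print(data['derby_one'].tolist())  # [3, 3, 3, 9, 9, 9, 18]
```

Only the small per-country Series is built besides the new column, which is the point of this variant compared with reset_index plus merge.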

1 Answer:

Answer 0 (score: 1)

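You can let pandas align the index for you: use query to select the 'Derby' rows and assign that column (every other row becomes NaN), then use groupby and transform to fill the NaN values within each group: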

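# query() filters on the named index level 'town'; assignment aligns on (country, town), so other rows get NaN
data['derby_one'] = data.query('town == "Derby"')['one']
# assign the transform back so the filled values are stored in the column
data['derby_one'] = data.groupby(['country'])['derby_one'].transform('max')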
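A self-contained run of these two steps on the question's sample data, with the transform result assigned back to the column. On this data AU receives 3.0, since that is the 'one' value in AU's Derby row. transform('first'), which takes the first non-null value per group, is shown as an alternative that should give the same result when each country has exactly one Derby row; it is not part of the original answer.

```python
import numpy as np
import pandas as pd

# Sample frame from the question (rebuilt with from_arrays)
index = pd.MultiIndex.from_arrays(
    [['AU', 'AU', 'AU', 'UK', 'UK', 'UK', 'USA'],
     ['Newcastle', 'Derby', 'Sydney', 'Derby', 'Kensington', 'Newcastle', 'Derby']],
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# Step 1: Derby rows keep their 'one' value, all other rows become NaN
data['derby_one'] = data.query('town == "Derby"')['one']

# Step 2: spread each country's single non-NaN value over all its rows
filled_max = data.groupby(['country'])['derby_one'].transform('max')
# Alternative: first non-null value per group (same result with one Derby per country)
filled_first = data.groupby(['country'])['derby_one'].transform('first')

data['derby_one'] = filled_max
print(data['derby_one'].tolist())  # [3.0, 3.0, 3.0, 9.0, 9.0, 9.0, 18.0]
```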

Output

number              one  two  three  derby_one
country town                                  
AU      Derby         0    1      2        0.0
        Newcastle     3    4      5        0.0
        Sydney        6    7      8        0.0
UK      Derby         9   10     11        9.0
        Kensington   12   13     14        9.0
        Newcastle    15   16     17        9.0
USA     Derby        18   19     20       18.0