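I want to take the value of column 'one' from the row whose 'town' index is 'Derby' and spread it across all rows of the same country, creating a new column 'derby_one', as in the expected result (test) constructed below.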
number             one  two  three
country town
AU      Newcastle    0    1      2
        Derby        3    4      5
        Sydney       6    7      8
UK      Derby        9   10     11
        Kensington  12   13     14
        Newcastle   15   16     17
USA     Derby       18   19     20
I am aware of the transform function:
data.groupby(['country']).one.transform('max')
However, I am not sure how to modify it so that it interacts with an indexer.
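As a point of reference, here is a minimal sketch of what transform returns. It uses a small hypothetical frame (toy) rather than the data set in this question; the point is only that the result has one value per original row and shares the original index, so it can be assigned straight back as a column.

```python
import pandas as pd

# Hypothetical toy frame for illustration only, not the question's data.
toy = pd.DataFrame(
    {"country": ["AU", "AU", "UK", "UK", "UK"], "one": [0, 3, 9, 12, 15]}
)

# transform computes the per-group maximum and broadcasts it back to every
# row of that group, keeping the original index.
group_max = toy.groupby("country")["one"].transform("max")
print(group_max.tolist())  # [3, 3, 15, 15, 15]
```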
import pandas as pd
import numpy as np

index = pd.MultiIndex(
    levels=[['AU', 'UK', 'USA'], ['Derby', 'Kensington', 'Newcastle', 'Sydney']],
    codes=[[0, 0, 0, 1, 1, 1, 2], [2, 0, 3, 0, 1, 2, 0]],  # called `labels` before pandas 0.24
    names=['country', 'town'])
data = pd.DataFrame(np.arange(21).reshape(7, 3), index=index,
                    columns=pd.Index(['one', 'two', 'three'], name='number'))

# create test data set
test = data.copy()
derby_one = pd.Series(np.array([0, 0, 0, 9, 9, 9, 18]), index=data.index)
test['derby_one'] = derby_one
I do not want to use join/merge, because my real data set is very large. For example, the following option is not feasible:
derby_one = data.loc[pd.IndexSlice[:, 'Derby'], ['one']].reset_index()
derby_one = derby_one[['country', 'one']].rename(columns={'one': 'derby_one'})
pd.merge(
    data.reset_index(),
    derby_one,
    left_on=['country'],
    right_on=['country'],
).set_index(['country', 'town'])
Answer 0 (score: 1)
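You can do it like this and let pandas align the index for you: use query to keep only the 'Derby' rows and assign the result as a new column (rows that do not match become NaN), then use groupby and transform to fill the NaN values within each group: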
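# Only the 'Derby' rows get a value; index alignment leaves the rest as NaN
data['derby_one'] = data.query('town == "Derby"')['one']
# Broadcast the single non-NaN value to every row of the same country
data['derby_one'] = data.groupby(['country'])['derby_one'].transform('max')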
Output:
number              one  two  three  derby_one
country town
AU      Derby         0    1      2        0.0
        Newcastle     3    4      5        0.0
        Sydney        6    7      8        0.0
UK      Derby         9   10     11        9.0
        Kensington   12   13     14        9.0
        Newcastle    15   16     17        9.0
USA     Derby        18   19     20       18.0
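For anyone who wants to try this end to end, here is a self-contained sketch of two merge-free variants run on the frame from the question. It is one possible way to write it, not necessarily what the answer's author ran. The names is_derby, masked, filled, derby_by_country and mapped are illustrative, and the second variant assumes at most one 'Derby' row per country. With the frame exactly as constructed in the question, the AU/Derby row holds one == 3, so the AU rows receive 3 here; the 0.0 in the output above corresponds to a frame in which AU/Derby is the first row.

```python
import numpy as np
import pandas as pd

# Rebuild the question's frame (codes= replaces the older labels= argument).
index = pd.MultiIndex(
    levels=[["AU", "UK", "USA"], ["Derby", "Kensington", "Newcastle", "Sydney"]],
    codes=[[0, 0, 0, 1, 1, 1, 2], [2, 0, 3, 0, 1, 2, 0]],
    names=["country", "town"],
)
data = pd.DataFrame(
    np.arange(21).reshape(7, 3),
    index=index,
    columns=pd.Index(["one", "two", "three"], name="number"),
)

# Variant 1: same idea as the answer, but with a boolean mask instead of query.
# Non-Derby rows become NaN, then the per-country max fills them in.
is_derby = data.index.get_level_values("town") == "Derby"
masked = data["one"].where(is_derby)
filled = masked.groupby(level="country").transform("max")

# Variant 2: build a small country -> value Series and look up each row's
# country in it. Assumes at most one Derby row per country.
derby_by_country = data.xs("Derby", level="town")["one"]
mapped = data.index.get_level_values("country").map(derby_by_country)

data["derby_one"] = filled
print(data)
```

Variant 1 yields floats because NaN is used as the placeholder; variant 2 keeps the integer dtype as long as every country has a 'Derby' row.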