Operating on groups in pandas

Time: 2016-12-09 11:31:01

Tags: python pandas pandas-groupby

I have a problem that is making my head spin. Suppose I have the following DataFrame:

import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']

For example, I group it by car.

agrupe = df2.groupby(['CARS'])

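The problem is that once it is grouped, I want to operate on it. For example, in the BMW group, for the rows where column 1 (ONE) is 2, I want to assign the value of column 2 (TWO) to column 4 (FOUR). Let's see if I can learn how to manipulate it: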

g = agrupe.get_group('BMW')

From this:

   ONE  TWO CARS  FOUR  NAMES
1    1    0  BMW     1    Jon
3    2    1  BMW     1   Mary
4    0    1  BMW     0  Peter

To this:

   ONE  TWO CARS  FOUR  NAMES
1    1    0  BMW     1    Jon
3    2    1  BMW     1   Mary
4    0    1  BMW     1  Peter

1 answer:

Answer 0 (score: 1)

It seems you need groupby with a custom function f:

import numpy as np
import pandas as pd

np.random.seed(100)
df2 = pd.DataFrame(np.random.randint(0,3,size=(10, 4)),columns=['ONE', 'TWO', 'CARS', 'FOUR'])
df2['NAMES'] = ['Peter','Jon','Mary','Mary','Peter','Peter','BONIFACE','Michael','Lucy','Gilari']
df2['CARS'] = ['Mercedes','BMW','Ford','BMW','BMW','Dacia','Ford','Pontiac','Chevrolet','Tesla']
print (df2)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     1       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari
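def f(x):
    # x is the sub-DataFrame of one group; x.name is the group key (the CARS value)
    if (x.name == 'BMW'):
        # work on a copy rather than mutating the group passed in by apply
        x = x.copy()
        # within the BMW group, copy TWO into FOUR where ONE is 2
        x.loc[x.ONE == 2, 'FOUR'] = x.TWO
    return x

# group_keys=False keeps the original row index on newer pandas versions
agrupe = df2.groupby('CARS', group_keys=False).apply(f)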
print (agrupe)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     0       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari

A better solution is to first select all rows where column CARS is BMW and column ONE is 2, and then set FOUR from column TWO:

df2.loc[(df2.CARS == 'BMW') & (df2.ONE == 2), 'FOUR'] = df2.TWO
print (df2)
   ONE  TWO       CARS  FOUR     NAMES
0    0    0   Mercedes     2     Peter
1    2    0        BMW     0       Jon
2    2    2       Ford     2      Mary
3    1    0        BMW     0      Mary
4    0    2        BMW     1     Peter
5    1    2      Dacia     0     Peter
6    0    1       Ford     1  BONIFACE
7    0    0    Pontiac     1   Michael
8    1    2  Chevrolet     2      Lucy
9    1    1      Tesla     2    Gilari

If you need to change every row where column ONE is 2, regardless of CARS, set column FOUR from column TWO for those rows.
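A minimal sketch of that last case, assuming it simply drops the CARS condition from the boolean mask. The frame here is rebuilt by hand from the printed df2 above, so the result does not depend on the random seed; the name mask is only illustrative.

```python
import pandas as pd

# Assumption: values copied by hand from the printed df2 in the answer above
df2 = pd.DataFrame({
    'ONE':   [0, 2, 2, 1, 0, 1, 0, 0, 1, 1],
    'TWO':   [0, 0, 2, 0, 2, 2, 1, 0, 2, 1],
    'CARS':  ['Mercedes', 'BMW', 'Ford', 'BMW', 'BMW',
              'Dacia', 'Ford', 'Pontiac', 'Chevrolet', 'Tesla'],
    'FOUR':  [2, 1, 2, 0, 1, 0, 1, 1, 2, 2],
    'NAMES': ['Peter', 'Jon', 'Mary', 'Mary', 'Peter',
              'Peter', 'BONIFACE', 'Michael', 'Lucy', 'Gilari'],
})

# Boolean mask: every row where ONE is 2, whatever the car
mask = df2['ONE'] == 2

# Copy TWO into FOUR for the selected rows only; other rows are untouched
df2.loc[mask, 'FOUR'] = df2.loc[mask, 'TWO']

print(df2)
```

With this data the mask selects rows 1 and 2. Row 1 (BMW) gets FOUR changed from 1 to 0, while row 2 (Ford) already has FOUR equal to TWO, so it looks unchanged.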