Replace values in a pandas DataFrame based on multiple conditions

Time: 2019-03-09 19:35:29

Tags: python pandas numpy dataframe

I have a fairly simple question, based on this sample code:

import numpy as np
import pandas as pd

x1 = 10 * np.random.randn(10, 3)
df1 = pd.DataFrame(x1)
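I am looking for a single DataFrame derived from df1 in which positive values are replaced with "up", negative values with "down", and 0 values (if any) with "zero". I tried the .where() and .mask() methods but could not get the result I wanted.

I have seen other posts that filter on several conditions at once, but none that show how to replace values according to different conditions.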

4 answers:

Answer 0: (score: 4)

df1.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})

Output:

      0     1     2
0  down    up    up
1    up  down  down
2    up  down  down
3  down  down    up
4  down  down    up
5  down    up    up
6  down    up  down
7    up  down  down
8    up    up  down
9  down    up    up

P.S. Of course, the chance of a value being exactly 0 is very small.
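Because random data almost never produces an exact 0, the np.sign approach is easier to check on a small hand-written frame. The following is a minimal sketch; the values and the name `labels` are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Hand-picked values (an assumption for illustration) so that every branch is hit.
df1 = pd.DataFrame([[1.5, -2.0, 0.0],
                    [-0.3, 4.0, 7.1]])

# np.sign maps each value to -1.0, 0.0 or 1.0;
# replace() then maps those three numbers to the labels.
labels = df1.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})
print(labels)
```

Note that np.sign leaves NaN as NaN, so missing values would stay missing rather than receive a label.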

Answer 1: (score: 2)

If the conditions are combined with OR:

from pandas import DataFrame

names = {'First_name': ['Jon','Bill','Maria','Emma']}

df = DataFrame(names,columns=['First_name'])

df.loc[(df['First_name'] == 'Bill') | (df['First_name'] == 'Emma'), 'name_match'] = 'Match'  
df.loc[(df['First_name'] != 'Bill') & (df['First_name'] != 'Emma'), 'name_match'] = 'Mismatch'
print (df)

Output:

  First_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma      Match
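When the OR is over several equality tests on the same column, the two .loc assignments can probably be collapsed into one step. This is a sketch of an alternative, not the answerer's code: it swaps in `Series.isin` together with `np.where`, and the name `is_match` is introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'First_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# isin() is True where the name equals any listed value (an OR over equalities).
is_match = df['First_name'].isin(['Bill', 'Emma'])

# np.where picks 'Match' where the mask is True and 'Mismatch' everywhere else,
# so the complementary condition does not have to be written out by hand.
df['name_match'] = np.where(is_match, 'Match', 'Mismatch')
print(df)
```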

Answer 2: (score: 1)

In general, you can use np.select on the underlying values and rebuild the DataFrame:

import pandas as pd
import numpy as np

df1 = pd.DataFrame(10*np.random.randn(10, 3))
df1.iloc[0, 0] = 0 # So we can check the == 0 condition 

conds = [df1.values < 0 , df1.values > 0]
choices = ['down', 'up']

pd.DataFrame(np.select(conds, choices, default='zero'),
             index=df1.index,
             columns=df1.columns)

Output:

      0     1     2
0  zero  down    up
1    up  down    up
2    up    up    up
3  down  down  down
4    up    up    up
5    up    up    up
6    up    up  down
7    up    up  down
8  down    up  down
9    up    up  down
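One thing to watch with the default: any cell that matches none of the conditions gets it, and that includes NaN, because comparisons with NaN are always False. If that matters, a variant along these lines might help. It is only a sketch; the explicit `== 0` condition, the 'missing' label, and the sample values are assumptions added for illustration.

```python
import numpy as np
import pandas as pd

# Small hand-written frame (illustrative values) containing a 0 and a NaN.
df1 = pd.DataFrame([[0.0, -1.2, np.nan],
                    [3.4, 0.0, -5.6]])

vals = df1.to_numpy()
# Conditions are checked in order; the first True one decides the label.
conds = [vals < 0, vals > 0, vals == 0]
choices = ['down', 'up', 'zero']

# NaN fails all three comparisons, so it falls through to the default.
out = pd.DataFrame(np.select(conds, choices, default='missing'),
                   index=df1.index, columns=df1.columns)
print(out)
```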

Answer 3: (score: 1)

For multiple conditions, e.g. (df['employrate'] <=55) & (df['employrate'] > 50)

use this:

df['employrate'] = np.where(
   (df['employrate'] <=55) & (df['employrate'] > 50) , 11, df['employrate']
   )

Or you can do it this way:

df.loc[(df['employrate'] <= 55) & (df['employrate'] > 50), 'employrate'] = 11

Informally, the syntax is:

<dataset>.loc[(<filter1>) & (<filter2>), '<variable>'] = <value>

Output:
       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   11.000000           7.29
2      Algeria   11.000000            .69
3      Andorra         nan          10.17
4       Angola   75.699997           5.57

So the syntax we used here is:

 df['<column_name>'] = np.where((<filter 1>) & (<filter 2>), <new value>, df['<column_name>'])
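The np.where form and the .loc form should give the same result. The sketch below checks that on the five rows shown in this answer, with the rates rounded for brevity; the names `in_range`, `via_where`, and `via_loc` are introduced here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'employrate': [55.7, 51.4, 50.5, np.nan, 75.7],
})

# Each comparison needs its own parentheses: & binds tighter than <= and >.
in_range = (df['employrate'] <= 55) & (df['employrate'] > 50)

# Form 1: np.where builds a new array, keeping the old value where the mask is False.
via_where = np.where(in_range, 11, df['employrate'])

# Form 2: .loc assigns in place on the selected rows (done on a copy here).
via_loc = df.copy()
via_loc.loc[in_range, 'employrate'] = 11

print(via_loc)
```

The NaN row fails both comparisons, so it is left untouched by either form.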

For a single condition, e.g. (df['employrate'] > 70)

       country        employrate alcconsumption
0  Afghanistan  55.7000007629394            .03
1      Albania  51.4000015258789           7.29
2      Algeria              50.5            .69
3      Andorra                            10.17
4       Angola  75.6999969482422           5.57

use this:

df.loc[df['employrate'] > 70, 'employrate'] = 7

       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   51.400002           7.29
2      Algeria   50.500000            .69
3      Andorra         nan          10.17
4       Angola    7.000000           5.57

So the syntax here is:

df.loc[<mask>, <optional column(s)>] = <value>
(here <mask> is a boolean Series that selects the row labels to index)
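The column part really is optional, but leaving it out changes what gets overwritten. Here is a minimal sketch on a made-up two-column frame; the column names `a` and `b` and the values are illustrative assumptions.

```python
import pandas as pd

df = pd.DataFrame({'a': [1, 5, 9], 'b': [10, 50, 90]})

# Boolean Series aligned with the index: [False, True, True].
mask = df['a'] > 4

# With a column label, only that column changes on the masked rows.
one_col = df.copy()
one_col.loc[mask, 'b'] = 0

# Without a column label, every column changes on the masked rows.
all_cols = df.copy()
all_cols.loc[mask] = 0

print(one_col)
print(all_cols)
```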