Based on this sample code, I have a fairly simple question:
x1 = 10*np.random.randn(10,3)
df1 = pd.DataFrame(x1)
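I am looking for a single DataFrame derived from df1 in which positive values are replaced with "up", negative values with "down", and 0 values (if any) with "zero". I tried the .where() and .mask() methods but could not get the desired result.
I have seen other posts that filter on several conditions at once, but none of them show how to replace values according to different conditions.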
Answer 0 (score: 4)
df1.apply(np.sign).replace({-1: 'down', 1: 'up', 0: 'zero'})
Output:
      0     1     2
0  down    up    up
1    up  down  down
2    up  down  down
3  down  down    up
4  down  down    up
5  down    up    up
6  down    up  down
7    up  down  down
8    up    up  down
9  down    up    up
P.S. Of course, the chance of hitting exactly zero is very small.
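Because random data almost never contains an exact zero, the 'zero' branch is hard to check with the sample above. The following is a minimal sketch on a small hand-written frame; the data and the names `df_fixed` and `labels` are made up for illustration, and the cast to int is an extra step not present in the answer.

```python
import numpy as np
import pandas as pd

# Hypothetical fixed data so that negative, zero and positive values all occur.
df_fixed = pd.DataFrame([[-2.5, 0.0, 3.1],
                         [4.0, -1.2, 0.0]])

# np.sign maps every value to -1, 0 or 1; casting to int makes the values
# line up exactly with the integer keys of the replacement dict.
labels = np.sign(df_fixed).astype(int).replace({-1: 'down', 0: 'zero', 1: 'up'})
print(labels)
```

Calling np.sign on the whole frame is used here in place of df1.apply(np.sign); for an all-numeric frame both should give the same signs.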
Answer 1 (score: 2)
If the conditions are combined with OR:
from pandas import DataFrame
names = {'First_name': ['Jon','Bill','Maria','Emma']}
df = DataFrame(names,columns=['First_name'])
df.loc[(df['First_name'] == 'Bill') | (df['First_name'] == 'Emma'), 'name_match'] = 'Match'
df.loc[(df['First_name'] != 'Bill') & (df['First_name'] != 'Emma'), 'name_match'] = 'Mismatch'
print(df)
Output:
  First_name name_match
0        Jon   Mismatch
1       Bill      Match
2      Maria   Mismatch
3       Emma      Match
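When the OR condition is a list of allowed values, the two .loc assignments can probably be collapsed into one step. This is a sketch that swaps in Series.isin and np.where, which the answer itself does not use; the names `df_names` and `is_match` are illustrative.

```python
import numpy as np
import pandas as pd

df_names = pd.DataFrame({'First_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# isin() is True where the name equals any listed value,
# i.e. an OR over several equality tests.
is_match = df_names['First_name'].isin(['Bill', 'Emma'])

# np.where picks 'Match' where the mask is True and 'Mismatch' elsewhere,
# so the complementary condition does not have to be written out by hand.
df_names['name_match'] = np.where(is_match, 'Match', 'Mismatch')
print(df_names)
```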
Answer 2 (score: 1)
In general, you can use np.select on the values and rebuild the DataFrame:
import pandas as pd
import numpy as np
df1 = pd.DataFrame(10*np.random.randn(10, 3))
df1.iloc[0, 0] = 0 # So we can check the == 0 condition
conds = [df1.values < 0, df1.values > 0]
choices = ['down', 'up']
pd.DataFrame(np.select(conds, choices, default='zero'),
             index=df1.index,
             columns=df1.columns)
      0     1     2
0  zero  down    up
1    up  down    up
2    up    up    up
3  down  down  down
4    up    up    up
5    up    up    up
6    up    up  down
7    up    up  down
8  down    up  down
9    up    up  down
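The same np.select pattern can be wrapped in a small function and checked on fixed data. This is only a sketch: `label_signs` and `sample` are hypothetical names, and non-default index and column labels are used to show that they survive the rebuild.

```python
import numpy as np
import pandas as pd

def label_signs(frame):
    """Hypothetical helper: return a frame of 'down'/'zero'/'up' labels."""
    values = frame.to_numpy()
    conds = [values < 0, values > 0]      # checked in order, first match wins
    choices = ['down', 'up']              # one choice per condition
    labelled = np.select(conds, choices, default='zero')
    # np.select returns a plain array, so the labels are re-attached here.
    return pd.DataFrame(labelled, index=frame.index, columns=frame.columns)

sample = pd.DataFrame({'a': [0.0, 1.5], 'b': [-3.0, 0.0]}, index=['r1', 'r2'])
result = label_signs(sample)
print(result)
```

One caveat with this layout: a NaN satisfies neither condition, so it would also fall through to the 'zero' default unless a separate condition is added for it.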
Answer 3 (score: 1)
For multiple conditions, e.g. (df['employrate'] <= 55) & (df['employrate'] > 50), use this:
df['employrate'] = np.where(
    (df['employrate'] <= 55) & (df['employrate'] > 50), 11, df['employrate']
)
Or you can do it this way:
df.loc[(df['employrate'] <= 55) & (df['employrate'] > 50), 'employrate'] = 11
The informal syntax here is:
<dataset>.loc[(<filter1>) & (<filter2>), '<variable>'] = <value>
Out[108]:
       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   11.000000           7.29
2      Algeria   11.000000            .69
3      Andorra         nan          10.17
4       Angola   75.699997           5.57
So the syntax we use here is:
df['<column_name>'] = np.where((<filter 1>) & (<filter 2>), <new value>, df['<column_name>'])
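To see that the np.where form and the .loc form agree, here is a minimal sketch on the five sample rows from this answer. The names `df_emp`, `in_range`, `via_where` and `via_loc` are illustrative, and np.nan is assumed to stand for the missing Andorra value.

```python
import numpy as np
import pandas as pd

# Sample rows from this answer; np.nan is an assumption for the missing entry.
df_emp = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'employrate': [55.700001, 51.400002, 50.5, np.nan, 75.699997],
})

# Each comparison is wrapped in parentheses because & binds tighter than <= and >.
in_range = (df_emp['employrate'] <= 55) & (df_emp['employrate'] > 50)

# Variant 1: np.where builds a complete new array of values.
via_where = np.where(in_range, 11, df_emp['employrate'])

# Variant 2: .loc overwrites only the selected rows (done on a copy here).
via_loc = df_emp.copy()
via_loc.loc[in_range, 'employrate'] = 11

print(via_loc)
```

Comparisons against NaN evaluate to False, so the Andorra row is left alone by both variants.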
For a single condition, e.g. (df['employrate'] > 70), starting from:
       country        employrate alcconsumption
0  Afghanistan  55.7000007629394            .03
1      Albania  51.4000015258789           7.29
2      Algeria              50.5            .69
3      Andorra                            10.17
4       Angola  75.6999969482422           5.57
use this:
df.loc[df['employrate'] > 70, 'employrate'] = 7
       country  employrate alcconsumption
0  Afghanistan   55.700001            .03
1      Albania   51.400002           7.29
2      Algeria   50.500000            .69
3      Andorra         nan          10.17
4       Angola    7.000000           5.57
So the syntax here is:
df.loc[<mask>, <optional column(s)>]
where <mask> generates the labels to index.
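The two parts of that .loc call can be looked at separately. This is a sketch on the same sample rows; `df_single`, `mask` and `rows_changed` are illustrative names, and np.nan is assumed for the missing value.

```python
import numpy as np
import pandas as pd

df_single = pd.DataFrame({
    'country': ['Afghanistan', 'Albania', 'Algeria', 'Andorra', 'Angola'],
    'employrate': [55.700001, 51.400002, 50.5, np.nan, 75.699997],
})

# The mask is a boolean Series aligned on the index; NaN > 70 is False.
mask = df_single['employrate'] > 70

# With a column label, only that column is assigned in the selected rows.
df_single.loc[mask, 'employrate'] = 7

# Without a column label, .loc returns the selected rows in full.
rows_changed = df_single.loc[mask]
print(rows_changed)
```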