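I have a column of country names in my pandas DataFrame. I want to apply different filters to that column using if-else conditions, and I need to add a new column to the DataFrame based on those conditions.
Current DataFrame: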
Company Country
BV Denmark
BV Sweden
DC Norway
BV Germany
BV France
DC Croatia
BV Italy
DC Germany
BV Austria
BV Spain
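I have tried the following, but this way I have to spell out the countries again and again.
bookings_d2.loc[(bookings_d2.Country == 'Denmark') | (bookings_d2.Country == 'Norway'), 'Country'] = bookings_d2.Country
In R I currently use ifelse conditions like the ones below, and I want to implement the same thing in Python.
R code example 1: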
ifelse(bookings_d2$COUNTRY_NAME %in% c('Denmark', 'Germany', 'Norway', 'Sweden', 'France', 'Italy', 'Spain', 'Germany', 'Austria', 'Netherlands', 'Croatia', 'Belgium'),
as.character(bookings_d2$COUNTRY_NAME), 'Others')
R code example 2:
ifelse(bookings_d2$country %in% c('Germany'),
ifelse(bookings_d2$BOOKING_BRAND %in% c('BV'), 'Germany_BV', 'Germany_DC'), bookings_d2$country)
Expected DataFrame:
Company Country
BV Denmark
BV Sweden
DC Norway
BV Germany_BV
BV France
DC Croatia
BV Italy
DC Germany_DC
BV Others
BV Others
Answer 0 (score: 2)
I am not sure what you are trying to achieve, but I guess it is something like this:
import pandas as pd
df = pd.DataFrame({'country': ['Sweden', 'Spain', 'China', 'Japan'], 'continent': [None] * 4})
country continent
0 Sweden None
1 Spain None
2 China None
3 Japan None
df.loc[(df.country=='Sweden') | ( df.country=='Spain'), 'continent'] = "Europe"
df.loc[(df.country=='China') | ( df.country=='Japan'), 'continent'] = "Asia"
country continent
0 Sweden Europe
1 Spain Europe
2 China Asia
3 Japan Asia
You can also use a Python list comprehension like this:
df.continent=["Europe" if (x=="Sweden" or x=="Denmark") else "Other" for x in df.country]
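If the list of countries grows, repeating one comparison per country becomes tedious. A possible alternative, which is not part of the original answer, is to keep the lookup in a dict and apply it with `Series.map`. This is only a minimal sketch: the `continent_by_country` dict and the extra 'Brazil' row are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'country': ['Sweden', 'Spain', 'China', 'Japan', 'Brazil']})

# Hypothetical lookup table; extend it with whatever countries you need.
continent_by_country = {
    'Sweden': 'Europe',
    'Spain': 'Europe',
    'China': 'Asia',
    'Japan': 'Asia',
}

# map() yields NaN for countries missing from the dict;
# fillna() then supplies the default label.
df['continent'] = df['country'].map(continent_by_country).fillna('Other')
print(df)
```

Each country is named exactly once, so adding a new one means adding one dict entry.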
Answer 1 (score: 1)
You can do it like this:
country_others = ['Poland', 'Switzerland']  # countries to relabel; adjust as needed
is_germany = df['Country'] == 'Germany'
# Append the company code to the German rows, e.g. 'Germany_BV'
df.loc[is_germany, 'Country'] = 'Germany_' + df.loc[is_germany, 'Company']
df.loc[(df['Company'] == 'DC') & (df['Country'].isin(country_others)), 'Country'] = 'Others'
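To see this idea run end to end, here is a self-contained sketch on the sample data from the question. It makes two assumptions that differ from the snippet above: `country_others` is set to Austria and Spain so that the result matches the expected DataFrame, and the `Company == 'DC'` condition is dropped because the expected output relabels those rows regardless of company.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['BV', 'BV', 'DC', 'BV', 'BV', 'DC', 'BV', 'DC', 'BV', 'BV'],
    'Country': ['Denmark', 'Sweden', 'Norway', 'Germany', 'France',
                'Croatia', 'Italy', 'Germany', 'Austria', 'Spain'],
})

# Assumption: the countries to collapse, chosen to match the expected output.
country_others = ['Austria', 'Spain']

is_germany = df['Country'] == 'Germany'
# Build 'Germany_<Company>' on the German rows only; other rows are untouched.
df.loc[is_germany, 'Country'] = 'Germany_' + df.loc[is_germany, 'Company']
# Relabel the listed countries.
df.loc[df['Country'].isin(country_others), 'Country'] = 'Others'
print(df)
```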
Answer 2 (score: 1)
You can use the following.
Example 1: use Series.isin with numpy.where or with loc; for loc the mask must be inverted with ~:
# Austria and Spain are left out of the list so that they become 'Others'
L = ['Denmark', 'Germany', 'Norway', 'Sweden', 'France', 'Italy',
     'Netherlands', 'Croatia', 'Belgium']
df['Country'] = np.where(df['Country'].isin(L), df['Country'], 'Others')
Alternative:
df.loc[~df['Country'].isin(L), 'Country'] ='Others'
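The two forms differ in which rows the mask selects: `np.where` keeps the original value where the mask is True, while `loc` overwrites the rows where the inverted mask is True. The small check below is a sketch on a made-up four-row Series with a shortened, hypothetical `keep` list, and it suggests that both give the same labels.

```python
import numpy as np
import pandas as pd

country = pd.Series(['Denmark', 'Germany', 'Austria', 'Spain'], name='Country')
keep = ['Denmark', 'Germany']  # hypothetical short list for this demo

# Form 1: np.where keeps the original value where the mask is True.
via_where = pd.Series(np.where(country.isin(keep), country, 'Others'), name='Country')

# Form 2: loc overwrites only the rows where the inverted mask is True.
via_loc = country.copy()
via_loc.loc[~via_loc.isin(keep)] = 'Others'

print(via_where.tolist())
print(via_loc.tolist())
```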
Example 2: use numpy.select or nested np.where:
m1 = df['Country'] == 'Germany'
m2 = df['Company'] == 'BV'
df['Country'] = np.select([m1 & m2, m1 & ~m2],['Germany_BV','Germany_DC'], df['Country'])
Alternative:
df['Country'] = np.where(~m1, df['Country'],
np.where(m2, 'Germany_BV','Germany_DC'))
print (df)
Company Country
0 BV Denmark
1 BV Sweden
2 DC Norway
3 BV Germany_BV
4 BV France
5 DC Croatia
6 BV Italy
7 DC Germany_DC
8 BV Others
9 BV Others
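Both examples can also be folded into a single `np.select` call, since it applies the first condition that matches each row. The sketch below rebuilds the question's sample data and is one possible arrangement, not the only one; the names `keep`, `is_germany` and `is_bv` are chosen here for illustration.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'Company': ['BV', 'BV', 'DC', 'BV', 'BV', 'DC', 'BV', 'DC', 'BV', 'BV'],
    'Country': ['Denmark', 'Sweden', 'Norway', 'Germany', 'France',
                'Croatia', 'Italy', 'Germany', 'Austria', 'Spain'],
})

# Countries that keep their own name; everything else becomes 'Others'.
keep = ['Denmark', 'Germany', 'Norway', 'Sweden', 'France', 'Italy',
        'Netherlands', 'Croatia', 'Belgium']

is_germany = df['Country'] == 'Germany'
is_bv = df['Company'] == 'BV'

# Conditions are checked in order, so the two Germany cases come
# before the catch-all rule for countries outside the keep list.
conditions = [is_germany & is_bv, is_germany & ~is_bv, ~df['Country'].isin(keep)]
choices = ['Germany_BV', 'Germany_DC', 'Others']

# Rows matching no condition fall back to their original country name.
df['Country'] = np.select(conditions, choices, default=df['Country'])
print(df)
```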