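I'm working with some data in which customers' postcodes are invalid. As a result, I can't map a CountryISOCode to those postcodes, which leaves NaN values. However, I've noticed that for every row where CountryISOCode is NaN, the CurrencyCode gives me enough information to fix the problem.
I've read a lot of Stack Overflow posts but couldn't find a solution to my problem. I have tried...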
def func(row):
    if row['CountryISOCode'] == np.nan & row['Currency'] == 'EUR':
        return 'IRE'
    elif row['CountryISOCode'] == np.nan & row['Currency'] == 'GBP':
        return 'GBR'
    else:
        return row['CountryISOCode']
df['CountryISOCode'] = df.apply(func, axis=1)
and a few other approaches, but to no avail...
Below is a copy of the data I'm working with:
import pandas as pd
import numpy as np
data = [
    ['Steve', 'Invalid Postcode', 'GBP', np.nan],
    ['Robyn', 'Invalid Postcode', 'EUR', np.nan],
    ['James', 'Valid Postcode', 'GBP', 'GBR'],
    ['Halo', 'Invalid Postcode', 'EUR', np.nan],
    ['Jesus', 'Valid Postcode', 'GBP', 'GBR']
]
df = pd.DataFrame(columns=["Name", "PostCode", "CurrencyCode", "CountryISOCode"], data=data)
Basically, if I were using SQL, my code would look something like this:
CASE
    WHEN countryISOCode IS NULL AND currency = 'GBP' THEN 'GBR'
    WHEN countryISOCode IS NULL AND currency = 'EUR' THEN 'IRE'
    ELSE countryISOCode
END
Any ideas?
Answer 0 (score: 2)
You can use np.select for this, which lets you pick from a list of values based on the results of a list of conditions:
m1 = df.CountryISOCode.isna()
m2 = df.CurrencyCode.eq('GBP')
m3 = df.CurrencyCode.eq('EUR')
# Missing country + GBP -> 'GBR'; missing country + EUR -> 'IRE'
df.loc[:, 'CountryISOCode'] = np.select([m1 & m2, m1 & m3], ['GBR', 'IRE'],
                                        default=df.CountryISOCode)
    Name          PostCode  CurrencyCode  CountryISOCode
0  Steve  Invalid Postcode           GBP             GBR
1  Robyn  Invalid Postcode           EUR             IRE
2  James    Valid Postcode           GBP             GBR
3   Halo  Invalid Postcode           EUR             IRE
4  Jesus    Valid Postcode           GBP             GBR
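As a side note on how np.select resolves its conditions, here is a minimal standalone sketch. The array and labels are made up for illustration and are not part of the question's data. Conditions are checked in order, the first match wins, and anything that matches nothing gets the default:

```python
import numpy as np

# Illustrative values only (not from the question)
values = np.array([1, 5, 10, 15])

# Conditions are evaluated in list order; the first True one decides the result
conditions = [values < 3, values < 12]
choices = ["small", "medium"]

# Elements that satisfy no condition receive the default
labels = np.select(conditions, choices, default="large")
print(labels.tolist())  # ['small', 'medium', 'medium', 'large']
```

The value 1 satisfies both conditions but is labelled "small" because that condition comes first, so the order of the condition list matters whenever conditions can overlap.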
Answer 1 (score: 2)
Use np.select() for multiple conditions and multiple choices:
df['CountryISOCode'] = np.select(
    [(df.CurrencyCode == 'GBP') & df.CountryISOCode.isna(),
     (df.CurrencyCode == 'EUR') & df.CountryISOCode.isna()],
    ['GBR', 'IRE'],
    default=df.CountryISOCode)
    Name          PostCode  CurrencyCode  CountryISOCode
0  Steve  Invalid Postcode           GBP             GBR
1  Robyn  Invalid Postcode           EUR             IRE
2  James    Valid Postcode           GBP             GBR
3   Halo  Invalid Postcode           EUR             IRE
4  Jesus    Valid Postcode           GBP             GBR
Answer 2 (score: 2)
You can use fillna with a dictionary that specifies the mappings for the cases where the currency code is helpful:
cmap = {'GBP': 'GBR', 'EUR': 'IRE'}
df['CountryISOCode'] = df['CountryISOCode'].fillna(df['CurrencyCode'].map(cmap))
print(df)
    Name          PostCode  CurrencyCode  CountryISOCode
0  Steve  Invalid Postcode           GBP             GBR
1  Robyn  Invalid Postcode           EUR             IRE
2  James    Valid Postcode           GBP             GBR
3   Halo  Invalid Postcode           EUR             IRE
4  Jesus    Valid Postcode           GBP             GBR
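One thing worth knowing about this approach is what happens to a currency that is not in the dictionary. The sketch below uses a small made-up frame (the USD row is an assumption added for illustration, it is not in the question's data) to show that such rows simply stay NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the USD row is added only to show the unmapped case
demo = pd.DataFrame({
    "CurrencyCode": ["GBP", "EUR", "USD", "GBP"],
    "CountryISOCode": [np.nan, np.nan, np.nan, "GBR"],
})
cmap = {"GBP": "GBR", "EUR": "IRE"}

# map() yields NaN for currencies missing from cmap, so fillna()
# has nothing to put into the USD row and it remains NaN
demo["CountryISOCode"] = demo["CountryISOCode"].fillna(demo["CurrencyCode"].map(cmap))
print(demo)
```

Existing country codes are never overwritten, because fillna only touches cells that are already missing.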
Answer 3 (score: 1)
While the other answers using np.select work, I personally prefer mask:
# The currency column is named 'CurrencyCode' in the question's sample data
df['CountryISOCode'] = (
    df['CountryISOCode']
    .mask(df['CountryISOCode'].isna() & df['CurrencyCode'].eq('GBP'), 'GBR')
    .mask(df['CountryISOCode'].isna() & df['CurrencyCode'].eq('EUR'), 'IRE')
)
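For anyone who wants to try the mask approach in isolation, here is a self-contained sketch. It assumes the currency column is called CurrencyCode, as in the question's sample data, and uses only the first three rows of that data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [
        ["Steve", "Invalid Postcode", "GBP", np.nan],
        ["Robyn", "Invalid Postcode", "EUR", np.nan],
        ["James", "Valid Postcode", "GBP", "GBR"],
    ],
    columns=["Name", "PostCode", "CurrencyCode", "CountryISOCode"],
)

# Compute the "country is missing" mask once, before any replacement
missing = df["CountryISOCode"].isna()

# mask() replaces values where the condition is True and keeps the rest
df["CountryISOCode"] = (
    df["CountryISOCode"]
    .mask(missing & df["CurrencyCode"].eq("GBP"), "GBR")
    .mask(missing & df["CurrencyCode"].eq("EUR"), "IRE")
)
print(df["CountryISOCode"].tolist())  # ['GBR', 'IRE', 'GBR']
```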
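Answer 4 (score: 1)
I'm adding this answer because it adds value to the original question. The comparison in your code doesn't work because np.nan == np.nan evaluates to False. You can test NaN values for identity, but not for equality. See "in operator, float("NaN") and np.nan" for more details. That said, here is how you can change your original code so that it works as intended. Note that it also replaces & with and, because & binds more tightly than ==.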
import pandas as pd
import numpy as np
raw_data = [
    ['Steve', 'Invalid Postcode', 'GBP', np.nan],
    ['Robyn', 'Invalid Postcode', 'EUR', np.nan],
    ['James', 'Valid Postcode', 'GBP', 'GBR'],
    ['Halo', 'Invalid Postcode', 'EUR', np.nan],
    ['Jesus', 'Valid Postcode', 'GBP', 'GBR']
]
df = pd.DataFrame(columns=["Name", "PostCode", "Currency", "CountryISOCode"], data=raw_data)
def func(row):
    if row['CountryISOCode'] is np.nan and row['Currency'] == 'EUR':
        return 'IRE'
    elif row['CountryISOCode'] is np.nan and row['Currency'] == 'GBP':
        return 'GBR'
    else:
        return row['CountryISOCode']
df['CountryISOCode'] = df.apply(func, axis=1)
print(df)
That said, the other answers are great too.
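To make the NaN behaviour described in this answer concrete, here is a short sketch. It also shows a variant of the row function that uses pd.isna instead of an identity check. That variant is a suggestion rather than part of the original answer, and fill_country and sample are names made up for this example. The identity check only succeeds when the cell holds the very same np.nan object, whereas pd.isna works for any missing value:

```python
import numpy as np
import pandas as pd

print(np.nan == np.nan)        # False: NaN never compares equal, even to itself
print(np.nan is np.nan)        # True: the same Python object
print(float("nan") is np.nan)  # False: a different NaN object
print(pd.isna(float("nan")))   # True: detects any missing value


def fill_country(row):
    # pd.isna() does not depend on which NaN object is stored in the cell
    if pd.isna(row["CountryISOCode"]):
        if row["Currency"] == "EUR":
            return "IRE"
        if row["Currency"] == "GBP":
            return "GBR"
    return row["CountryISOCode"]


# Small illustrative frame using the column names from this answer
sample = pd.DataFrame(
    {"Currency": ["GBP", "EUR", "GBP"], "CountryISOCode": [np.nan, np.nan, "GBR"]}
)
sample["CountryISOCode"] = sample.apply(fill_country, axis=1)
print(sample["CountryISOCode"].tolist())  # ['GBR', 'IRE', 'GBR']
```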