How do I select rows containing float64 NaN?

Time: 2018-11-06 08:06:22

Tags: python python-3.x pandas dataframe

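I have a DataFrame loaded from Excel that contains several rows with NaN values. I want to replace every row whose values are all NaN with a corresponding baseline row.

The original DataFrame looks like this: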

                    Country Name  Years  tariff1_1  tariff1_2  tariff1_3  
830                 Hungary       2004   9.540313   6.287314  13.098201   
831                 Hungary       2005   9.540789   6.281724  13.124401 
832                 Hungary       2006   NaN        NaN       NaN 
833                 Hungary       2007   NaN        NaN       NaN 
834                 eu            2005   8.55       5.7       11.4
835                 eu            2006   8.46       5.9       11.6
836                 eu            2007   8.56       5.3       11.9

So if all the tariffs for Hungary in a given year are NaN, that row should be replaced with the eu data for the same year.

The desired result is:

                    Country Name  Years  tariff1_1  tariff1_2  tariff1_3  
830                 Hungary       2004   9.540313   6.287314  13.098201   
831                 Hungary       2005   9.540789   6.281724  13.124401 
832                 Hungary       2006   8.46       5.9       11.6 
833                 Hungary       2007   8.56       5.3       11.9
834                 eu            2005   8.55       5.7       11.4
835                 eu            2006   8.46       5.9       11.6
836                 eu            2007   8.56       5.3       11.9

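I checked the type of the NaN in one specific row ('Hungary', 2006), and it turned out to be 'float64'. When I used np.isnan, I got an error saying that ufunc 'isnan' is not supported for the input types and that the inputs could not be safely coerced to any supported types according to the casting rule 'safe'.

So I switched to math.isnan, but it does not seem to detect the NaN in my test row: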

test=df.loc[(df['Country Name'] == 'Hungary') & (df['Years']== 2006)]

test.iloc[:,4]
Out[293]: 
832   NaN
Name: tariff1_3, dtype: float64

math.isnan(any(test))
Out[294]:False

np.isnan(any(test))
Out[295]:ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''

Here is my original code:

Eu=['Austria','Belgium','Curacao','Denmark','Finland','France','Germany']

for country in Eu:
    for year in range(2001,2012):
        if math.isnan(all(df.loc[(df['Country Name'] == country) & (df['Years'] == year)])):
            df.loc[(df['Country Name'] == country) & (df['Years'] == year)]=df.loc[(df['Country Name'] == 'eu') & (df['Years'] == year)]

Thanks!

2 answers:

Answer 0 (score: 3)

If you only need to replace the rows that consist entirely of NaNs:

print (df)
    Country Name  Years  tariff1_1  tariff1_2  tariff1_3
830      Hungary   2004   9.540313   6.287314  13.098201
831      Hungary   2005        NaN   6.281724  13.124401
832      Hungary   2006        NaN        NaN        NaN
833      Hungary   2007        NaN        NaN        NaN
834           eu   2005   8.550000   5.700000  11.400000
835           eu   2006   8.460000   5.900000  11.600000
836           eu   2007   8.560000   5.300000  11.900000

Eu=['Austria','Belgium','Curacao','Denmark','Finland','France','Germany','Hungary']

# all columns except the ones listed
cols = df.columns.difference(['Country Name','Years'])
# eu DataFrame used to replace the missing rows
eu = df[df['Country Name'] == 'eu'].drop('Country Name', axis=1).set_index('Years')
print (eu)
       tariff1_1  tariff1_2  tariff1_3
Years                                 
2005        8.55        5.7       11.4
2006        8.46        5.9       11.6
2007        8.56        5.3       11.9

# select only the Eu countries whose values in cols are all missing
mask = df['Country Name'].isin(Eu) & df[cols].isnull().all(axis=1)

# for the selected rows, fill the missing values from eu, aligned on Years
df.loc[mask, cols] = pd.DataFrame(df[mask].set_index('Years')
                                          .drop('Country Name', axis=1).fillna(eu).values,
                                  index=df.index[mask], columns=cols)
print (df)
    Country Name  Years  tariff1_1  tariff1_2  tariff1_3
830      Hungary   2004   9.540313   6.287314  13.098201
831      Hungary   2005        NaN   6.281724  13.124401
832      Hungary   2006   8.460000   5.900000  11.600000
833      Hungary   2007   8.560000   5.300000  11.900000
834           eu   2005   8.550000   5.700000  11.400000
835           eu   2006   8.460000   5.900000  11.600000
836           eu   2007   8.560000   5.300000  11.900000
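
For comparison, here is a minimal self-contained sketch of the same idea written with `reindex` instead of `fillna`. It is not the answerer's code: the sample frame is rebuilt by hand from the printed output above, and the names `eu_members` and `value_cols` are hypothetical. It assumes the eu baseline has at most one row per year.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt by hand from the printed frame above (illustrative only).
df = pd.DataFrame(
    {
        "Country Name": ["Hungary"] * 4 + ["eu"] * 3,
        "Years": [2004, 2005, 2006, 2007, 2005, 2006, 2007],
        "tariff1_1": [9.540313, np.nan, np.nan, np.nan, 8.55, 8.46, 8.56],
        "tariff1_2": [6.287314, 6.281724, np.nan, np.nan, 5.7, 5.9, 5.3],
        "tariff1_3": [13.098201, 13.124401, np.nan, np.nan, 11.4, 11.6, 11.9],
    },
    index=range(830, 837),
)

eu_members = ["Hungary"]  # hypothetical subset of the Eu list
value_cols = [c for c in df.columns if c not in ("Country Name", "Years")]

# Baseline rows indexed by year (assumes one eu row per year).
eu = df.loc[df["Country Name"] == "eu"].set_index("Years")[value_cols]

# Member-country rows in which every tariff column is NaN.
mask = df["Country Name"].isin(eu_members) & df[value_cols].isna().all(axis=1)

# Look up the baseline row for each masked year; years missing from eu
# come back as NaN rows, so those rows simply stay NaN.
replacement = eu.reindex(df.loc[mask, "Years"].to_numpy())
df.loc[mask, value_cols] = replacement.to_numpy()

print(df)
```

Row 831 is left alone here because only some of its tariffs are missing, which matches the behaviour of the answer above.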

Answer 1 (score: 1)

You can try:

df.isnull().values.any()

For your case:

test.isnull().values.any()

test.isnull().values.any()
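
As a rough illustration of why the built-in `any` plus `math.isnan` did not work in the question, the sketch below rebuilds the one-row `test` frame by hand (an assumption based on the output shown in the question) and compares the different checks. The `np.isnan` error is most likely caused by applying it to object-dtype data, which is what a row mixing strings and floats produces.

```python
import math

import numpy as np
import pandas as pd

# One-row frame rebuilt by hand from the question (illustrative only).
test = pd.DataFrame(
    {
        "Country Name": ["Hungary"],
        "Years": [2006],
        "tariff1_1": [np.nan],
        "tariff1_2": [np.nan],
        "tariff1_3": [np.nan],
    },
    index=[832],
)

# Iterating over a DataFrame yields its column labels, and non-empty
# strings are truthy, so the built-in any() returns True here.
builtin_any = any(test)

# math.isnan(True) treats True as 1.0, so the result is False.
isnan_of_bool = math.isnan(builtin_any)

# np.isnan cannot handle an object-dtype array (strings mixed with floats).
try:
    np.isnan(test.values)
    numpy_raised = False
except TypeError:
    numpy_raised = True

# pandas' own checks work on any dtype.
tariff_cols = ["tariff1_1", "tariff1_2", "tariff1_3"]
has_any_nan = test.isnull().values.any()              # at least one NaN anywhere
row_all_nan = test[tariff_cols].isnull().all(axis=1)  # every tariff is NaN, per row

print(builtin_any, isnan_of_bool, numpy_raised, has_any_nan)
print(row_all_nan)
```

Note that `isnull().values.any()` only says whether at least one NaN exists. To test whether all the tariffs in a row are missing, as the question requires, `isnull().all(axis=1)` on the tariff columns is the closer match.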