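I have a DataFrame read from Excel that contains several rows of NaN. I want to replace every row whose values are all NaN with the corresponding baseline row.
The original DataFrame looks like this: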
Country Name Years tariff1_1 tariff1_2 tariff1_3
830 Hungary 2004 9.540313 6.287314 13.098201
831 Hungary 2005 9.540789 6.281724 13.124401
832 Hungary 2006 NaN NaN NaN
833 Hungary 2007 NaN NaN NaN
834 eu 2005 8.55 5.7 11.4
835 eu 2006 8.46 5.9 11.6
836 eu 2007 8.56 5.3 11.9
So if all of Hungary's tariffs for a given year are NaN, that row should be replaced with the EU data for the same year.
The desired result is:
Country Name Years tariff1_1 tariff1_2 tariff1_3
830 Hungary 2004 9.540313 6.287314 13.098201
831 Hungary 2005 9.540789 6.281724 13.124401
832 Hungary 2006 8.46 5.9 11.6
833 Hungary 2007 8.56 5.3 11.9
834 eu 2005 8.55 5.7 11.4
835 eu 2006 8.46 5.9 11.6
836 eu 2007 8.56 5.3 11.9
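I checked the type of the NaN in the row ('Hungary', 2006) and it is float64. But when I use np.isnan, I get: ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''.
So I switched to math.isnan, but it does not seem to detect the NaN in my test row: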
test=df.loc[(df['Country Name'] == 'Hungary') & (df['Years']== 2006)]
test.iloc[:,4]
Out[293]:
832 NaN
Name: tariff1_3, dtype: float64
math.isnan(any(test))
Out[294]:False
np.isnan(any(test))
Out[295]:ufunc 'isnan' not supported for the input types, and the inputs could not be safely coerced to any supported types according to the casting rule ''safe''
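Why the check above fails: iterating over a DataFrame yields its column labels, so any(test) only sees the non-empty column names and returns True, and math.isnan(True) is False. np.isnan, for its part, does not accept object-dtype data such as the 'Country Name' column, which is a likely source of the TypeError. A minimal sketch of applying math.isnan to the values instead, on a small hand-built frame that mirrors the question (tariff_cols and row_values are illustrative names):

```python
import math

import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Country Name": ["Hungary", "Hungary"],
        "Years": [2005, 2006],
        "tariff1_1": [9.540789, np.nan],
        "tariff1_2": [6.281724, np.nan],
        "tariff1_3": [13.124401, np.nan],
    },
    index=[831, 832],
)
test = df.loc[(df["Country Name"] == "Hungary") & (df["Years"] == 2006)]

# Iterating over a DataFrame yields its column labels, not its values.
print(list(test))  # ['Country Name', 'Years', 'tariff1_1', 'tariff1_2', 'tariff1_3']
# any() therefore sees non-empty strings and returns True; True (== 1.0) is not NaN.
print(math.isnan(any(test)))  # False

# Apply math.isnan to each tariff value of the single matching row instead.
tariff_cols = ["tariff1_1", "tariff1_2", "tariff1_3"]
row_values = test[tariff_cols].iloc[0]
print(all(math.isnan(v) for v in row_values))  # True
```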
Here is my original code:
import math
Eu=['Austria','Belgium','Curacao','Denmark','Finland','France','Germany']
for country in Eu:
    for year in range(2001,2012):
        if math.isnan(all(df.loc[(df['Country Name'] == country) & (df['Years'] == year)])):
            df.loc[(df['Country Name'] == country) & (df['Years'] == year)]=df.loc[(df['Country Name'] == 'eu') & (df['Years'] == year)]
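A minimal corrected version of this loop, offered as a sketch under a few assumptions: there is at most one row per country and year, the tariff columns are all columns other than 'Country Name' and 'Years', and 'Hungary' is added to the country list (the list in the question omits it). Besides testing the values rather than the column labels, the assignment uses .to_numpy(): assigning a DataFrame through df.loc aligns on index labels, and since the 'eu' row has a different label than the country row, the original assignment would write NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Country Name": ["Hungary"] * 4 + ["eu"] * 3,
        "Years": [2004, 2005, 2006, 2007, 2005, 2006, 2007],
        "tariff1_1": [9.540313, 9.540789, np.nan, np.nan, 8.55, 8.46, 8.56],
        "tariff1_2": [6.287314, 6.281724, np.nan, np.nan, 5.7, 5.9, 5.3],
        "tariff1_3": [13.098201, 13.124401, np.nan, np.nan, 11.4, 11.6, 11.9],
    },
    index=range(830, 837),
)

Eu = ["Austria", "Belgium", "Curacao", "Denmark", "Finland", "France", "Germany", "Hungary"]
cols = df.columns.difference(["Country Name", "Years"])  # the tariff columns

for country in Eu:
    for year in range(2001, 2012):
        row = (df["Country Name"] == country) & (df["Years"] == year)
        base = (df["Country Name"] == "eu") & (df["Years"] == year)
        # Nothing to do if the country row or the EU baseline row does not exist.
        if not row.any() or not base.any():
            continue
        # Replace only when every tariff value in the row is NaN.
        if df.loc[row, cols].isna().to_numpy().all():
            # .to_numpy() drops the index so no label alignment happens.
            df.loc[row, cols] = df.loc[base, cols].to_numpy()

print(df)
```

Looping over every country and year is fine for a sheet of this size; the vectorized approach in the answers avoids the repeated boolean filtering.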
Thanks!
Answer 0 (score: 3)
If you only need to replace the rows that are all NaN:
print (df)
Country Name Years tariff1_1 tariff1_2 tariff1_3
830 Hungary 2004 9.540313 6.287314 13.098201
831 Hungary 2005 NaN 6.281724 13.124401
832 Hungary 2006 NaN NaN NaN
833 Hungary 2007 NaN NaN NaN
834 eu 2005 8.550000 5.700000 11.400000
835 eu 2006 8.460000 5.900000 11.600000
836 eu 2007 8.560000 5.300000 11.900000
Eu=['Austria','Belgium','Curacao','Denmark','Finland','France','Germany','Hungary']
# all columns except the ones listed
cols = df.columns.difference(['Country Name','Years'])
# EU DataFrame (indexed by year) used to replace the missing rows
eu = df[df['Country Name'] == 'eu'].drop(columns='Country Name').set_index('Years')
print (eu)
tariff1_1 tariff1_2 tariff1_3
Years
2005 8.55 5.7 11.4
2006 8.46 5.9 11.6
2007 8.56 5.3 11.9
# select only EU countries whose values in cols are all missing
mask = df['Country Name'].isin(Eu) & df[cols].isnull().all(axis=1)
# for the selected rows, fill the missing values from eu (aligned on Years) with fillna
df.loc[mask, cols] = pd.DataFrame(df[mask].set_index('Years')
                                  .drop(columns='Country Name').fillna(eu).values,
                                  index=df.index[mask], columns=cols)
print (df)
Country Name Years tariff1_1 tariff1_2 tariff1_3
830 Hungary 2004 9.540313 6.287314 13.098201
831 Hungary 2005 NaN 6.281724 13.124401
832 Hungary 2006 8.460000 5.900000 11.600000
833 Hungary 2007 8.560000 5.300000 11.900000
834 eu 2005 8.550000 5.700000 11.400000
835 eu 2006 8.460000 5.900000 11.600000
836 eu 2007 8.560000 5.300000 11.900000
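A variant of the same idea, sketched here as an alternative rather than as the answer's method: because the masked rows are entirely NaN, fillna is equivalent to looking up the EU row for each masked year, which DataFrame.reindex can do directly. This assumes exactly one 'eu' row per year (reindex raises on duplicate index labels); a year with no EU baseline yields a row of NaN, so such rows simply stay NaN.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Country Name": ["Hungary"] * 4 + ["eu"] * 3,
        "Years": [2004, 2005, 2006, 2007, 2005, 2006, 2007],
        "tariff1_1": [9.540313, np.nan, np.nan, np.nan, 8.55, 8.46, 8.56],
        "tariff1_2": [6.287314, 6.281724, np.nan, np.nan, 5.7, 5.9, 5.3],
        "tariff1_3": [13.098201, 13.124401, np.nan, np.nan, 11.4, 11.6, 11.9],
    },
    index=range(830, 837),
)

Eu = ["Austria", "Belgium", "Curacao", "Denmark", "Finland", "France", "Germany", "Hungary"]
cols = df.columns.difference(["Country Name", "Years"])

# EU baseline indexed by year, restricted to the tariff columns.
eu = df.loc[df["Country Name"] == "eu"].set_index("Years")[cols]

# Rows of EU member countries whose tariff columns are all NaN.
mask = df["Country Name"].isin(Eu) & df[cols].isna().all(axis=1)

# Fetch the baseline row for each masked row's year, then assign positionally.
df.loc[mask, cols] = eu.reindex(df.loc[mask, "Years"].to_numpy()).to_numpy()
print(df)
```

As in the answer, row 831 keeps its single NaN because it is not entirely missing.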
Answer 1 (score: 1)
You can try:
df.isnull().values.any()
For your case:
test.isnull().values.any()
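Note that .values.any() collapses the whole selection into one boolean: it is True if any cell is NaN, even when only one tariff is missing. Since the question asks whether all tariffs in a row are NaN, a per-row .all(axis=1) over the tariff columns is closer to what is needed. A small sketch contrasting the checks, on illustrative data mirroring the question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {
        "Country Name": ["Hungary", "Hungary", "Hungary"],
        "Years": [2004, 2005, 2006],
        "tariff1_1": [9.540313, np.nan, np.nan],
        "tariff1_2": [6.287314, 6.281724, np.nan],
        "tariff1_3": [13.098201, 13.124401, np.nan],
    },
    index=[830, 831, 832],
)
tariff_cols = ["tariff1_1", "tariff1_2", "tariff1_3"]
missing = df[tariff_cols].isnull()

# One boolean for the whole selection: is there a NaN anywhere?
has_any_nan = missing.values.any()  # True (rows 831 and 832)
# One boolean per row: does the row contain at least one NaN?
row_has_nan = missing.any(axis=1)  # 830 False, 831 True, 832 True
# One boolean per row: is every tariff in the row NaN?
row_all_nan = missing.all(axis=1)  # 830 False, 831 False, 832 True
print(has_any_nan, row_has_nan.tolist(), row_all_nan.tolist())
```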