Trying to ignore NaN in a CSV file raises a TypeError

Time: 2017-03-21 02:24:44

Tags: python csv pandas numpy typeerror

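I am loading a local CSV file and trying to find the smallest float in a column that mixes NaN and numbers. I tried the numpy function np.nanmin, but it throws:

TypeError: '<=' not supported between instances of 'str' and 'float'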

import numpy as np
import pandas as pd

database = pd.read_csv('database.csv', quotechar='"', skipinitialspace=True, delimiter=',')

coun_weight = database[['Country of Operator/Owner', 'Launch Mass (Kilograms)']]
print(coun_weight)

# This is the call that raises the TypeError
lightest = np.nanmin(coun_weight['Launch Mass (Kilograms)'])

Any suggestions as to why nanmin might not work? Thanks in advance!

Link to the full CSV file: http://www.sharecsv.com/s/5aea6381d1debf75723a45aacd40abf8/database.csv
Here is a sample of my coun_weight:

                 Country of Operator/Owner Launch Mass (Kilograms)
1390                     China                     NaN
1391                     China                    1040
1392                     China                    1040
1393                     China                    2700
1394                     China                    2700
1395                     China                    1800
1396                     China                    2700
1397                     China                     NaN
1398                     China                     NaN
1399                     China                     NaN
1400                     China                     NaN
1401                     India                      92
1402                    Russia                      45
1403              South Africa                       1
1404                     China                     NaN
1405                     China                       4
1406                     China                       4
1407                     China                      12

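2 answers:

Answer 0: (score: 1)

Explicitly converting the column to float exposes the problem: the column contains the string "5,000+", which cannot be converted to float64.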

coun_weight['Launch Mass (Kilograms)'].astype('float64')

Result:

    ValueError: invalid literal for float(): 5,000+
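
To see the failure without the full file, here is a minimal sketch on a small hand-made Series. The sample values and the variable names (mass, compare_error, cast_error) are assumptions for illustration only; the dtype is forced to object to mimic a column that pandas could not parse as numeric.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'Launch Mass (Kilograms)' column:
# numeric strings, a missing value, and one malformed entry.
mass = pd.Series(['1040', np.nan, '2700', '5,000+'], dtype=object)

# Ordering a str against a float is what the original TypeError is about.
try:
    _ = '5,000+' <= 1040.0
    compare_error = None
except TypeError as exc:
    compare_error = exc

# The explicit cast fails too, and its message names the offending value.
try:
    mass.astype('float64')
    cast_error = None
except ValueError as exc:
    cast_error = exc

print(type(compare_error).__name__, compare_error)
print(type(cast_error).__name__, cast_error)
```

The exact wording of the ValueError differs between Python and pandas versions, but it should include the string that could not be parsed.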

Answer 1: (score: 1)

I tested it; these are all the problematic values:

coun_weight = pd.read_csv('database.csv')

print (coun_weight.loc[pd.to_numeric(coun_weight['Launch Mass (Kilograms)'], errors='coerce').isnull(), 'Launch Mass (Kilograms)'].dropna())
1091    5,000+
1092    5,000+
1093    5,000+
1094    5,000+
1096    5,000+
Name: Launch Mass (Kilograms), dtype: object
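
The same check can be tried on a few made-up rows. This is only a sketch: the sample data and the names mass, as_number and bad are assumptions, not taken from the real file.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: numeric strings, a true missing value, one bad entry.
mass = pd.Series(['1040', np.nan, '5,000+', '12'], dtype=object)

# Unparseable strings become NaN instead of raising.
as_number = pd.to_numeric(mass, errors='coerce')

# Keep only values that were present in the raw data but failed to parse,
# so genuine missing values are not reported as problems.
bad = mass[as_number.isna() & mass.notna()]
print(bad)
```

Combining isna() on the coerced column with notna() on the raw column does the same job as the .dropna() call in the one-liner above.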

The solution is:

coun_weight['Launch Mass (Kilograms)'] = (
    coun_weight['Launch Mass (Kilograms)'].replace('5,000+', 5000).astype(float)
)

print (coun_weight['Launch Mass (Kilograms)'].iloc[1091:1098])
1091    5000.0
1092    5000.0
1093    5000.0
1094    5000.0
1095       NaN
1096    5000.0
1097    6500.0
Name: Launch Mass (Kilograms), dtype: float64
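
If other malformed entries of the same shape might exist (for example "1,200" or "3,000+"), a slightly more general variant is to strip commas and plus signs before converting. This is a sketch under the assumption that such values should be read as their plain number, and that the non-missing entries in the column are strings; the sample data is made up.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values; only '5,000+' is known to occur in the real file.
mass = pd.Series(['1040', np.nan, '5,000+', '6500'], dtype=object)

# Remove thousands separators and trailing '+' signs, then convert.
# Missing values pass through the .str accessor unchanged.
stripped = mass.str.replace(r'[,+]', '', regex=True)
cleaned = pd.to_numeric(stripped, errors='coerce')

print(cleaned)
```

Whether "5,000+" should count as exactly 5000 is a judgment call about the data, not something pandas can decide.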

Then, if you need the minimum while ignoring NaNs, Series.min skips NaN by default:

print (coun_weight['Launch Mass (Kilograms)'].min())
0.0

Check whether the column contains any 0 values:

a = coun_weight['Launch Mass (Kilograms)']
print (a[a == 0])
912    0.0
Name: Launch Mass (Kilograms), dtype: float64
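
Once the column is float64, np.nanmin from the question works as well and agrees with Series.min. If a launch mass of 0 is a placeholder rather than a real measurement (an assumption about the data, not something the file states), the smallest positive value can be taken instead. The sample values below are made up.

```python
import numpy as np
import pandas as pd

# Hypothetical float column after cleaning.
mass = pd.Series([1040.0, np.nan, 0.0, 4.0, 12.0])

# Both calls ignore NaN on a float column.
lightest = np.nanmin(mass.to_numpy())
assert lightest == mass.min()

# Smallest strictly positive mass, treating 0 as "unknown".
lightest_positive = mass[mass > 0].min()

print(lightest, lightest_positive)
```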

Another possible solution is to replace these values with NaN:

coun_weight['Launch Mass (Kilograms)'] = pd.to_numeric(
    coun_weight['Launch Mass (Kilograms)'], errors='coerce'
)

print (coun_weight['Launch Mass (Kilograms)'].iloc[1091:1098])
1091       NaN
1092       NaN
1093       NaN
1094       NaN
1095       NaN
1096       NaN
1097    6500.0
Name: Launch Mass (Kilograms), dtype: float64