I have some data, and all of it is non-negative. Numpy says its sum is nan, but I don't believe it. Here is what I did:
First, I read the training data:
dataframe = pandas.read_csv( "buggy.csv" )
training = dataframe.ix[:,dataframe.columns != "Survived"].values.astype( np.float32 )
The training features are stored in a numpy array. I sum the first 61 rows and add the result to the sum of row 62:
sum1 = training[0:61][:].sum()
sum2 = training[62][:].sum()
print sum1 + sum2
I get the following output: 5788.54
Then I sum the first 62 rows:
print training[0:62][:].sum()
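I get the following output: nan
Why do I get nan for the second sum? All my data is non-negative, so I don't think the order of the numbers should matter. Thanks in advance for your help.
(Also, this is Python 2.7 from Anaconda 4.0.4.)
Here is the full code: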
import numpy as np
import pandas
dataframe = pandas.read_csv( "buggy.csv" )
training = dataframe.ix[:,dataframe.columns != "Survived"].values.astype( np.float32 )
labels = dataframe[ "Survived" ].values.astype( np.float32 )
sum1 = training[0:61][:].sum()
sum2 = training[62][:].sum()
print sum1 + sum2
print training[0:62][:].sum()
Here is the minimal data needed to reproduce the problem (just copy and paste it into a file named "buggy.csv"):
,Survived,Pclass,Sex,Age,SibSp,Parch,Fare,Embarked
0,0,3,0,22.0,1,0,7.25,2.0
1,1,1,1,38.0,1,0,71.2833,0.0
2,1,3,1,26.0,0,0,7.925,2.0
3,1,1,1,35.0,1,0,53.1,2.0
4,0,3,0,35.0,0,0,8.05,2.0
5,0,3,0,29.6991176471,0,0,8.4583,1.0
6,0,1,0,54.0,0,0,51.8625,2.0
7,0,3,0,2.0,3,1,21.075,2.0
8,1,3,1,27.0,0,2,11.1333,2.0
9,1,2,1,14.0,1,0,30.0708,0.0
10,1,3,1,4.0,1,1,16.7,2.0
11,1,1,1,58.0,0,0,26.55,2.0
12,0,3,0,20.0,0,0,8.05,2.0
13,0,3,0,39.0,1,5,31.275,2.0
14,0,3,1,14.0,0,0,7.8542,2.0
15,1,2,1,55.0,0,0,16.0,2.0
16,0,3,0,2.0,4,1,29.125,1.0
17,1,2,0,29.6991176471,0,0,13.0,2.0
18,0,3,1,31.0,1,0,18.0,2.0
19,1,3,1,29.6991176471,0,0,7.225,0.0
20,0,2,0,35.0,0,0,26.0,2.0
21,1,2,0,34.0,0,0,13.0,2.0
22,1,3,1,15.0,0,0,8.0292,1.0
23,1,1,0,28.0,0,0,35.5,2.0
24,0,3,1,8.0,3,1,21.075,2.0
25,1,3,1,38.0,1,5,31.3875,2.0
26,0,3,0,29.6991176471,0,0,7.225,0.0
27,0,1,0,19.0,3,2,263.0,2.0
28,1,3,1,29.6991176471,0,0,7.8792,1.0
29,0,3,0,29.6991176471,0,0,7.8958,2.0
30,0,1,0,40.0,0,0,27.7208,0.0
31,1,1,1,29.6991176471,1,0,146.5208,0.0
32,1,3,1,29.6991176471,0,0,7.75,1.0
33,0,2,0,66.0,0,0,10.5,2.0
34,0,1,0,28.0,1,0,82.1708,0.0
35,0,1,0,42.0,1,0,52.0,2.0
36,1,3,0,29.6991176471,0,0,7.2292,0.0
37,0,3,0,21.0,0,0,8.05,2.0
38,0,3,1,18.0,2,0,18.0,2.0
39,1,3,1,14.0,1,0,11.2417,0.0
40,0,3,1,40.0,1,0,9.475,2.0
41,0,2,1,27.0,1,0,21.0,2.0
42,0,3,0,29.6991176471,0,0,7.8958,0.0
43,1,2,1,3.0,1,2,41.5792,0.0
44,1,3,1,19.0,0,0,7.8792,1.0
45,0,3,0,29.6991176471,0,0,8.05,2.0
46,0,3,0,29.6991176471,1,0,15.5,1.0
47,1,3,1,29.6991176471,0,0,7.75,1.0
48,0,3,0,29.6991176471,2,0,21.6792,0.0
49,0,3,1,18.0,1,0,17.8,2.0
50,0,3,0,7.0,4,1,39.6875,2.0
51,0,3,0,21.0,0,0,7.8,2.0
52,1,1,1,49.0,1,0,76.7292,0.0
53,1,2,1,29.0,1,0,26.0,2.0
54,0,1,0,65.0,0,1,61.9792,0.0
55,1,1,0,29.6991176471,0,0,35.5,2.0
56,1,2,1,21.0,0,0,10.5,2.0
57,0,3,0,28.5,0,0,7.2292,0.0
58,1,2,1,5.0,1,2,27.75,2.0
59,0,3,0,11.0,5,2,46.9,2.0
60,0,3,0,22.0,0,0,7.2292,0.0
61,1,1,1,38.0,0,0,80.0,
62,0,1,0,45.0,1,0,83.475,2.0
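Answer 0: (score: 4)
You are skipping row 61, which is the problematic one. The slice 0:61 covers indices 0 through 60, and training[62] is the row after 61, so training[0:61][:].sum()
does not include row 61.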
training[61]
Out[10]: array([ 61., 1., 1., 38., 0., 0., 80., nan], dtype=float32)
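The last column is missing for that row: it has only 7 actual values, and the empty field is read as nan. Any sum that includes a nan is itself nan, which is why training[0:62][:].sum() returns nan.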
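To illustrate the point, here is a minimal sketch on a small made-up array rather than the real file. The 4x3 `training` array below is a hypothetical stand-in for the one in the question, with a nan planted in row 2. It shows that the slice end is exclusive, how the row holding the nan could be located with np.isnan, and how np.nansum gives a total if treating the missing value as zero is acceptable for your use case.

```python
import numpy as np

# Hypothetical stand-in for the `training` array: 4 rows, 3 columns,
# with one missing value in row 2 (as an empty CSV field would produce).
training = np.array(
    [[1.0, 2.0, 3.0],
     [4.0, 5.0, 6.0],
     [7.0, 8.0, np.nan],
     [1.0, 1.0, 1.0]],
    dtype=np.float32,
)

# Slices are half-open: training[0:2] covers rows 0 and 1, not row 2.
# Adding row 3 on top therefore skips row 2 entirely.
sum_without_bad_row = training[0:2].sum() + training[3].sum()

# training[0:3] covers rows 0, 1 and 2, so the nan is included.
sum_with_bad_row = training[0:3].sum()

# Indices of the rows that contain at least one nan.
bad_rows = np.where(np.isnan(training).any(axis=1))[0]

# np.nansum treats nan as zero when summing.
total_ignoring_nan = np.nansum(training)

print(sum_without_bad_row)
print(sum_with_bad_row)
print(bad_rows)
print(total_ignoring_nan)
```

On the real data, the same np.isnan check on `training` should point at row 61. Whether to ignore, drop, or fill the missing value depends on what the column means, so np.nansum is only one option.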