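I have a standard-format CSV input file with a messy header that I'm stripping, followed by an array of 35 columns and 8760 rows. All of the data is numeric except column 6, which is text. I've tried letting genfromtxt() work this out on its own, but that column ends up as nans, I believe because its values aren't quoted.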
Currently, I'm reading the array as follows:
WeaData = np.genfromtxt(FileIn, delimiter=",", skip_header=8)
I tried specifying the column types manually with
WeaData = np.genfromtxt(FileIn, delimiter=",", skip_header=8, dtype=(float,float,float,float,float,str,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float,float))
and
WeaData = np.genfromtxt(FileIn, delimiter=",", skip_header=8, dtype=[float for n in range(5)]+['S10']+[float for n in range(29)])
but had no luck. I believe my syntax is wrong in the first option, and the second returns an array of voids. Is there a simple way to do this, preferably without specifying all 35 column types?
For reference, here are three lines of my CSV file, taken after the header I don't care about:
1966,1,1,1,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,3.9,1.7,86,102400,0,0,264,0,0,0,0,0,0,0,230,2.1,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0
1966,1,1,2,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,4.4,0.0,73,102500,0,0,265,0,0,0,0,0,0,0,270,3.6,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0
1966,1,1,3,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,2.8,-0.6,79,102500,0,0,258,0,0,0,0,0,0,0,310,2.1,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0
I'm using Python 2.7.
Answer (score: 1)
Use numpy.loadtxt with the usecols parameter to select only the columns that contain floats.
>>> import numpy as np
>>> cols = list(range(0, 5)) + list(range(6, 35))  # every column except index 5 (the text column)
>>> data = np.loadtxt("data.txt", delimiter=",", usecols=cols, dtype=float)
>>> data
[[ 1.96600000e+03 1.00000000e+00 1.00000000e+00 1.00000000e+00
6.00000000e+01 3.90000000e+00 1.70000000e+00 8.60000000e+01
1.02400000e+05 0.00000000e+00 0.00000000e+00 2.64000000e+02
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 2.30000000e+02
2.10000000e+00 0.00000000e+00 0.00000000e+00 2.41000000e+01
7.77770000e+04 0.00000000e+00 9.99999999e+08 8.00000000e+00
1.00000000e-01 0.00000000e+00 8.80000000e+01 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 1.96600000e+03 1.00000000e+00 1.00000000e+00 2.00000000e+00
6.00000000e+01 4.40000000e+00 0.00000000e+00 7.30000000e+01
1.02500000e+05 0.00000000e+00 0.00000000e+00 2.65000000e+02
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 2.70000000e+02
3.60000000e+00 0.00000000e+00 0.00000000e+00 2.41000000e+01
7.77770000e+04 0.00000000e+00 9.99999999e+08 8.00000000e+00
1.00000000e-01 0.00000000e+00 8.80000000e+01 0.00000000e+00
0.00000000e+00 0.00000000e+00]
[ 1.96600000e+03 1.00000000e+00 1.00000000e+00 3.00000000e+00
6.00000000e+01 2.80000000e+00 -6.00000000e-01 7.90000000e+01
1.02500000e+05 0.00000000e+00 0.00000000e+00 2.58000000e+02
0.00000000e+00 0.00000000e+00 0.00000000e+00 0.00000000e+00
0.00000000e+00 0.00000000e+00 0.00000000e+00 3.10000000e+02
2.10000000e+00 0.00000000e+00 0.00000000e+00 2.41000000e+01
7.77770000e+04 0.00000000e+00 9.99999999e+08 8.00000000e+00
1.00000000e-01 0.00000000e+00 8.80000000e+01 0.00000000e+00
0.00000000e+00 0.00000000e+00]]
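The column list above concatenates two hard-coded ranges. A minimal sketch of the same usecols idea, assuming the question's layout (35 columns, text in the sixth column, i.e. zero-based index 5), builds the list by skipping index 5 instead. To keep it self-contained, the three sample rows from the question are wrapped in io.StringIO; for the real file you would pass the filename and skiprows=8 to drop the header. The names SAMPLE_ROWS, N_COLS and TEXT_COL are illustrative, not part of any API.

```python
import io

import numpy as np

# The three data rows from the question (header already removed).
SAMPLE_ROWS = [
    "1966,1,1,1,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,3.9,1.7,86,102400,0,0,264,0,0,0,0,0,0,0,230,2.1,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0",
    "1966,1,1,2,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,4.4,0.0,73,102500,0,0,265,0,0,0,0,0,0,0,270,3.6,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0",
    "1966,1,1,3,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,2.8,-0.6,79,102500,0,0,258,0,0,0,0,0,0,0,310,2.1,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0",
]
SAMPLE = "\n".join(SAMPLE_ROWS)

N_COLS = 35   # total columns per row
TEXT_COL = 5  # zero-based index of the text column ("column 6")

# Every column index except the text one: 0-4 and 6-34.
numeric_cols = [i for i in range(N_COLS) if i != TEXT_COL]

# For the real file: np.loadtxt(FileIn, delimiter=",", skiprows=8, usecols=numeric_cols, dtype=float)
data = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", usecols=numeric_cols, dtype=float)

print(data.shape)  # (3, 34)
```

Deriving the list from N_COLS and TEXT_COL means only two numbers change if the file layout does.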
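If you want to include column 6, you have to load the whole matrix as objects, because a regular NumPy array cannot mix floats and strings:
>>> data = np.loadtxt("data.txt", delimiter=",", dtype=object)
So if you need this column, load it separately.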
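Loading the text column separately could look like the following sketch. It again assumes the question's three sample rows wrapped in io.StringIO (for the real file, pass the filename and skiprows=8). One call reads every column except index 5 as floats; a second call uses usecols=(5,) with dtype=str to keep the text column as strings. Both arrays share the same row order, so row i of one matches row i of the other.

```python
import io

import numpy as np

# The three data rows from the question (header already removed).
SAMPLE = "\n".join([
    "1966,1,1,1,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,3.9,1.7,86,102400,0,0,264,0,0,0,0,0,0,0,230,2.1,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0",
    "1966,1,1,2,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,4.4,0.0,73,102500,0,0,265,0,0,0,0,0,0,0,270,3.6,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0",
    "1966,1,1,3,60,A7A7A7A7*0?0?0?0?0?0?0?0A7A7A7A7A7A7F8F8A7E7,2.8,-0.6,79,102500,0,0,258,0,0,0,0,0,0,0,310,2.1,0,0,24.1,77777,0,999999999,8,0.1000,0,88,0.000,0.0,0.0",
])

# Numeric part: every column except the text column at index 5.
numeric = np.loadtxt(io.StringIO(SAMPLE), delimiter=",",
                     usecols=[i for i in range(35) if i != 5], dtype=float)

# Text part: only index 5, kept as strings (a 1-D array, one entry per row).
flags = np.loadtxt(io.StringIO(SAMPLE), delimiter=",", usecols=(5,), dtype=str)

print(numeric.shape, flags.shape)
print(flags[0])
```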