genfromtxt returns NaN rows

Asked: 2014-11-21 16:50:57

Tags: python csv numpy

I am trying to read a csv file with numpy, and I have the following code:

from numpy import genfromtxt
data = genfromtxt(open('errerr.csv', "r"), names=True, delimiter=',')

It returns the following:

  (nan, nan, nan, nan, nan, nan, nan),
       (nan, nan, nan, nan, nan, nan, nan),
       (nan, nan, nan, nan, nan, nan, nan)], 
      dtype=[('name', '<f8'), ('severity', '<f8'), ('Message', '<f8'), ('AppDomainName', '<f8'), ('ProcessName', '<f8'), ('clientid', '<f8'), ('type', '<f8')])

The dtype looks fine.

Just to prove I am not going crazy, I tried this code:

import csv
f = open('errors.csv', 'rt')
reader = csv.reader(f)
data = [] 
for r in reader: 
    data.append(r)
f.close()

That works fine, but I am trying to figure out what the deal is with genfromtxt.

Here is a sample from the csv:

name,severity,Message,AppDomainName,ProcessName,clientid,type
 Strings strings,Error,")  Thread Name:  Extended Properties:",SunDSrvc.exe,C:\Program Files\\SunDSrvc.exe,5DAA9377 ,Client
 Strings strings,Error,")  Thread Name:  Extended Properties:",SunDSrvc.exe,C:\Program Files\\SunDSrvc.exe,5DAA9377 ,Client
 Strings strings,Error,")  Thread Name:  Extended Properties:",SunDSrvc.exe,C:\Program Files\\SunDSrvc.exe,5DAA9377 ,Client

2 answers:

Answer 0 (score: 12)

Your dtype is not fine. It specifies '<f8', a float, for every field, so every non-numeric value is read as nan. You want strings. Try dtype=None:

 np.genfromtxt(txt,delimiter=',',names=True,dtype=None)

which produces:

array([ ('Strings strings', 'Error', '")  Thread Name:  Extended Properties:"', 'SunDSrvc.exe', 'C:\\Program Files\\SunDSrvc.exe', '5DAA9377 ', 'Client'),
       ('Strings strings', 'Error', '")  Thread Name:  Extended Properties:"', 'SunDSrvc.exe', 'C:\\Program Files\\SunDSrvc.exe', '5DAA9377 ', 'Client'),
       ('Strings strings', 'Error', '")  Thread Name:  Extended Properties:"', 'SunDSrvc.exe', 'C:\\Program Files\\SunDSrvc.exe', '5DAA9377 ', 'Client')], 
      dtype=[('name', 'S15'), ('severity', 'S5'), ('Message', 'S39'), ('AppDomainName', 'S12'), ('ProcessName', 'S29'), ('clientid', 'S9'), ('type', 'S6')])

(I removed the extraneous content that dealt with delimiters inside quotes.)
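To make the difference concrete, here is a minimal sketch that compares the default dtype with dtype=None. The in-memory CSV text and the variable names (csv_text, as_float, as_inferred) are assumptions made for illustration and are not the asker's actual file.

```python
import io

import numpy as np

# Hypothetical in-memory CSV standing in for the file from the question.
csv_text = "name,severity,type\nfoo,Error,Client\nbar,Warning,Server\n"

# Default dtype is float: text that cannot be parsed as a number becomes nan.
as_float = np.genfromtxt(io.StringIO(csv_text), delimiter=",", names=True)

# dtype=None lets genfromtxt infer a type per column, so text stays text.
as_inferred = np.genfromtxt(
    io.StringIO(csv_text), delimiter=",", names=True, dtype=None, encoding=None
)

print(as_float["severity"])     # float column filled with nan
print(as_inferred["severity"])  # string column with the original values
```

With names=True, both results are structured arrays whose fields are addressed by the header names; only the per-field types differ.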

Answer 1 (score: 1)

You should also add encoding=None to avoid the deprecation warning:

VisibleDeprecationWarning: Reading unicode strings without specifying the encoding argument is deprecated. Set the encoding, use None for the system default.

Your line should look like:

np.genfromtxt(txt, delimiter=',', names=True, dtype=None, encoding=None)
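As the warning text says, you can also set the encoding explicitly instead of using None. The sketch below assumes a UTF-8 file; the temporary file and its contents are hypothetical stand-ins for the real csv.

```python
import os
import tempfile

import numpy as np

# Hypothetical stand-in for the CSV file from the question, written as UTF-8.
with tempfile.NamedTemporaryFile(
    "w", suffix=".csv", delete=False, encoding="utf-8"
) as tmp:
    tmp.write("name,severity,type\nfoo,Error,Client\nbar,Warning,Server\n")
    path = tmp.name

try:
    # An explicit encoding is used here in place of encoding=None.
    data = np.genfromtxt(
        path, delimiter=",", names=True, dtype=None, encoding="utf-8"
    )
finally:
    os.remove(path)  # clean up the temporary file

print(data["severity"])
print(data.dtype)
```

Either way, the string columns come back as unicode fields rather than the byte strings ('S15', 'S5', ...) shown in the first answer's output.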