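My code produces the following error: data type "country" not understood. I'm new to Python and am mostly trying to learn how to work with .csv files. I'm using Python 3.4 with the Canopy editor. I'm trying to set the data types of the CSV columns to strings and floats, but as soon as I assign a string type to the first data column (the column headed by the word "country"), I get the error. I tried giving country the type 'a200', which I believed would be a string. What am I doing wrong here? Please be clear, as I'm a beginner.
The code looks like this: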
import csv
import numpy
def open_with_csv(filename):
    data = []
    with open(filename) as csvin:
        file_reader = csv.reader(csvin, delimiter=',')
        for line in file_reader:
            data.append(line)
    return data
data_from_csv = open_with_csv(r'C:\Users\user\Desktop\MDR-TB_burden_estimates_2015-05-07.csv')
print (data_from_csv)
FIELDNAMES = ['country', 'iso2', 'iso3', 'iso_numeric', 'year', 'source_mdr_new', 'source_drs_coverage_new', 'source_drs_year_new', 'e_new_mdr_pcnt', 'e_new_mdr_pcnt_lo', 'e_new_mdr_pcnt_hi', 'e_new_mdr_num', 'e_new_mdr_num_lo', 'e_new_mdr_num_hi', 'source_mdr_ret', 'source_drs_coverage_ret', 'source_drs_year_ret', 'e_ret_mdr_pcnt', 'e_ret_mdr_pcnt_lo', 'e_ret_mdr_pcnt_hi', 'e_ret_mdr_num', 'e_ret_mdr_num_lo', 'e_ret_mdr_num_hi', 'e_mdr_num', 'e_mdr_num_lo', 'e_mdr_num_hi']
print (FIELDNAMES)
DATATYPES = [('country','a200'), ('iso2'), ('iso3'), ('iso_numeric'), ('year'), ('source_mdr_new'), ('source_drs_coverage_new'), ('source_drs_year_new'), ('e_new_mdr_pcnt'), ('e_new_mdr_pcnt_lo'), ('e_new_mdr_pcnt_hi'), ('e_new_mdr_num'), ('e_new_mdr_num_lo'), ('e_new_mdr_num_hi'), ('source_mdr_ret'), ('source_drs_coverage_ret'), ('source_drs_year_ret'), ('e_ret_mdr_pcnt'), ('e_ret_mdr_pcnt_lo'), ('e_ret_mdr_pcnt_hi'), ('e_ret_mdr_num'), ('e_ret_mdr_num_lo'), ('e_ret_mdr_num_hi'), ('e_mdr_num'), ('e_mdr_num_lo'), ('e_mdr_num_hi')]
def load_data(filename, d=','):
    my_csv = numpy.genfromtxt(filename, delimiter=d, skip_header=1, invalid_raise=False, names=FIELDNAMES, dtype=DATATYPES)
    return my_csv
my_csv = load_data(r'C:\Users\user\Desktop\MDR-TB_burden_estimates_2015-05-07.csv')
Answer 0: (score: 2)
It looks like the arguments you are passing to numpy.genfromtxt are not in the right format.
If you want to pass values to both the names and dtype parameters, you need to specify dtype as a comma-separated string: "a200,i4, etc."
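A minimal sketch of that first option, using a small made-up sample rather than the asker's file. The column names, the field widths, and the use of 'U' (unicode string) codes together with encoding="utf-8" and autostrip=True are choices made for this illustration, not something the question specifies.

```python
from io import StringIO

import numpy as np

# Small sample in the same shape as the data discussed in this thread.
text = """USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""

# Hypothetical column names for this sample.
field_names = ["country", "id", "code", "value"]

arr = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    dtype="U20,i4,U10,f8",  # one format per column, comma-separated
    names=field_names,      # the names are supplied separately
    autostrip=True,         # drop the spaces that follow each comma
    encoding="utf-8",
)

print(arr.dtype.names)
print(arr["country"])
```

The dtype string only carries formats, so the number of formats has to match the number of names.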
Alternatively, you can pass dtype as a list of ("name", "type") tuples and leave out the names parameter.
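The second option could be sketched like this. Again the sample data, the names, and the string widths are illustrative assumptions; the point is only that every entry in the list is a complete (name, type) pair.

```python
from io import StringIO

import numpy as np

text = """USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""

# Every entry is a (name, type) pair, so no separate names= argument is used.
# Names and widths are illustrative choices for this sample.
column_types = [
    ("country", "U20"),
    ("id", int),
    ("code", "U10"),
    ("value", float),
]

arr = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    dtype=column_types,
    autostrip=True,     # strip the spaces that follow each comma
    encoding="utf-8",
)

print(arr.dtype.names)
print(arr["value"].mean())
```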
You can see examples here: http://docs.scipy.org/doc/numpy/reference/generated/numpy.genfromtxt.html
Answer 1: (score: 0)
I believe this reproduces your problem:
In [156]: txt=b"""USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""
In [157]: FIELDNAMES=['country','id','code','value']
In [158]: DATATYPES =[('country','a100'),('id'),('code'),('value')]
In [159]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=DATATYPES, names=FIELDNAMES)
...
--> 847 ndtype = np.dtype(dict(formats=ndtype, names=names))
848 else:
849 nbtypes = len(ndtype)
TypeError: data type "country" not understood
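That is where the error in your title comes from. For some reason, while parsing DATATYPES, it decides that 'country' is meant to be a dtype (like 'S100' or 'int'). But you meant it to be a field name.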
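Two plain Python and NumPy facts may help make sense of this. The sketch below does not trace genfromtxt's internals; it only shows that ('id') is not a tuple at all, and that handing NumPy a string that is not a type code produces the same kind of TypeError.

```python
import numpy as np

# Parentheses alone do not make a tuple: ("id") is just the string "id".
print(type(("id")))    # a str
print(type(("id",)))   # a tuple, because of the trailing comma

# A bare string given as a dtype is read as a type code.
# "f8" is a valid code; "country" is not, so NumPy raises TypeError.
print(np.dtype("f8"))

error_message = None
try:
    np.dtype("country")
except TypeError as exc:
    error_message = str(exc)
    print("TypeError:", error_message)
```

So a list such as [('country', 'a100'), ('id'), ('code')] is really one tuple followed by plain strings, not a list of (name, type) pairs.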
Let's correct DATATYPES and give every field a type, not just the first one:
In [165]: DATATYPES =[('country','a100'),('id',int),('code','a5'),('value',float)]
In [166]: np.genfromtxt(txt.splitlines(), delimiter=',',dtype=DATATYPES, names=FIELDNAMES)
Out[166]:
array([(b'USA', 123, b' ux34', 1.24), (b'Canada', 434, b' xz34', 3.34),
(b'France', 443, b' 2x45', 4.34)],
dtype=[('country', 'S100'), ('id', '<i4'), ('code', 'S5'), ('value', '<f8')])
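One detail in that output: the code column comes back as b' ux34', with a leading space and the final character cut off, because the field is only 5 characters wide and the space after the comma counts. A hedged sketch of one way to avoid that, assuming a wider field and autostrip=True are acceptable for your data (the helper name read_sample is made up for this example):

```python
from io import StringIO

import numpy as np

text = """USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""


def read_sample(code_format, autostrip):
    """Read the sample with a given format for the 'code' column."""
    column_types = [
        ("country", "U100"),
        ("id", int),
        ("code", code_format),
        ("value", float),
    ]
    return np.genfromtxt(
        StringIO(text),
        delimiter=",",
        dtype=column_types,
        autostrip=autostrip,
        encoding="utf-8",
    )


narrow = read_sample("U5", autostrip=False)  # keeps the space, truncates
wide = read_sample("U10", autostrip=True)    # strips the space, keeps all

print(repr(narrow["code"][0]))
print(repr(wide["code"][0]))
```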
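As Serguei wrote, there are several ways to specify names and dtypes. And yes, if anything is unclear, reread the genfromtxt documentation. There are also plenty of genfromtxt questions and examples on SO.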
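One more of those ways, sketched under the assumption that the file has a header row: with names=True the field names are read from the first line, and with dtype=None genfromtxt guesses a type for each column. The header and data below are made up for the illustration; the inferred string widths and integer size depend on the data and platform.

```python
from io import StringIO

import numpy as np

# Made-up sample with a header row.
text = """country,id,code,value
USA, 123, ux345, 1.24
Canada, 434, xz3444, 3.34
France, 443, 2x453, 4.34
"""

arr = np.genfromtxt(
    StringIO(text),
    delimiter=",",
    names=True,        # take the field names from the first line
    dtype=None,        # let genfromtxt infer a type per column
    autostrip=True,
    encoding="utf-8",
)

print(arr.dtype)
print(arr["country"])
```

This trades control for convenience: it is worth printing arr.dtype to check that each column was inferred the way you expect.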