I am trying to create an Mx2 NumPy matrix or array from the contents of the following file:
shell: head WORLD#America.csv
"2013-04-17 12","3","WORLD","#America"
"2013-04-17 13","9","WORLD","#America"
"2013-04-17 14","4","WORLD","#America"
"2013-04-17 15","3","WORLD","#America"
"2013-04-17 16","7","WORLD","#America"
"2013-04-17 17","8","WORLD","#America"
"2013-04-17 18","6","WORLD","#America"
"2013-04-17 19","6","WORLD","#America"
"2013-04-17 20","6","WORLD","#America"
"2013-04-17 21","2","WORLD","#America"
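I came across the genfromtxt() function, but I have had no success extracting the data with it. Using a file named f, I tried the following:
ts = genfromtxt(f, delimiter=",")
and got an array filled with nan. That was only a first attempt, so I read the documentation for the dtype parameter, which specifies the data types of the array. It seemed that to get an Mx2 matrix with entries of the form (datetime, int), I should pass the following:
dtype=[('f1', datetime64), ('f2', uint)]
When I do that, the following is assigned to the variable ts: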
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L),
(datetime.datetime(1969, 12, 31, 23, 59, 59, 999999), 18446744073709551615L)],
dtype=[('f1', ('<M8[us]', {})), ('f2', '<u8')])
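Every value I get in the matrix is the same constant... Why is the data not being read from my file? This is clearly not the output I should be getting.
How can I get the desired Mx2 matrix or array, with a datetime in the first column and an integer in the second, as shown by the head command?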
Answer 0 (score: 0)
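As pointed out in the comments, one difficulty in reading this file with genfromtxt is the presence of the quote characters. It is probably best to just strip the quotes (programmatically), but you can also cheat your way around the problem by specifying the quote character as the delimiter: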
np.genfromtxt(filename, delimiter='"', dtype=str, comments=None)[0]
# array(['', '2013-04-17 12', ',', '3', ',', 'WORLD', ',', '#America', ''],
# dtype='|S13')
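The file is now interpreted as having 9 columns, of which the second and fourth contain the data of interest. (comments=None is passed here so that the # in #America is not treated as the start of a comment, which is the genfromtxt default.)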
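To see where the nine columns come from, the split can be reproduced with plain Python string handling. This is only an illustrative sketch: it uses str.split on the first sample line rather than genfromtxt itself, and the variable names are made up for the example.

```python
# First line of the sample file, copied from the question.
line = '"2013-04-17 12","3","WORLD","#America"'

# Splitting on the quote character mimics delimiter='"'.
fields = line.split('"')

# The quoted values land at the odd indices; the even indices hold
# the commas between them and the empty strings at both ends.
for index, field in enumerate(fields):
    print(index, repr(field))

# Columns 1 and 3 are the ones worth keeping.
timestamp, count = fields[1], int(fields[3])
```

This is why usecols needs to pick indices 1 and 3 once the quote is used as the delimiter.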
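Another issue is specifying the dtype for the datetime column. In recent (?) versions of NumPy, you must specify the time/date unit, or genfromtxt raises an error. In this case you apparently need to use M8[h] as the dtype, to specify hourly units.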
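As a small illustration of what the unit means, the sketch below builds hourly datetime64 values directly, without genfromtxt. It assumes NumPy is imported as np and that the space in each timestamp has been replaced by the ISO-style T separator; the variable names are illustrative only.

```python
import numpy as np

# 'M8[h]' is the short spelling of 'datetime64[h]'.
hourly = np.dtype("M8[h]")

# Two consecutive timestamps from the sample file, at hour resolution.
first = np.datetime64("2013-04-17T12", "h")
second = np.datetime64("2013-04-17T13", "h")

# Differences between hourly timestamps come out as whole hours.
step = second - first

# The same dtype can be used when building an array from strings.
stamps = np.array(["2013-04-17T12", "2013-04-17T13"], dtype=hourly)
```

Since the file only records the hour, an hourly unit matches the resolution of the data.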
Putting it all together, I was able to load the file with:
ts = np.genfromtxt(filename,
                   delimiter='"',
                   dtype='M8[h], uint',
                   usecols=[1,3])
Alternatively, you could look into using a converter, or try the CSV reader from Pandas.
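Another route, not part of the answer above, is to let Python's csv module deal with the quoting and then build the structured array by hand. The sketch below is a minimal version under a few assumptions: the field names when and count, the helper name load_hourly_counts, and the in-memory sample standing in for the real file are all made up for illustration.

```python
import csv
import io

import numpy as np

# In-memory stand-in for the first lines of the real file.
SAMPLE = (
    '"2013-04-17 12","3","WORLD","#America"\n'
    '"2013-04-17 13","9","WORLD","#America"\n'
    '"2013-04-17 14","4","WORLD","#America"\n'
)


def load_hourly_counts(fileobj):
    """Read (timestamp, count) pairs into a structured array.

    Hypothetical helper: csv.reader removes the double quotes, and only
    the first two columns of each row are kept.
    """
    rows = []
    for row in csv.reader(fileobj):
        # "2013-04-17 12" -> "2013-04-17T12", an ISO-style hourly stamp.
        rows.append((row[0].replace(" ", "T"), int(row[1])))
    return np.array(rows, dtype=[("when", "M8[h]"), ("count", "u8")])


ts = load_hourly_counts(io.StringIO(SAMPLE))
print(ts["when"])
print(ts["count"])
```

With a real file, passing a handle opened with open(filename, newline="") to the helper should give the same kind of result. Note that this is a one-dimensional structured array rather than a true Mx2 matrix, since the two columns have different types; ts["when"] and ts["count"] act as the two columns.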