I'm using genfromtxt to import a massive dataset with a variety of data types.
My original code worked fine (ucols is the list of columns I want to load):
data = np.genfromtxt(fname, comments='#', skip_header=1, usecols=ucols)
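A minimal sketch of why text columns turn into NaN with a call like the one above. It uses a small hypothetical in-memory sample (io.StringIO stands in for fname; the three columns and their values are made up for illustration). The default dtype is float, so strings that cannot be parsed are replaced by nan, while the result stays an ordinary 2-D array.

```python
import io

import numpy as np

# Hypothetical sample: integer id, string name, float value.
TEXT = """id name value
1 a 1.5
2 bc 2.5
3 def 3.5
"""

# Default dtype is float: the 'name' column cannot be converted,
# so genfromtxt fills it with nan.
data = np.genfromtxt(io.StringIO(TEXT), comments='#', skip_header=1)
print(data.shape)  # (3, 3)
print(data[:, 1])  # [nan nan nan]
```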
Some of my values are strings, so to avoid getting NaN in their place I tried setting dtype=None:
data = np.genfromtxt(fname, dtype=None, comments='#', skip_header=1, usecols=ucols)
Now, for some reason, I only get a single column of data, namely the first one. Can someone explain what I'm doing wrong?
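A sketch of what dtype=None actually returns, on a small hypothetical sample. It assumes NumPy 1.14 or later, where encoding=None makes string columns come back as str rather than bytes. genfromtxt infers one type per column and packs each row into a record, so the result is a 1-D structured array with one element per row. That is why it prints like a single column.

```python
import io

import numpy as np

# Hypothetical sample: integer id, string name, float value.
TEXT = """id name value
1 a 1.5
2 bc 2.5
3 def 3.5
"""

data = np.genfromtxt(io.StringIO(TEXT), dtype=None, encoding=None,
                     comments='#', skip_header=1)
print(data.shape)        # (3,): one record per row, not a 3x3 grid
print(data.dtype.names)  # ('f0', 'f1', 'f2'): one field per column
print(data[0])           # a whole row, returned as a single record
print(data['f1'])        # a whole column, selected by field name: ['a' 'bc' 'def']
```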
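Edit: I now understand that I should be getting a 1-D structured array, which can be indexed to get a whole row of values. However, I would like my data as a regular numpy array. Is it possible to use genfromtxt with dtype=None and still get a regular numpy array rather than a structured array? Alternatively, is there a quick way to convert between the two? The conversion route is only acceptable if it is fast and efficient, because I usually work with much larger datasets than this one.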
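Two possible routes, sketched under the assumption of NumPy 1.16 or later and the same kind of hypothetical three-column sample (integer, string, float). Route 1 converts the numeric fields with numpy.lib.recfunctions.structured_to_unstructured. Route 2 skips the structured array entirely by loading only the numeric columns with the default float dtype. Note that a regular ndarray has a single dtype, so the string column cannot live in the same array as the numbers unless everything is stored as str or object.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Hypothetical sample: integer id, string name, float value.
TEXT = """id name value
1 a 1.5
2 bc 2.5
3 def 3.5
"""

# Route 1: load everything as a structured array, then turn the numeric
# fields into one ordinary 2-D float array.
data = np.genfromtxt(io.StringIO(TEXT), dtype=None, encoding=None,
                     skip_header=1)
numeric = rfn.structured_to_unstructured(data[['f0', 'f2']], dtype=float)
labels = data['f1']  # the string column stays a separate 1-D array

# Route 2: read only the numeric columns with the default float dtype,
# which yields a regular 2-D ndarray directly, with no conversion step.
direct = np.genfromtxt(io.StringIO(TEXT), skip_header=1, usecols=(0, 2))

print(numeric.shape)                    # (3, 2)
print(np.array_equal(numeric, direct))  # True
```

Route 2 avoids the conversion step altogether. If the string columns are also needed, they can be taken from a separate load of just those columns.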
Answer (score: 1)
Make a structured array and write it to csv:
In [131]: arr=np.ones((3,), dtype='i,f,U10,i,f')
In [132]: arr['f2']=['a','bc','def']
In [133]: arr
Out[133]:
array([(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '<U10'), ('f3', '<i4'), ('f4', '<f4')])
In [134]: np.savetxt('test',arr,fmt='%d,%e,%s,%d,%f')
In [135]: cat test
1,1.000000e+00,a,1,1.000000
1,1.000000e+00,bc,1,1.000000
1,1.000000e+00,def,1,1.000000
Load all columns with dtype=None:
In [137]: np.genfromtxt('test',delimiter=',',dtype=None,encoding=None)
Out[137]:
array([(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U3'), ('f3', '<i8'), ('f4', '<f8')])
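Loading only a subset of columns works the same way: add usecols, and dtype=None still infers one type per selected column. A sketch, assuming the same three lines that savetxt wrote to test above, held in a string here so the example runs on its own. The indices (1, 2, 4) are an arbitrary choice for illustration.

```python
import io

import numpy as np

# Same comma-separated content that np.savetxt wrote to 'test'.
TEXT = """1,1.000000e+00,a,1,1.000000
1,1.000000e+00,bc,1,1.000000
1,1.000000e+00,def,1,1.000000
"""

# usecols selects columns 1, 2 and 4; dtype=None infers float, str, float.
sub = np.genfromtxt(io.StringIO(TEXT), delimiter=',', dtype=None,
                    encoding=None, usecols=(1, 2, 4))
print(sub.shape)  # (3,): still a 1-D structured array, now with 3 fields
print(sub.dtype)
```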