genfromtxt imports only the first column after changing dtype

Asked: 2018-07-19 20:25:05

Tags: python python-3.x numpy

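I am using genfromtxt to import a very large dataset containing a mix of data types.
My original code worked fine (ucols is the list of columns I want to load):

data = np.genfromtxt(fname, comments='#', skip_header=1, usecols=ucols)

Some of my values are strings, so to avoid getting NaN I tried setting dtype=None:

data = np.genfromtxt(fname, dtype=None, comments='#', skip_header=1, usecols=ucols)

Now, for some reason, I only get one column of data, the first one. Can someone explain what I am doing wrong?

Edit: I now understand that I should be getting a 1-D structured array, whose elements can be indexed to get a whole row of values. However, I would like my data as a regular NumPy array. Is it possible to use genfromtxt with dtype=None and still get a regular NumPy array rather than a structured array, or is there a quick way to convert between the two? The second option is undesirable unless it is fast and efficient, because I usually work with datasets much larger than this one.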

1 answer:

Answer 0 (score: 1)

Make a structured array and write it to CSV:

In [131]: arr=np.ones((3,), dtype='i,f,U10,i,f')
In [132]: arr['f2']=['a','bc','def']
In [133]: arr
Out[133]: 
array([(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
      dtype=[('f0', '<i4'), ('f1', '<f4'), ('f2', '<U10'), ('f3', '<i4'), ('f4', '<f4')])
In [134]: np.savetxt('test',arr,fmt='%d,%e,%s,%d,%f')
In [135]: cat test
1,1.000000e+00,a,1,1.000000
1,1.000000e+00,bc,1,1.000000
1,1.000000e+00,def,1,1.000000

Load all columns with dtype=None:

In [137]: np.genfromtxt('test',delimiter=',',dtype=None,encoding=None)
Out[137]: 
array([(1, 1., 'a', 1, 1.), (1, 1., 'bc', 1, 1.), (1, 1., 'def', 1, 1.)],
      dtype=[('f0', '<i8'), ('f1', '<f8'), ('f2', '<U3'), ('f3', '<i8'), ('f4', '<f8')])
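
Loading a subset of columns works the same way: usecols can be combined with dtype=None, and the result is still a 1-D structured array with one field per kept column. The following is a minimal sketch, not a definitive recipe. It assumes the same three rows as the test file above, re-created in memory with io.StringIO so that it runs on its own, and the column indices (1, 2, 4) are picked only for illustration. The last step is one possible way to get a regular 2-D array from the numeric fields, assuming a NumPy version that provides numpy.lib.recfunctions.structured_to_unstructured.

```python
import io

import numpy as np
from numpy.lib import recfunctions as rfn

# Same rows as the 'test' file above, kept in memory so the sketch is self-contained.
text = (
    "1,1.000000e+00,a,1,1.000000\n"
    "1,1.000000e+00,bc,1,1.000000\n"
    "1,1.000000e+00,def,1,1.000000\n"
)

# usecols selects columns by position; dtype=None infers a type for each kept column.
sub = np.genfromtxt(io.StringIO(text), delimiter=',', dtype=None,
                    encoding=None, usecols=(1, 2, 4))

print(sub.shape)        # one record per row, so the array is 1-D
print(sub.dtype.names)  # field names of the kept columns
print(sub['f1'])        # a whole column is accessed by field name

# Assumption: only the numeric fields are wanted as a regular 2-D array.
# Mixing in the string field would force everything to a common dtype.
plain = rfn.structured_to_unstructured(sub[['f0', 'f2']])
print(plain.shape)
```

The shape of sub is (3,), not (3, 3), which is why it can look as if only one column was loaded. The individual columns are reached by field name rather than by a second index.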