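I am reading tab-delimited files. I want to infer the column types automatically and then read all members of the first column.
However, the kind of array numpy returns can vary considerably depending on the input (single column or multiple columns) and on the dtype argument used.
Here is a set of test cases: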
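from io import StringIO
import numpy as np
# print_properties is a helper whose definition is not shown here
multi_col = "a\t1\t2\nb\t3\t4"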
a = np.genfromtxt(StringIO(multi_col), delimiter="\t", skip_header=0,
dtype=[('f0', '|S16'), ('f1', '<i8'), ('f2', '<i8')])
print_properties(a, 'a')
print("first col a['f0'] = " + str(a['f0']) + '\n')
a = np.genfromtxt(StringIO(multi_col), delimiter="\t", skip_header=0, dtype=None)
print_properties(a, 'a')
print("first col a['f0'] = " + str(a['f0']) + '\n')
a = np.genfromtxt(StringIO(multi_col), delimiter="\t", skip_header=0, dtype=str)
print_properties(a, 'a')
print('first col a[:, 0] = ' + str(a[:, 0]) + '\n')
single_col = "a\nb"
b = np.genfromtxt(StringIO(single_col), delimiter="\t", skip_header=0, dtype=[('f0', '|S16')])
print_properties(b, 'b')
print("first col b['f0'] = " + str(b['f0']) + '\n')
b = np.genfromtxt(StringIO(single_col), delimiter="\t", skip_header=0, dtype=None)
print_properties(b, 'b')
print('already a column\n')
b = np.genfromtxt(StringIO(single_col), delimiter="\t", skip_header=0, dtype=str)
print_properties(b, 'b')
print('already a column\n')
single_val = "a\n"
c = np.genfromtxt(StringIO(single_val), delimiter="\t", skip_header=0, dtype=str)
print_properties(c, 'c')
print('single value\n')
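The test cases call print_properties, whose definition is not given. A minimal stand-in is sketched below, assuming the helper only reports the attributes that differ between the cases (dtype, field names, shape, number of dimensions). The name array_properties and the exact output format are assumptions made for illustration, not the original helper.

```python
import numpy as np


def array_properties(arr):
    """Collect the attributes that differ between the test cases."""
    return {
        "type": type(arr).__name__,
        "dtype": arr.dtype,
        "fields": arr.dtype.names,  # None for a non-structured array
        "shape": arr.shape,
        "ndim": arr.ndim,
    }


def print_properties(arr, name):
    """Hypothetical stand-in for the helper used in the test cases."""
    for key, value in array_properties(arr).items():
        print(f"{name}.{key} = {value}")
```

With something like this in place, each test case prints whether the result is a structured array (fields is not None) and how many dimensions it has, which are the two things that decide which indexing style works.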
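I cannot find a uniform way to access the first column. Different cases require different indexing operations.
If I use col0 = a[:, 0] in the first two cases, I get this error: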
IndexError: too many indices for array
On the other hand, if I use col0 = a['f0'], the third case fails with:
IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices
Is there a uniform way to access the first column without converting types?
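For context, both errors can be reproduced without any file parsing. The sketch below builds the two kinds of array by hand (the variable names structured and uniform are illustrative only): a structured array is 1-D with named fields, so a two-index slice fails, while a uniform 2-D array has no fields, so a field name fails.

```python
import numpy as np

# 1-D structured array: one record per row, columns are named fields.
structured = np.array(
    [("a", 1, 2), ("b", 3, 4)],
    dtype=[("f0", "U16"), ("f1", "<i8"), ("f2", "<i8")],
)
# 2-D uniform array: every cell has the same dtype, no field names.
uniform = np.array([["a", "1", "2"], ["b", "3", "4"]])

print(structured.shape, structured.dtype.names)
print(uniform.shape, uniform.dtype.names)

# Try the "wrong" indexing style on each array and record the error text.
errors = {}
attempts = [
    ("structured[:, 0]", lambda: structured[:, 0]),
    ("uniform['f0']", lambda: uniform["f0"]),
]
for label, attempt in attempts:
    try:
        attempt()
    except IndexError as exc:
        errors[label] = str(exc)

for label, message in errors.items():
    print(f"{label} -> IndexError: {message}")
```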
Answer 0 (score: 0)
This is a "works for me" solution.
def print_1st_col(arr):
    # access the first column according to the different structures numpy can output
    if arr.dtype.fields is not None:
        print('structured array')
        col = arr['f0']  # structured array
    else:
        if arr.ndim <= 1:
            print('uniform 1D array')
            col = arr  # direct access (single number or uniform 1D array)
        else:
            print('uniform multi-dimensional array')
            col = arr[:, 0]  # uniform multi-dimensional array
    print(str(col) + '\n')
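The same branching can be wrapped in a function that returns the column instead of printing it. The following is only a sketch under a few assumptions: the name first_column is made up for illustration, the first field is looked up through dtype.names rather than hard-coding 'f0', and np.atleast_1d is used so that a single value also comes back as a 1-D array.

```python
from io import StringIO

import numpy as np


def first_column(arr):
    """Return the first column of a genfromtxt result as a 1-D array."""
    arr = np.asarray(arr)
    if arr.dtype.names is not None:
        # Structured array: pick the first field, whatever it is called.
        col = arr[arr.dtype.names[0]]
    elif arr.ndim >= 2:
        # Uniform 2-D array: ordinary column slice.
        col = arr[:, 0]
    else:
        # Uniform 0-D or 1-D array: treated as the column itself.
        col = arr
    return np.atleast_1d(col)


structured = np.genfromtxt(
    StringIO("a\t1\t2\nb\t3\t4"), delimiter="\t", encoding="utf-8",
    dtype=[("f0", "U16"), ("f1", "<i8"), ("f2", "<i8")],
)
multi = np.genfromtxt(StringIO("a\t1\t2\nb\t3\t4"), delimiter="\t",
                      dtype=str, encoding="utf-8")
single_col = np.genfromtxt(StringIO("a\nb"), delimiter="\t",
                           dtype=str, encoding="utf-8")
single_val = np.genfromtxt(StringIO("a\n"), delimiter="\t",
                           dtype=str, encoding="utf-8")

for result in (structured, multi, single_col, single_val):
    print(first_column(result))
```

One caveat: a file with a single row and several columns, read with a uniform dtype, also comes back as a 1-D array, so this function cannot tell it apart from a single column and would return the whole row.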