Question

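I am trying to aggregate many text files into a single CSV file, but so far I am struggling to load even one text file. The main reason is that the spacing between columns varies, so there is no tab or comma delimiter. My text file looks like the following, except with thousands of entries. I use A-M as the column names (the column widths vary) and list the data type under each: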

A        B      C        D       E      F      G     H   I   J    K  L   M

S10     i8      i8      i8      S10    S2     i8    i8  i8  i8   i8  S1 f8

The column spacing is where my problem lies. I tried the following:

import numpy as np

file = 'example.txt'
# 1-based start column of each field, read off by hand in a text editor
col_locations = np.array([1, 34, 41, 52, 75, 79, 88, 99, 104, 109, 116, 121, 126])
col_locations = col_locations - 1  # convert to 0-based offsets

# Differences between consecutive start positions (12 values), then a 1 prepended to get 13 widths
widths = col_locations[1:] - col_locations[:-1]
widths = np.insert(widths, 0, 1)
datatype = [('A', 'S10'), ('B', 'i8'), ('C', 'i8'), ('D', 'S10'), ('E', 'S2'), ('F', 'i8'), ('G', 'i8'), ('H', 'i8'), ('I', 'i8'), ('J', 'i8'), ('K', 'i8'), ('L', 'S1'), ('M', 'f8')]
data = np.genfromtxt(file, skip_header=10, delimiter=widths, autostrip=False, dtype=datatype)

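I picked the column positions by hand in an editor, which is inefficient because some text files may have slightly different column positions. I get no error, but when I print(data) it has clearly not loaded correctly. Even if it had, I would not be happy with this approach. Any suggestions would be greatly appreciated. Thanks.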
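To illustrate what "not picking the column positions by hand" could look like, here is a rough sketch that infers the column spans from the data itself: a character position counts as a gap only if it is blank in every row. The helper names `infer_column_spans` and `slice_fields` and the sample lines are hypothetical, not taken from the real files. It assumes space padding (no tabs), at least one blank position between neighbouring columns in every row, and no text field with internal spaces that line up across all rows. pandas.read_fwf with colspecs='infer' offers a similar kind of inference.

```python
def infer_column_spans(lines):
    """Return (start, end) spans of character positions used by at least one row."""
    rows = [ln.rstrip("\n") for ln in lines if ln.strip()]
    width = max(len(r) for r in rows)
    # A position is "occupied" if any row has a non-space character there.
    occupied = [any(i < len(r) and r[i] != " " for r in rows) for i in range(width)]
    spans, start = [], None
    for i, used in enumerate(occupied):
        if used and start is None:
            start = i                      # a new column begins
        elif not used and start is not None:
            spans.append((start, i))       # the column ended just before i
            start = None
    if start is not None:
        spans.append((start, width))       # last column runs to end of line
    return spans


def slice_fields(line, spans):
    """Cut one line at the given spans and strip the padding from each field."""
    return [line[a:b].strip() for a, b in spans]


# Two hypothetical fixed-width lines; the third field is empty in the second row.
sample = [
    f"{'abc':<8}{12:>4}  {'x':<2}{1.5:>6}",
    f"{'defghi':<8}{3:>4}  {'':<2}{2.25:>6}",
]
spans = infer_column_spans(sample)
table = [slice_fields(ln, spans) for ln in sample]
```

Unlike plain whitespace splitting, this keeps an empty field in its own column, at the cost of reading the whole file (or a representative sample of it) before slicing.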

Is there a function in Python that recognizes variable-width delimiters?
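For reference, a minimal sketch of the simplest case, assuming every field is non-empty and contains no embedded spaces: str.split() with no argument splits on runs of whitespace of any length, and np.genfromtxt behaves the same way when its delimiter argument is left at the default. The helper names `split_rows` and `append_to_csv` and the sample text are hypothetical. The sketch uses only the standard library and writes the combined rows with csv.writer.

```python
import csv
import io


def split_rows(lines, skip_header=0):
    """Yield each non-blank line as a list of whitespace-separated fields."""
    for n, line in enumerate(lines):
        if n < skip_header or not line.strip():
            continue
        # With no argument, split() treats any run of whitespace as one delimiter.
        yield line.split()


def append_to_csv(writer, lines, skip_header=0, expected_fields=None):
    """Write the parsed rows to a csv.writer and return how many were written."""
    count = 0
    for row in split_rows(lines, skip_header):
        if expected_fields is not None and len(row) != expected_fields:
            raise ValueError(f"expected {expected_fields} fields, got {len(row)}: {row}")
        writer.writerow(row)
        count += 1
    return count


# In-memory stand-ins for two input files with different column spacing.
text_a = "header\nabc   1  22   x  1.5\n"
text_b = "header\ndefghi 3 4 y 2.25\n"

out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
total = 0
for text in (text_a, text_b):
    total += append_to_csv(writer, io.StringIO(text), skip_header=1, expected_fields=5)
csv_text = out.getvalue()
```

With real files, the io.StringIO objects would be replaced by open(path) handles. The expected_fields check is there because whitespace splitting silently shifts columns when a field is empty.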

0 answers: