Question

I have a text file that follows this pattern:

# a   b   c   d    e    f    g

  1.0 3.0 6.0 4.0  5.0  9.0  4.0
  1.2 7.0 6.0 4.3  5.0  8.0  7.8
  1.7 8.0 6.4 4.1  8.7  9.9  4.7

  1.3 3.0 6.1 4.0  5.0  9.0  4.8
  1.5 3.2 6.3 4.0  5.7  9.0  4.5
  1.7 2.0 8.5 4.0  5.3  9.0  4.3
  1.7 3.2 8.0 4.0  5.1  9.0  4.3

  1.0 3.0 6.0 4.0  4.0  9.0  9.1
  1.3 3.1 6.8 4.0  5.5  9.0  5.0
  1.0 3.5 6.1 4.0  5.7  9.0  4.6

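The number of data blocks (separated by blank lines) varies from file to file, and so does the number of rows in each block. What would be the neatest way to read the data column by column while keeping each column split into its separate blocks? So far, I have only read the first two columns: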

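  A = []  # column a, one list per block
  B = []  # column b, one list per block
  a = []  # values of column a in the current block
  b = []  # values of column b in the current block

  col_a = 0
  col_b = 1

  with open(fileName, 'r') as fid:
      header = fid.readline()
      for line in fid:
          d = line.split()
          if d:  # data row
              a.append(d[col_a])
              b.append(d[col_b])
          elif a:  # blank line closes the current block
              A.append(a)
              B.append(b)
              a = []
              b = []

  if a:  # the last block has no trailing blank line
      A.append(a)
      B.append(b)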

Answer 1

You can do this using Python's csv library and the itertools groupby function. This script creates a list of blocks, each block containing a list of columns:

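from itertools import groupby
import csv

blocks = []

# Python 2: the csv module expects the file opened in binary mode
with open('input.txt', 'rb') as f_input:
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    header = next(csv_input)

    # Blank lines yield empty rows (length 0), so grouping by row length
    # separates the data blocks from the blank lines between them
    for k, g in groupby(csv_input, lambda x: len(x)):
        if k:
            blocks.append(zip(*g))  # transpose rows into columns

for block in blocks:
    print block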

It prints the following:

[('1.0', '1.2', '1.7'), ('3.0', '7.0', '8.0'), ('6.0', '6.0', '6.4'), ('4.0', '4.3', '4.1'), ('5.0', '5.0', '8.7'), ('9.0', '8.0', '9.9'), ('4.0', '7.8', '4.7')]
[('1.3', '1.5', '1.7', '1.7'), ('3.0', '3.2', '2.0', '3.2'), ('6.1', '6.3', '8.5', '8.0'), ('4.0', '4.0', '4.0', '4.0'), ('5.0', '5.7', '5.3', '5.1'), ('9.0', '9.0', '9.0', '9.0'), ('4.8', '4.5', '4.3', '4.3')]
[('1.0', '1.3', '1.0'), ('3.0', '3.1', '3.5'), ('6.0', '6.8', '6.1'), ('4.0', '4.0', '4.0'), ('4.0', '5.5', '5.7'), ('9.0', '9.0', '9.0'), ('9.1', '5.0', '4.6')]

The zip(*...) line transposes your list of rows into a list of columns.
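To see the transposition on its own, here is a minimal sketch using part of two rows from the first block of the sample data. The names `rows` and `columns` are only illustrative, and `list(...)` is added so the result is also a list on Python 3, where zip returns an iterator.

```python
# Two rows of the first block, already split into fields (first three columns only)
rows = [
    ['1.0', '3.0', '6.0'],
    ['1.2', '7.0', '6.0'],
]

# zip(*rows) pairs up the i-th field of every row, i.e. it transposes
columns = list(zip(*rows))

print(columns)
# [('1.0', '1.2'), ('3.0', '7.0'), ('6.0', '6.0')]
```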

For example, to display the second column of the first block, you can do:

print blocks[0][1]

This displays:

('3.0', '7.0', '8.0')
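The script in this answer is Python 2 code (binary file mode and the print statement). As a rough sketch of the same groupby idea on Python 3, the version below swaps csv.reader for str.split(), which already collapses repeated spaces and turns a blank line into an empty list. `read_blocks` and `sample` are hypothetical names introduced for this illustration, not part of the original answer, and the sample is a shortened copy of the data in the question.

```python
from itertools import groupby


def read_blocks(lines):
    """Return a list of blocks; each block is a list of column tuples."""
    rows = (line.split() for line in lines)  # a blank line becomes []
    blocks = []
    # bool(row) is False for blank lines and True for data rows, so
    # groupby alternates between separator groups and data groups
    for has_data, group in groupby(rows, key=bool):
        if has_data:
            blocks.append(list(zip(*group)))  # transpose rows into columns
    return blocks


# Shortened stand-in for the file contents (assumption: same layout)
sample = """\
# a   b   c

  1.0 3.0 6.0
  1.2 7.0 6.0

  1.3 3.0 6.1
"""

lines = iter(sample.splitlines())
header = next(lines).lstrip('#').split()  # column names without the '#'
blocks = read_blocks(lines)

print(blocks[0][1])
# ('3.0', '7.0')
```

With a real file, the same function can be fed the open file object after the header line has been read from it.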

Answer 2

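I'm not sure exactly how you want the data in each column separated, but here is one way to do it. After the file has been processed, columns will be a list of lists of tuples, where the data for column 2 is in columns[1] (because Python counts from zero) and the tuple holding that column's first block of values is in columns[1][0]. In the sample data file, those values are '3.0', '7.0' and '8.0'.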

The code has been written to work in Python 2.6+ as well as Python 3.x.

import csv
import io  # requires Python 2.6+

filename = 'many_values_per_line.txt'

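with io.open(filename, 'r', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    header = next(csv_input)
    blocks = []  # one entry per block, each holding that block's columns
    block = []   # rows of the block currently being read
    for row in csv_input:
        if row:
            block.append(row)
        elif block:  # blank line: close the current block
            blocks.append(zip(*block))  # transpose rows into columns
            block = []
    if block:  # final block
        blocks.append(zip(*block))
    # regroup so that columns[i] holds column i of every block
    columns = [list(i) for i in zip(*blocks)]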

for i, column in enumerate(columns):
    print('column {}: {}'.format(i, column))
print('')
print('columns[1][0]: {}'.format(columns[1][0]))

Output:

column 0: [('1.0', '1.2', '1.7'), ('1.3', '1.5', '1.7', '1.7'), ('1.0', '1.3', '1.0')]
column 1: [('3.0', '7.0', '8.0'), ('3.0', '3.2', '2.0', '3.2'), ('3.0', '3.1', '3.5')]
column 2: [('6.0', '6.0', '6.4'), ('6.1', '6.3', '8.5', '8.0'), ('6.0', '6.8', '6.1')]
column 3: [('4.0', '4.3', '4.1'), ('4.0', '4.0', '4.0', '4.0'), ('4.0', '4.0', '4.0')]
column 4: [('5.0', '5.0', '8.7'), ('5.0', '5.7', '5.3', '5.1'), ('4.0', '5.5', '5.7')]
column 5: [('9.0', '8.0', '9.9'), ('9.0', '9.0', '9.0', '9.0'), ('9.0', '9.0', '9.0')]
column 6: [('4.0', '7.8', '4.7'), ('4.8', '4.5', '4.3', '4.3'), ('9.1', '5.0', '4.6')]

columns[1][0]: ('3.0', '7.0', '8.0')
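If looking columns up by name is more convenient than by index, the structure can be wrapped in a dictionary. The sketch below is only an illustration: the `header` and `columns` literals are hypothetical stand-ins truncated to two columns and two blocks, it assumes the parsed header row starts with the '#' marker as its own field, and `by_name` is a name introduced here. It also converts the strings to float, which the code above does not do.

```python
# Hypothetical stand-ins for the `header` and `columns` built above,
# truncated to two columns and two blocks to keep the example short
header = ['#', 'a', 'b']
columns = [
    [('1.0', '1.2', '1.7'), ('1.3', '1.5', '1.7', '1.7')],
    [('3.0', '7.0', '8.0'), ('3.0', '3.2', '2.0', '3.2')],
]

names = header[1:]  # assumption: drop the leading '#' field

# Map each column name to its blocks, converting every value to float
by_name = {
    name: [[float(v) for v in block] for block in column]
    for name, column in zip(names, columns)
}

print(by_name['b'][0])
# [3.0, 7.0, 8.0]
```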

How to read multiple values separated by newlines
