I have a text file that looks like this:
# a b c d e f g
1.0 3.0 6.0 4.0 5.0 9.0 4.0
1.2 7.0 6.0 4.3 5.0 8.0 7.8
1.7 8.0 6.4 4.1 8.7 9.9 4.7

1.3 3.0 6.1 4.0 5.0 9.0 4.8
1.5 3.2 6.3 4.0 5.7 9.0 4.5
1.7 2.0 8.5 4.0 5.3 9.0 4.3
1.7 3.2 8.0 4.0 5.1 9.0 4.3

1.0 3.0 6.0 4.0 4.0 9.0 9.1
1.3 3.1 6.8 4.0 5.5 9.0 5.0
1.0 3.5 6.1 4.0 5.7 9.0 4.6
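The number of data blocks (separated by blank lines) varies from file to file, and so does the number of rows in each block. What is the cleanest way to read the data column by column while keeping each column split into its separate blocks? So far I only read the first two columns: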
A = []
B = []
a = []
b = []
col_a = 0
col_b = 1
with open(fileName, 'r') as fid:
    header = fid.readline()
    next(fid)
    for line in fid:
        d = line.split()
        if not d:  # If a newline
            A.append(a)
            B.append(b)
            a = []
            b = []
        if d:  # If not a newline
            a.append(d[col_a])
            b.append(d[col_b])
Answer 0 (score: 1)
You can do this with Python's csv library and the itertools groupby function. The script below builds a list of blocks, each of which holds a list of columns:
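from itertools import groupby
import csv

blocks = []
# Python 3: open in text mode with newline='' as the csv module expects
with open('input.txt', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    header = next(csv_input)
    # Blank lines parse as empty rows (length 0), so they separate the groups
    for k, g in groupby(csv_input, lambda x: len(x)):
        if k:
            blocks.append(list(zip(*g)))  # transpose rows into columns

for block in blocks:
    print(block)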
It prints the following:
[('1.0', '1.2', '1.7'), ('3.0', '7.0', '8.0'), ('6.0', '6.0', '6.4'), ('4.0', '4.3', '4.1'), ('5.0', '5.0', '8.7'), ('9.0', '8.0', '9.9'), ('4.0', '7.8', '4.7')]
[('1.3', '1.5', '1.7', '1.7'), ('3.0', '3.2', '2.0', '3.2'), ('6.1', '6.3', '8.5', '8.0'), ('4.0', '4.0', '4.0', '4.0'), ('5.0', '5.7', '5.3', '5.1'), ('9.0', '9.0', '9.0', '9.0'), ('4.8', '4.5', '4.3', '4.3')]
[('1.0', '1.3', '1.0'), ('3.0', '3.1', '3.5'), ('6.0', '6.8', '6.1'), ('4.0', '4.0', '4.0'), ('4.0', '5.5', '5.7'), ('9.0', '9.0', '9.0'), ('9.1', '5.0', '4.6')]
The zip(*...) call turns your list of rows into a list of columns.
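As a small illustration of that transposition, here is a sketch that uses the first block of the sample data as an in-memory list. The names rows and cols exist only for this example and are not part of the script above.

```python
# First block of the sample file, as the rows csv.reader would yield them.
rows = [
    ['1.0', '3.0', '6.0', '4.0', '5.0', '9.0', '4.0'],
    ['1.2', '7.0', '6.0', '4.3', '5.0', '8.0', '7.8'],
    ['1.7', '8.0', '6.4', '4.1', '8.7', '9.9', '4.7'],
]

# zip(*rows) pairs up the i-th item of every row, i.e. it builds column i.
cols = list(zip(*rows))

print(cols[0])  # ('1.0', '1.2', '1.7')
print(cols[1])  # ('3.0', '7.0', '8.0')
```

In Python 3, zip returns an iterator, so it is wrapped in list() here to allow indexing.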
For example, to display the 2nd column of the 1st block, you could do the following:
print(blocks[0][1])
This displays:
('3.0', '7.0', '8.0')
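To see why grouping by row length separates the blocks, here is a minimal sketch on a shortened, made-up list of parsed rows (two columns only, with [] standing in for a blank line). The data and the names rows and groups are assumptions for illustration.

```python
from itertools import groupby

# Parsed rows as csv.reader would yield them; [] stands for a blank line.
rows = [
    ['1.0', '3.0'], ['1.2', '7.0'], ['1.7', '8.0'],
    [],
    ['1.3', '3.0'], ['1.5', '3.2'],
    [],
    [],
    ['1.0', '3.0'],
]

# Consecutive rows with the same length end up in the same group.
groups = [(k, len(list(g))) for k, g in groupby(rows, len)]
print(groups)  # [(2, 3), (0, 1), (2, 2), (0, 2), (2, 1)]

# Keeping only the groups with a non-zero key drops the blank lines.
blocks = [list(zip(*g)) for k, g in groupby(rows, len) if k]
print(len(blocks))  # 3
```

Because the key is the row length, a row with a different number of fields would also start a new group, so this approach assumes every data row has the same number of columns.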
Answer 1 (score: 1)
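I'm not sure exactly how you want the data in each column separated, but here is one way to do it. After the file has been processed, columns will be a list of lists of tuples, where the data for column 2 is in columns[1] (because Python counts from zero) and the tuple holding that column's first block of values is in columns[1][0]. In the sample data file, those values are '3.0', '7.0', and '8.0'.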
The code is written to work in Python 2.6+ as well as Python 3.x.
import csv
import io  # requires Python 2.6+

filename = 'many_values_per_line.txt'
with io.open(filename, 'r', newline='') as f_input:
    csv_input = csv.reader(f_input, delimiter=' ', skipinitialspace=True)
    header = next(csv_input)
    blocks = []
    block = []
    for row in csv_input:
        if row:
            block.append(row)
        elif block:  # blank line ends the current block
            blocks.append(zip(*block))
            block = []
    if block:  # final block
        blocks.append(zip(*block))

# Regroup so that columns[c][b] is column c of block b
columns = [list(i) for i in zip(*blocks)]
for i, column in enumerate(columns):
    # Explicit field indices keep str.format usable on Python 2.6
    print('column {0}: {1}'.format(i, column))
print('')
print('columns[1][0]: {0}'.format(columns[1][0]))
Output:
column 0: [('1.0', '1.2', '1.7'), ('1.3', '1.5', '1.7', '1.7'), ('1.0', '1.3', '1.0')]
column 1: [('3.0', '7.0', '8.0'), ('3.0', '3.2', '2.0', '3.2'), ('3.0', '3.1', '3.5')]
column 2: [('6.0', '6.0', '6.4'), ('6.1', '6.3', '8.5', '8.0'), ('6.0', '6.8', '6.1')]
column 3: [('4.0', '4.3', '4.1'), ('4.0', '4.0', '4.0', '4.0'), ('4.0', '4.0', '4.0')]
column 4: [('5.0', '5.0', '8.7'), ('5.0', '5.7', '5.3', '5.1'), ('4.0', '5.5', '5.7')]
column 5: [('9.0', '8.0', '9.9'), ('9.0', '9.0', '9.0', '9.0'), ('9.0', '9.0', '9.0')]
column 6: [('4.0', '7.8', '4.7'), ('4.8', '4.5', '4.3', '4.3'), ('9.1', '5.0', '4.6')]
columns[1][0]: ('3.0', '7.0', '8.0')
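The regrouping step, zip(*blocks), may be easier to follow on a reduced example. The sketch below uses two blocks with only two columns each, copied from the sample data; the float conversion at the end is an optional extra that the code above does not perform, and the name second_col_floats is made up for this example.

```python
# Two blocks, each already transposed into a list of column tuples
# (only the first two columns are shown to keep the example short).
blocks = [
    [('1.0', '1.2', '1.7'), ('3.0', '7.0', '8.0')],                # block 0
    [('1.3', '1.5', '1.7', '1.7'), ('3.0', '3.2', '2.0', '3.2')],  # block 1
]

# zip(*blocks) gathers column i of every block into one entry.
columns = [list(c) for c in zip(*blocks)]

print(columns[1][0])  # ('3.0', '7.0', '8.0')
print(columns[1][1])  # ('3.0', '3.2', '2.0', '3.2')

# The values are still strings; convert them when numbers are needed.
second_col_floats = [[float(v) for v in blk] for blk in columns[1]]
print(second_col_floats[0])  # [3.0, 7.0, 8.0]
```

The indexing order is therefore columns[column][block][row], and each block can have a different number of rows.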