Read a given number of lines from a txt file and convert them to a list in a Pythonic way

Posted: 2013-02-01 16:30:09

Tags: python arrays file-io idl-programming-language

Suppose I have the following txt file:

0.0163934
6
7.52438e+09
2147483648
6.3002e-06 6.31527e-08 0 0 6 0 0 4.68498e-06 0.00638412 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867
6.40329e-06 0 6.19369e-09 0 0 0 0 4.73593e-06 0 13.3009
6.43802e-06 0 6.29503e-09 0 0 0 0 4.75294e-06 0 13.5185
6.47295e-06 0 6.39803e-09 0 0 0 0 4.76996e-06 0 13.7397
0.0163934
3
7.52438e+09
2147483648
6.3002e-06 0 5.89935e-09 0 0 0 0 4.68498e-06 0 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867

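For each block, I want to read the first header lines as floats or integers and then, based on the value on the second line, read that many following lines as a list of lists (or an array).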

In IDL, I would simply do:

openr, 1, fname
readf, 1, Time
readf, 1, Bins
readf, 1, dummy
readf, 1, dummyLong
da1= fltarr(10, Bins)
readf, 1, da1

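That way the whole block of numbers is stored in the float array da1, of size 10 * Bins (rows and columns are the other way around compared with Python).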

Then I can read the following records the same way.

In Python I'm doing:

Time=float(filen.readline())
Bins=int(filen.readline())
dummy=float(filen.readline())
dummyLong=long(filen.readline())

lines=[filen.readline() for i in range(Bins)]

arra=[[float(x) for x in lines[i].split()] for i in range(len(lines))]

So I need two statements plus a convoluted iteration that a beginner won't easily understand.

Is there a way to do this as in IDL, in a single statement and in a Pythonic way?

Thanks!

4 answers:

Answer 0 (score: 1)

A one-liner isn't necessarily better than two lines.

But you can do this:

arra = [[float(x) for x in filen.readline().split()] for _ in range(Bins)]

I prefer the two-line version:

lines = (filen.readline() for _ in range(Bins))
arra = [[float(x) for x in line.split()] for line in lines]
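Both versions read exactly Bins lines and leave the file positioned at the next record's header, because the generator expression is lazy and only calls readline() as the list comprehension consumes it. A minimal Python 3 sketch that checks this, assuming io.StringIO as a stand-in for the open file; BLOCK is a hypothetical string built from the question's second block plus the next record's first header line:

```python
import io

# Hypothetical input: three data lines followed by the next record's first header line
BLOCK = """\
6.3002e-06 0 5.89935e-09 0 0 0 0 4.68498e-06 0 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867
0.0163934
"""
Bins = 3

# One-liner: readline() is called once per iteration of the outer comprehension
f1 = io.StringIO(BLOCK)
one_liner = [[float(x) for x in f1.readline().split()] for _ in range(Bins)]

# Two-liner: the generator reads nothing until the list comprehension iterates it
f2 = io.StringIO(BLOCK)
lines = (f2.readline() for _ in range(Bins))
arra = [[float(x) for x in line.split()] for line in lines]

# Neither form over-reads: the next header line is still available
next_header_1 = f1.readline()
next_header_2 = f2.readline()
```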

Answer 1 (score: 1)

Time=float(filen.readline())
Bins=int(filen.readline())
dummy=float(filen.readline())
dummyLong=long(filen.readline())
arra = [ [ float(num) for num in line.split() ] for line in filen ]

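This is only slightly more Pythonic, and it doesn't stop after the required number of lines; it simply reads all the remaining lines. You can use islice from itertools to stop the iteration, or you can simply truncate the list afterwards.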

Here's an example; since I'm already using islice, I took the liberty of going a bit functional...

from itertools import islice

CONVERTORS = (float, int, float, long, )
with open(...) as filen:
    Time, Bins, dummy, dummyLong = [ func(value) for func, value in zip(CONVERTORS, islice(filen, 4)) ]
    arra = [ map(float, line.split()) for line in islice(filen, Bins) ]
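The snippet above is Python 2 (long exists, and map() returns a list). A minimal Python 3 sketch of the same islice idea, extended to loop over every record in the file; SAMPLE and read_records are hypothetical names, io.StringIO stands in for the real open(...) call, and int replaces long:

```python
import io
from itertools import islice

# Python 3 has no separate long type, so int stands in for it
CONVERTERS = (float, int, float, int)

# The question's sample file, inlined so the sketch runs without a file on disk
SAMPLE = """\
0.0163934
6
7.52438e+09
2147483648
6.3002e-06 6.31527e-08 0 0 6 0 0 4.68498e-06 0.00638412 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867
6.40329e-06 0 6.19369e-09 0 0 0 0 4.73593e-06 0 13.3009
6.43802e-06 0 6.29503e-09 0 0 0 0 4.75294e-06 0 13.5185
6.47295e-06 0 6.39803e-09 0 0 0 0 4.76996e-06 0 13.7397
0.0163934
3
7.52438e+09
2147483648
6.3002e-06 0 5.89935e-09 0 0 0 0 4.68498e-06 0 12.6688
6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876
6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867
"""

def read_records(filen):
    """Read (Time, Bins, arra) tuples until no complete header is left."""
    records = []
    while True:
        # islice stops after exactly 4 lines, so no data line is consumed here
        header = list(islice(filen, 4))
        if len(header) < 4:  # EOF (or a trailing partial header): stop
            break
        Time, Bins, dummy, dummyLong = [func(value) for func, value in zip(CONVERTERS, header)]
        # map() is lazy in Python 3, so wrap it in list() to get real rows
        arra = [list(map(float, line.split())) for line in islice(filen, Bins)]
        records.append((Time, Bins, arra))
    return records

records = read_records(io.StringIO(SAMPLE))
```

Because each islice call only consumes the lines it yields, the header read and the data read can alternate on the same file object without losing lines between records.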

Answer 2 (score: 1)

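Here's a more object-oriented approach that uses a simple hand-coded FSM (finite state machine) to control the reading of each complete data record. It's more verbose than the other answers posted so far, but it's a fairly flexible and extensible way to handle this kind of task, and it checks for a premature end of file.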

class Record(object):
    def __init__(self, time=None, bins=None, fltarr=None):
        self.time = time
        self.bins = bins
        self.fltarr = fltarr

    def read(self, file):
        """ Read complete record from file into self and return True,
            otherwise return False if EOF encountered """
        START, STOP, EOF = 0, -1, -99

        state = START
        while state not in (EOF, STOP):
            line = file.readline()
            if not line: state = EOF; break
            # process line depending on read state
            if state == 0:
                self.time = float(line)
                state = 1
            elif state == 1:
                self.bins = int(line)
                state = 2
            elif state in (2, 3):
                # ignore line
                state += 1
            elif state == 4:
                self.fltarr = []
                last_bin = self.bins-1
                for bin in xrange(self.bins):
                    self.fltarr.append([float(x) for x in line.split()])
                    if bin == last_bin: break
                    line = file.readline()
                    if not line: state = EOF; break
                if state != EOF:
                    state = STOP

        return state == STOP

    def __str__(self):
        result = 'Record(time={}, bins={}, fltarr=[\n'.format(self.time, self.bins)
        for floats in self.fltarr:
            result += '  {}\n'.format(floats)
        return result + '])'

fname = 'sample_data.txt'
with open(fname, 'r') as input:
    data = []
    while True:
        record = Record()
        if not record.read(input):
            break
        else:
            data.append(record)

for record in data:
    print record

Output:

Record(time=0.0163934, bins=6, fltarr=[
  [6.3002e-06, 6.31527e-08, 0.0, 0.0, 6.0, 0.0, 0.0, 4.68498e-06, 0.00638412, 12.6688]
  [6.33438e-06, 0.0, 5.99588e-09, 0.0, 0.0, 0.0, 0.0, 4.70195e-06, 0.0, 12.876]
  [6.36874e-06, 0.0, 6.09398e-09, 0.0, 0.0, 0.0, 0.0, 4.71894e-06, 0.0, 13.0867]
  [6.40329e-06, 0.0, 6.19369e-09, 0.0, 0.0, 0.0, 0.0, 4.73593e-06, 0.0, 13.3009]
  [6.43802e-06, 0.0, 6.29503e-09, 0.0, 0.0, 0.0, 0.0, 4.75294e-06, 0.0, 13.5185]
  [6.47295e-06, 0.0, 6.39803e-09, 0.0, 0.0, 0.0, 0.0, 4.76996e-06, 0.0, 13.7397]
])
Record(time=0.0163934, bins=3, fltarr=[
  [6.3002e-06, 0.0, 5.89935e-09, 0.0, 0.0, 0.0, 0.0, 4.68498e-06, 0.0, 12.6688]
  [6.33438e-06, 0.0, 5.99588e-09, 0.0, 0.0, 0.0, 0.0, 4.70195e-06, 0.0, 12.876]
  [6.36874e-06, 0.0, 6.09398e-09, 0.0, 0.0, 0.0, 0.0, 4.71894e-06, 0.0, 13.0867]
])

Answer 3 (score: 0)

You could also use numpy's loadtxt, like this:

from numpy import loadtxt
data = loadtxt("input.txt", unpack=False)

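Then convert the data types as needed. Note, however, that loadtxt expects every row to have the same number of columns, so it cannot parse this particular file in a single call (the header lines hold one value, the data lines ten); it does work when applied to each data block separately.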
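A minimal sketch of that per-block use, assuming NumPy; `lines` is a hypothetical list holding the question's second record as it would look after str.splitlines(). np.loadtxt accepts a list of strings, and ndmin=2 keeps the result 2-D even when Bins is 1:

```python
import io
import numpy as np

# Second record from the question, one string per line
lines = [
    "0.0163934",
    "3",
    "7.52438e+09",
    "2147483648",
    "6.3002e-06 0 5.89935e-09 0 0 0 0 4.68498e-06 0 12.6688",
    "6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876",
    "6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867",
]

# Whole-file loadtxt fails: header rows have 1 column, data rows have 10
try:
    np.loadtxt(io.StringIO("\n".join(lines)))
    whole_file_ok = True
except ValueError:
    whole_file_ok = False

# Per block it works: parse the header by hand, then hand the Bins data lines to loadtxt
Time = float(lines[0])
Bins = int(lines[1])
block = np.loadtxt(lines[4:4 + Bins], ndmin=2)
print(whole_file_ok, block.shape)  # False (3, 10)
```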

Alternatively, you can use readlines:

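from numpy import fromstring

with open("filename.dat") as fin:
    data = fin.readlines()

records = []
i = 0  # index of the current record's first header line
while i < len(data):
    Time = float(data[i])
    Bins = int(data[i+1])
    dummy, dummylong = float(data[i+2]), int(data[i+3])
    # parse the Bins data lines that follow the four header lines
    arra = [fromstring(data[i+4+j], dtype=float, sep=" ") for j in range(Bins)]
    records.append((Time, Bins, arra))
    i += 4 + Bins  # jump to the next record's header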
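To get something closer to IDL's da1, the list of per-line arrays produced by the fromstring list comprehension can be stacked into one 2-D array. A minimal sketch, assuming NumPy and using the question's second block as hypothetical input (block_lines is an illustrative name). The transposition assumes IDL's column-first indexing, i.e. fltarr(10, Bins) has 10 columns and Bins rows, so da1[col, row] in IDL matches da1_like[col, row] here:

```python
import numpy as np

# Hypothetical data lines for one record (the question's second block)
block_lines = [
    "6.3002e-06 0 5.89935e-09 0 0 0 0 4.68498e-06 0 12.6688",
    "6.33438e-06 0 5.99588e-09 0 0 0 0 4.70195e-06 0 12.876",
    "6.36874e-06 0 6.09398e-09 0 0 0 0 4.71894e-06 0 13.0867",
]

# One 1-D float array per data line
rows = [np.fromstring(line, dtype=float, sep=" ") for line in block_lines]

# Stack into a single (Bins, 10) array: row index first, as NumPy indexes
arr = np.vstack(rows)

# Transpose to the (10, Bins) shape of IDL's fltarr(10, Bins)
da1_like = arr.T
```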