How to read two lines from a file at a time using Python

Time: 2009-11-01 14:27:27

Tags: python

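I am writing a Python script that parses a text file. The format of this text file is such that each element in the file uses two lines, and for convenience I would like to read both lines before parsing. Can this be done in Python?

I would like something like: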

f = open(filename, "r")
for line in f:
    line1 = line
    line2 = f.readline()

f.close()

But this breaks, saying:

ValueError: Mixing iteration and read methods would lose data


14 answers:

Answer 0 (score: 45)

There is a similar question here. You can't mix iteration and readline, so you need to use one or the other.

while True:
    line1 = f.readline()
    line2 = f.readline()
    if not line2: break  # EOF
    ...
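
A self-contained variant of the loop above, offered as a sketch rather than the answerer's exact code. It assumes an in-memory io.StringIO standing in for the real file, and it pads an unpaired last line with None instead of dropping it. The name read_pairs is a hypothetical helper, not something from the original answer.

```python
import io

def read_pairs(f):
    """Yield (line1, line2) tuples using readline() only, never iteration."""
    while True:
        line1 = f.readline()
        if not line1:          # an empty string means EOF
            break
        line2 = f.readline()
        # An odd number of lines leaves the last line unpaired; pad with None.
        yield (line1, line2 if line2 else None)

sample = io.StringIO("a\nb\nc\n")   # stand-in for an open file
pairs = list(read_pairs(sample))
print(pairs)
```

Because only readline() touches the file, the "mixing iteration and read methods" error cannot occur here.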

Answer 1 (score: 41)

import itertools
with open('a') as f:
    for line1,line2 in itertools.zip_longest(*[f]*2):
        print(line1,line2)

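itertools.zip_longest() returns an iterator, so it works fine even if the file is billions of lines long.

If there is an odd number of lines, line2 is set to None on the last iteration.

On Python 2, you need to use izip_longest instead.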


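In the comments, it was asked whether this solution reads the whole file first and then iterates over the file a second time. I believe it does not. The with open('a') as f line opens a file handle, but does not read the file. f is an iterator, so its contents are not read until requested. zip_longest takes iterators as arguments, and returns an iterator.

zip_longest is indeed fed the same iterator, f, twice. But what ends up happening is that next(f) is called for the first argument and then for the second argument. Since next() is being called on the same underlying iterator, successive lines are yielded. This is very different from reading in the whole file. Indeed, the purpose of using iterators is precisely to avoid reading in the whole file.

I therefore believe the solution works as desired: the file is read only once, by the for loop.

To confirm this, I ran the zip_longest solution against a solution using f.readlines(). I added an input() at the end to pause the scripts, and ran ps axuw on each: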
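
The shared-iterator behaviour described above can be checked on a small scale. This is a minimal sketch that assumes an io.StringIO object in place of a real file; the variable names are illustrative only.

```python
import io
from itertools import zip_longest

f = io.StringIO("1\n2\n3\n4\n5\n")   # stand-in for an open file
same = [f] * 2                        # two references to ONE iterator
shared = same[0] is same[1]

# zip_longest calls next() on each argument in turn; since both arguments
# are the same object, consecutive lines land in the same tuple.
pairs = list(zip_longest(*same))
print(shared, pairs)
```

Each line is consumed exactly once, and the odd fifth line is paired with None.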

% ps axuw | grep zip_longest_method.py

unutbu 11119 2.2 0.2 4520 2712 pts/0 S+ 21:14 0:00 python /home/unutbu/pybin/zip_longest_method.py bigfile

% ps axuw | grep readlines_method.py

unutbu 11317 6.5 8.8 93908 91680 pts/0 S+ 21:16 0:00 python /home/unutbu/pybin/readlines_method.py bigfile

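readlines clearly reads the whole file in at once. Since zip_longest_method uses far less memory, I think it is safe to conclude that it is not reading the whole file in at once.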
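
A similar comparison can be sketched from inside Python with the standard tracemalloc module. Note the assumptions: tracemalloc reports Python-level allocations, which is not the same figure as the process sizes shown by ps, and the temporary file, its size, and the function names below are all made up for this demo.

```python
import os
import tempfile
import tracemalloc
from itertools import zip_longest

def peak_bytes(func, path):
    """Return the peak traced allocation in bytes while func(path) runs."""
    tracemalloc.start()
    func(path)
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return peak

def lazy_pairs(path):
    # Iterator-based pairing: only a small buffer is alive at any time.
    with open(path) as f:
        for line1, line2 in zip_longest(f, f):
            pass

def eager_pairs(path):
    # readlines() materialises every line as a list element first.
    with open(path) as f:
        lines = f.readlines()
    for line1, line2 in zip(lines[0::2], lines[1::2]):
        pass

# Build a throwaway test file (size chosen arbitrarily for the demo).
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    for i in range(50000):
        tmp.write("line number %d\n" % i)
    path = tmp.name

lazy_peak = peak_bytes(lazy_pairs, path)
eager_peak = peak_bytes(eager_pairs, path)
os.remove(path)
print("lazy peak:", lazy_peak, "eager peak:", eager_peak)
```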

Answer 2 (score: 22)

Use next(), e.g.

with open("file") as f:
    for line in f:
        print(line)
        nextline = next(f)
        print("next line", nextline)
        ....
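
One caveat with the snippet above: if the file has an odd number of lines, the final next(f) raises StopIteration. A hedged sketch of one way around that is to pass a default as the second argument of next(); the io.StringIO object here is only a stand-in for a real file.

```python
import io

f = io.StringIO("a\nb\nc\n")   # stand-in for an open file, odd line count
pairs = []
for line in f:
    # With a default, next() returns it at EOF instead of raising StopIteration.
    nextline = next(f, None)
    pairs.append((line, nextline))
print(pairs)
```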

Answer 3 (score: 11)

I would proceed in a similar way to ghostdog74, only with the try outside and a few modifications:

try:
    with open(filename) as f:
        for line1 in f:
            line2 = next(f)  # raises StopIteration if line1 has no partner
            # process line1 and line2 here
except StopIteration:
    print("(End)")  # do whatever you need to do with line1 alone

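This keeps the code simple yet robust. Using with closes the file if something else goes wrong, or simply closes the resource once it is exhausted and the loop exits.

Note that with needs Python 2.6, or 2.5 with the with_statement feature enabled.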

Answer 4 (score: 5)

How about this one? Does anybody see a problem with it?

with open('file_name') as f:
    for line1, line2 in zip(f, f):
        print(line1, line2)
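
One behaviour worth knowing about zip(f, f), shown here as a small sketch with io.StringIO standing in for the file: when the line count is odd, the last line is read from the file but silently discarded, because zip stops as soon as its second argument is exhausted.

```python
import io

f = io.StringIO("a\nb\nc\n")   # stand-in for an open file, odd line count
pairs = list(zip(f, f))
leftover = f.read()             # nothing remains: "c\n" was consumed and dropped
print(pairs, repr(leftover))
```

If the unpaired line matters, itertools.zip_longest keeps it and pads the pair with None.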

Answer 5 (score: 4)

Works for files with an even or an odd number of lines. It simply ignores the unmatched last line.

f = open("file")

lines = f.readlines()
for even, odd in zip(lines[0::2], lines[1::2]):
    print("even : ", even)
    print("odd : ", odd)
    print("end cycle")
f.close()

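If you have large files, this is not the correct approach: readlines() loads the whole file into memory. I once wrote a class that reads a file while saving the fseek position of the start of each line. This lets you fetch specific lines without holding the whole file in memory, and you can also move forward and backward.

I am pasting it here. The license is public domain, meaning: do what you want with it. Please note that this class was written 6 years ago and I have not touched or checked it since. I think it is not even file compliant. Caveat emptor. Also, note that this is overkill for your problem. I am not claiming you should definitely go this way, but I had this code and I enjoy sharing it in case you need more complex access.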

import string
import re

class FileReader:
    """ 
    Similar to file class, but allows to access smoothly the lines 
    as when using readlines(), with no memory payload, going back and forth,
    finding regexps and so on.
    """
    def __init__(self,filename): # fold>>
        self.__file=open(filename,"r")
        self.__currentPos=-1
        # get file length
        self.__file.seek(0,0)
        counter=0
        line=self.__file.readline()
        while line != '':
            counter = counter + 1
            line=self.__file.readline()
        self.__length = counter
        # collect an index of filedescriptor positions against
        # the line number, to enhance search
        self.__file.seek(0,0)
        self.__lineToFseek = []

        while True:
            cur=self.__file.tell()
            line=self.__file.readline()
            # if it's not null the cur is valid for
            # identifying a line, so store
            self.__lineToFseek.append(cur)
            if line == '':
                break
    # <<fold
    def __len__(self): # fold>>
        """
        member function for the operator len()
        returns the file length
        FIXME: better get it once when opening file
        """
        return self.__length
        # <<fold
    def __getitem__(self,key): # fold>>
        """ 
        gives the "key" line. The syntax is

        import FileReader
        f=FileReader.FileReader("a_file")
        line=f[2]

        to get the second line from the file. The internal
        pointer is set to the key line
        """

        mylen = self.__len__()
        if key < 0:
            self.__currentPos = -1
            return ''
        elif key > mylen:
            self.__currentPos = mylen
            return ''

        self.__file.seek(self.__lineToFseek[key],0)
        counter=0
        line = self.__file.readline()
        self.__currentPos = key
        return line
        # <<fold
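    def __next__(self): # fold>>
        # stop as soon as readline() reports EOF with an empty string
        line = self.readline()
        if line == '':
            raise StopIteration
        return line
    next = __next__ # Python 2 name for the same method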
    # <<fold
    def __iter__(self): # fold>>
        return self
    # <<fold
    def readline(self): # fold>>
        """
        read a line forward from the current cursor position.
        returns the line or an empty string when at EOF
        """
        return self.__getitem__(self.__currentPos+1)
        # <<fold
    def readbackline(self): # fold>>
        """
        read a line backward from the current cursor position.
        returns the line or an empty string when at Beginning of
        file.
        """
        return self.__getitem__(self.__currentPos-1)
        # <<fold
    def currentLine(self): # fold>>
        """
        gives the line at the current cursor position
        """
        return self.__getitem__(self.__currentPos)
        # <<fold
    def currentPos(self): # fold>>
        """ 
        return the current position (line) in the file
        or -1 if the cursor is at the beginning of the file
        or len(self) if it's at the end of file
        """
        return self.__currentPos
        # <<fold
    def toBOF(self): # fold>>
        """
        go to beginning of file
        """
        self.__getitem__(-1)
        # <<fold
    def toEOF(self): # fold>>
        """
        go to end of file
        """
        self.__getitem__(self.__len__())
        # <<fold
    def toPos(self,key): # fold>>
        """
        go to the specified line
        """
        self.__getitem__(key)
        # <<fold
    def isAtEOF(self): # fold>>
        return self.__currentPos == self.__len__()
        # <<fold
    def isAtBOF(self): # fold>>
        return self.__currentPos == -1
        # <<fold
    def isAtPos(self,key): # fold>>
        return self.__currentPos == key
        # <<fold

    def findString(self, thestring, count=1, backward=0): # fold>>
        """
        find the count occurrence of the string str in the file
        and return the line caught. The internal cursor is placed
        at the same line.
        backward is the searching flow.
        For example, to search for the first occurrence of "hello"
        starting from the beginning of the file do:

        import FileReader
        f=FileReader.FileReader("a_file")
        f.toBOF()
        f.findString("hello",1,0)

        To search the second occurrence string from the end of the
        file in backward movement do:

        f.toEOF()
        f.findString("hello",2,1)

        to search the first occurrence from a given (or current) position
        say line 150, going forward in the file 

        f.toPos(150)
        f.findString("hello",1,0)

        return the string where the occurrence is found, or an empty string
        if nothing is found. The internal counter is placed at the corresponding
        line number, if the string was found. In other case, it's set at BOF
        if the search was backward, and at EOF if the search was forward.

        NB: the current line is never evaluated. This is a feature, since
        we can so traverse occurrences with a

        line=f.findString("hello")
        while line != '':
            line=f.findString("hello")

        instead of playing with a readline every time to skip the current
        line.
        """
        internalcounter=1
        if count < 1:
            count = 1
        while 1:
            if backward == 0:
                line=self.readline()
            else:
                line=self.readbackline()

            if line == '':
                return ''
            if thestring in line:
                if count == internalcounter:
                    return line
                else:
                    internalcounter = internalcounter + 1
                    # <<fold
    def findRegexp(self, theregexp, count=1, backward=0): # fold>>
        """
        find the count occurrence of the regexp in the file
        and return the line caught. The internal cursor is placed
        at the same line.
        backward is the searching flow.
        You need to pass a regexp string as theregexp.
        returns a tuple. The first element is the matched line. The subsequent elements
        contains the matched groups, if any.
        If no match returns None
        """
        rx=re.compile(theregexp)
        internalcounter=1
        if count < 1:
            count = 1
        while 1:
            if backward == 0:
                line=self.readline()
            else:
                line=self.readbackline()

            if line == '':
                return None
            m=rx.search(line)
            if m != None :
                if count == internalcounter:
                    return (line,)+m.groups()
                else:
                    internalcounter = internalcounter + 1
    # <<fold
    def skipLines(self,key): # fold>>
        """
        skip a given number of lines. Key can be negative to skip
        backward. Return the last line read.
        Please note that skipLines(1) is equivalent to readline()
        skipLines(-1) is equivalent to readbackline() and skipLines(0)
        is equivalent to currentLine()
        """
        return self.__getitem__(self.__currentPos+key)
    # <<fold
    def occurrences(self,thestring,backward=0): # fold>>
        """
        count how many occurrences of str are found from the current
        position (current line excluded... see skipLines()) to the
        begin (or end) of file.
        returns a list of positions where each occurrence is found,
        in the same order found reading the file.
        Leaves unaltered the cursor position.
        """
        curpos=self.currentPos()
        list = []
        line = self.findString(thestring,1,backward)
        while line != '':
            list.append(self.currentPos())
            line = self.findString(thestring,1,backward)
        self.toPos(curpos)
        return list
        # <<fold
    def close(self): # fold>>
        self.__file.close()
    # <<fold
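
The core idea of the class above, an index of line-start offsets that allows jumping to any line with seek(), can be sketched far more briefly. This is not the author's code: build_line_index and get_line are hypothetical names, and the sketch assumes the file is opened in binary mode so that tell() returns plain byte offsets.

```python
import os
import tempfile

def build_line_index(f):
    """Return a list of byte offsets, one per line start, for a binary file."""
    offsets = []
    while True:
        pos = f.tell()
        if not f.readline():    # empty bytes object means EOF
            break
        offsets.append(pos)
    return offsets

def get_line(f, offsets, n):
    """Seek to the start of line n (0-based) and read just that line."""
    f.seek(offsets[n])
    return f.readline()

# Throwaway demo file.
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"zero\none\ntwo\nthree\n")
    path = tmp.name

with open(path, "rb") as f:
    offsets = build_line_index(f)
    third = get_line(f, offsets, 2)   # jump forward
    first = get_line(f, offsets, 0)   # then back again
os.remove(path)
print(offsets, third, first)
```

Only the list of offsets is kept in memory, so the cost grows with the number of lines rather than with the size of the file.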

Answer 6 (score: 2)

file_name = 'your_file_name'
file_open = open(file_name, 'r')

def handler(line_one, line_two):
    print(line_one, line_two)

while True:
    try:
        one = next(file_open)
        two = next(file_open)
        handler(one, two)
    except StopIteration:
        # EOF reached (an unpaired last line is not passed to handler)
        file_open.close()
        break

Answer 7 (score: 2)

def readnumlines(file, num=2):
    f = iter(file)
    while True:
        lines = [None] * num
        for i in range(num):
            try:
                lines[i] = next(f)
            except StopIteration: # EOF or not enough lines available
                return
        yield lines

# use like this
f = open("thefile.txt", "r")
for line1, line2 in readnumlines(f):
    pass  # do something with line1 and line2

# or, with N standing for whatever group size you need
for lines in readnumlines(f, N):
    pass  # do something with the list of N lines
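
The same grouping can also be sketched with itertools.islice. This is an alternative, not the answerer's code: read_in_groups is a hypothetical name, io.StringIO stands in for the file, and unlike readnumlines above it yields a shorter final group instead of dropping incomplete ones.

```python
import io
from itertools import islice

def read_in_groups(lines, num=2):
    """Yield lists of up to num consecutive items from any iterable."""
    it = iter(lines)
    while True:
        group = list(islice(it, num))   # takes at most num items
        if not group:                   # iterator exhausted
            return
        yield group

f = io.StringIO("1\n2\n3\n4\n5\n")   # stand-in for an open file
groups = list(read_in_groups(f, 2))
print(groups)
```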

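Answer 8 (score: 1)

My idea is to create a generator that reads two lines from the file at a time and returns them as a 2-tuple. This means you can then iterate over the results.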

from io import StringIO

def read_2_lines(src):   
    while True:
        line1 = src.readline()
        if not line1: break
        line2 = src.readline()
        if not line2: break
        yield (line1, line2)


data = StringIO("line1\nline2\nline3\nline4\n")
for read in read_2_lines(data):
    print(read)

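If you have an odd number of lines it won't work perfectly (the last line is dropped), but this should give you a good outline.

Answer 9 (score: 1)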

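I ran into a similar problem last month. I tried a while loop with f.readline() as well as f.readlines(). My data file is not huge, so I finally chose f.readlines(), which gives me more control over the index; otherwise I would have to use f.seek() to move the file pointer back and forth.

My case is more complicated than the OP's. Because my data file is more flexible about how many lines are parsed each time, I have to check a few conditions before I can parse the data.

Another problem I found with f.seek() is that it doesn't handle utf-8 very well when I use codecs.open('', 'r', 'utf-8'). (I am not exactly sure what the culprit is; eventually I gave up on this approach.)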
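
To illustrate what index control over a readlines() list can look like, here is a sketch under an invented record format, since the answer does not describe its actual file: each record is assumed to start with a header line holding the number of body lines that follow. The function name parse_records and the sample data are hypothetical.

```python
def parse_records(lines):
    """Split a list of lines into records of varying length.

    Assumed (hypothetical) format: a header line with an integer count,
    followed by that many body lines.
    """
    records = []
    i = 0
    while i < len(lines):
        count = int(lines[i])               # header: number of body lines
        body = lines[i + 1:i + 1 + count]
        records.append([s.rstrip("\n") for s in body])
        i += 1 + count                      # jump the index to the next header
    return records

# In real use, lines would come from f.readlines().
lines = ["2\n", "a\n", "b\n", "1\n", "c\n"]
records = parse_records(lines)
print(records)
```

With the whole file in a list, moving backward or skipping ahead is plain index arithmetic, with no need for f.seek().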

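Answer 10 (score: 1)

A simple little reader. It pulls lines in pairs and returns them as a tuple as you iterate over the object. You can close it manually, or it will close itself when it falls out of scope.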

class doublereader:
    def __init__(self,filename):
        self.f = open(filename, 'r')
    def __iter__(self):
        return self
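    def __next__(self):
        # StopIteration from either call ends the loop; an unpaired last line is dropped
        return next(self.f), next(self.f)
    next = __next__  # Python 2 name for the same method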
    def close(self):
        if not self.f.closed:
            self.f.close()
    def __del__(self):
        self.close()

#example usage one
r = doublereader(r"C:\file.txt")
for a, h in r:
    print("x:%s\ny:%s" % (a,h))
r.close()

#example usage two
for x,y in doublereader(r"C:\file.txt"):
    print("x:%s\ny:%s" % (x,y))
#closes itself as soon as the loop goes out of scope
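
Relying on __del__ to close the file depends on when the object is garbage collected. As an alternative sketch, not the answerer's design, a generator function can hold the file inside a with block so that it is closed when the generator finishes or is closed. The name double_lines and the temporary demo file are assumptions made for this example.

```python
import os
import tempfile

def double_lines(filename):
    """Yield (line1, line2) tuples; the file closes when the generator ends."""
    with open(filename, "r") as f:
        for first in f:
            second = next(f, None)
            if second is None:
                return  # unpaired last line is dropped, as doublereader does
            yield first, second

# Throwaway demo file with an odd number of lines.
with tempfile.NamedTemporaryFile("w", delete=False) as tmp:
    tmp.write("x1\ny1\nx2\ny2\nx3\n")
    demo_path = tmp.name

pairs = list(double_lines(demo_path))
os.remove(demo_path)
print(pairs)
```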

Answer 11 (score: 1)

f = open(filename, "r")
for line in f:
    line1 = line
    next(f, None)  # advance past the second line; assign the result if you need it

f.close()

Now you can read the file two lines at a time. If you like, you can also check the state of f before reading the second line.

Answer 12 (score: 0)

If the file is of a reasonable size, another approach, which uses a list comprehension to read the entire file into a list of 2-tuples, is:

filename = '/path/to/file/name'

with open(filename, 'r') as f:
    # next(f, '') avoids mixing iteration with readline()
    list_of_2tuples = [(line, next(f, '')) for line in f]

for (line1, line2) in list_of_2tuples:  # Work with them in pairs.
    print('%s :: %s' % (line1, line2))

Answer 13 (score: -2)

This Python code will print the first two lines:

import linecache
filename = "ooxx.txt"
# linecache line numbers start at 1
print(linecache.getline(filename, 1))
print(linecache.getline(filename, 2))