Question

I have a file that has some header lines, e.g.

header1 lines: somehting something
more headers then
somehting something
----

this is where the data starts
yes data... lots of foo barring bar fooing data.
...
...

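I've skipped the header lines by looping and running file.readlines(). Other than looping and concatenating the rest of the lines, how else can I read the remaining lines?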

x = """header1 lines: somehting something
more headers then
somehting something
----

this is where the data starts
yes data... lots of foo barring bar fooing data.
...
..."""

with open('test.txt','w') as fout:
  print>>fout, x

fin = open('test.txt','r')
for _ in range(5): fin.readline();
rest = "\n".join([i for i in fin.readline()])

Answer 1

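.readlines() reads all the data in the file in one go. After the first call there are no more lines left to read.

You probably want to use .readline() (no s, singular) instead: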

with open('test.txt','r') as fin:
    for _ in range(5): fin.readline()
    # each line already ends with a newline, so join on the empty string
    rest = "".join(fin.readlines())

Note that because .readlines() already returns a list, you don't need to loop over the items. You can also just use .read() to read the rest of the file:

with open('test.txt','r') as fin:
    for _ in range(5): fin.readline()
    rest = fin.read()
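
As a rough, self-contained check of the two variants above, the following sketch writes a small sample file and reads the remainder both ways. The helper names `rest_via_readlines` and `rest_via_read`, the `n_header` parameter, and the temporary file are illustrative assumptions, not part of the original answer. Lines returned by .readlines() keep their trailing newline, so the sketch joins them on the empty string.

```python
import os
import tempfile

# Sample content mirroring the question: 5 header lines, then data.
SAMPLE = (
    "header1 lines: something something\n"
    "more headers then\n"
    "something something\n"
    "----\n"
    "\n"
    "this is where the data starts\n"
    "yes data... lots of foo barring bar fooing data.\n"
)

def rest_via_readlines(path, n_header=5):
    """Skip n_header lines, then join the remaining lines into one string."""
    with open(path, "r") as fin:
        for _ in range(n_header):
            fin.readline()
        # Each line keeps its trailing newline, so join on the empty string.
        return "".join(fin.readlines())

def rest_via_read(path, n_header=5):
    """Skip n_header lines, then read everything that is left in one call."""
    with open(path, "r") as fin:
        for _ in range(n_header):
            fin.readline()
        return fin.read()

# Write the sample to a temporary file so the sketch is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(SAMPLE)
    sample_path = tmp.name

joined = rest_via_readlines(sample_path)
read_all = rest_via_read(sample_path)
os.remove(sample_path)
```

Both helpers should return the same string here; .read() simply avoids building the intermediate list.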

Alternatively, treat the file object as an iterable and use itertools.islice() to slice it, skipping the first five lines:

from itertools import islice

with open('test.txt','r') as fin:
    all_but_the_first_five = list(islice(fin, 5, None))

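This produces lines rather than one large string, but if you are processing the input file line by line, that is usually preferable anyway. You can loop directly over the slice and handle the lines: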

with open('test.txt','r') as fin:
    for line in islice(fin, 5, None):
        # process line; the first 5 will have been skipped
        pass
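
The islice() approach can be tried without a file on disk. In this minimal sketch, `data_lines` is a hypothetical helper name and `io.StringIO` stands in for an open text file; both are assumptions made for illustration.

```python
import io
from itertools import islice

def data_lines(lines, n_header=5):
    """Lazily yield every line after the first n_header lines."""
    return islice(lines, n_header, None)

# io.StringIO stands in for an open text file here.
fake_file = io.StringIO(
    "header1\nheader2\nheader3\n----\n\nfirst data line\nsecond data line\n"
)

# Strip the trailing newline from each remaining line while collecting.
collected = [line.rstrip("\n") for line in data_lines(fake_file)]
```

Because islice() is lazy, nothing is read until the result is consumed, so large files are not loaded into memory unless you build a list as done here.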

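Don't mix iterating over the file object with calls to .readline(). The iteration protocol implemented by Python 2 file objects uses an internal read-ahead buffer for efficiency, and .readline() does not know about that buffer, so calling .readline() after iterating may return data from further along in the file than you expect.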
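
One way to follow this advice is to stay within the iteration protocol for the whole read: skip the header lines with next() and then keep iterating. This is only a sketch; `skip_then_iterate` is a hypothetical helper name, and `io.StringIO` stands in for the real file.

```python
import io

def skip_then_iterate(fin, n_header=5):
    """Skip n_header lines with next(), then yield the rest by iteration."""
    for _ in range(n_header):
        # The default (None) avoids StopIteration if the file is too short.
        next(fin, None)
    for line in fin:
        yield line

# Five header lines followed by two data lines.
short_file = io.StringIO("h1\nh2\nh3\nh4\nh5\ndata a\ndata b\n")
remaining = list(skip_then_iterate(short_file))

# A file with fewer lines than the header count simply yields nothing.
too_short = list(skip_then_iterate(io.StringIO("only one line\n")))
```

Since every read goes through the iterator, there is no second code path that could disagree with it about the current position.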

Answer 2

Skip the first 5 lines:

from itertools import islice

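# either collect the remaining lines in a list...
with open('yourfile') as fin:
    data = list(islice(fin, 5, None))

# ...or reopen the file and loop line by line
# (the first pass has already consumed it)
with open('yourfile') as fin:
    for line in islice(fin, 5, None):
        print(line)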

How to read the remaining lines? - Python
