I have a file that starts with some header lines, e.g.
header1 lines: somehting something
more headers then
somehting something
----
this is where the data starts
yes data... lots of foo barring bar fooing data.
...
...
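I skipped the header lines by looping and calling file.readlines(). Apart from looping over the remaining lines and joining them, how else can I read the rest of the file?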
x = """header1 lines: somehting something
more headers then
somehting something
----
this is where the data starts
yes data... lots of foo barring bar fooing data.
...
..."""
with open('test.txt', 'w') as fout:
    print(x, file=fout)
fin = open('test.txt', 'r')
for _ in range(5): fin.readline()
rest = "\n".join([i for i in fin.readline()])
Answer 0 (score: 3)
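.readlines() reads all the data in the file in one go; after the first call there are no more lines left to read. You probably want to use .readline() (no s, singular) to skip the header lines: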
with open('test.txt', 'r') as fin:
    for _ in range(5): fin.readline()
    # each line keeps its trailing newline, so join with an empty string
    rest = "".join(fin.readlines())
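Note that because .readlines() already returns a list, you don't need to loop over the items. You can also use .read() to read the rest of the file as a single string: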
with open('test.txt', 'r') as fin:
    for _ in range(5): fin.readline()
    rest = fin.read()
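As a quick check of the two approaches, here is a minimal sketch that uses an in-memory `io.StringIO` object in place of `test.txt` (an assumption, so the example runs without touching the disk) and a hypothetical helper named `skip_header`. It suggests that joining `.readlines()` with an empty string gives the same text as `.read()`, and that a further `.readlines()` call has nothing left to return.

```python
import io

# Same content as the question's sample file, with a trailing newline.
SAMPLE = (
    "header1 lines: somehting something\n"
    "more headers then\n"
    "somehting something\n"
    "----\n"
    "this is where the data starts\n"
    "yes data... lots of foo barring bar fooing data.\n"
    "...\n"
    "...\n"
)

def skip_header(fin, n):
    """Consume the first n lines of an open file-like object (hypothetical helper)."""
    for _ in range(n):
        fin.readline()

# Approach 1: readlines() returns the remaining lines, newlines included.
fin = io.StringIO(SAMPLE)
skip_header(fin, 5)
rest_from_readlines = "".join(fin.readlines())
leftover = fin.readlines()  # the file is exhausted, so this is an empty list

# Approach 2: read() returns the remainder as one string.
fin = io.StringIO(SAMPLE)
skip_header(fin, 5)
rest_from_read = fin.read()
```

Both variables end up holding the text that follows the fifth line; `.read()` simply skips the intermediate list.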
Alternatively, treat the file object as an iterable and use itertools.islice() to slice it, skipping the first five lines:
from itertools import islice
with open('test.txt', 'r') as fin:
    all_but_the_first_five = list(islice(fin, 5, None))
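This produces a list of lines rather than one big string, but if you are processing the input file line by line, that is usually preferable anyway. You can loop directly over the slice and process the lines: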
with open('test.txt', 'r') as fin:
    for line in islice(fin, 5, None):
        # process line; the first 5 lines will have been skipped
        pass
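To make the line-by-line pattern concrete, here is a minimal sketch that wraps the slice in a generator. The function name `iter_data_lines`, its `n_header` parameter, and the throwaway temporary file are illustrative assumptions, not part of the original answer.

```python
import os
import tempfile
from itertools import islice

def iter_data_lines(path, n_header=5):
    """Yield each line after the first n_header lines, without its trailing newline."""
    with open(path, 'r') as fin:
        for line in islice(fin, n_header, None):
            yield line.rstrip('\n')

# Build a small throwaway file: 5 header lines followed by 3 data lines.
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write("h1\nh2\nh3\n----\nh5\ndata a\ndata b\ndata c\n")
    tmp_path = tmp.name

# Consuming the generator fully also closes the file inside it.
data_lines = list(iter_data_lines(tmp_path))
os.remove(tmp_path)
```

Because the generator yields one line at a time, the caller can process large files without holding every line in memory, as long as it does not call `list()` on the result as this demo does.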
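Don't mix iterating over the file object with calls to .readline(). In Python 2, the iteration protocol implemented by file objects uses an internal read-ahead buffer for efficiency, and .readline() doesn't know about that buffer; calling .readline() after iterating may return data from further along in the file than you expect.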
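One way to avoid mixing the two styles is to stay with iteration throughout: skip lines with the built-in `next()` and then keep iterating over the same file object. The sketch below assumes an in-memory `io.StringIO` as a stand-in for the real file, and the helper name `lines_after_header` is hypothetical.

```python
import io

def lines_after_header(fin, n_header=5):
    """Skip n_header lines using iteration only, then return the remaining lines."""
    for _ in range(n_header):
        # The default of None avoids StopIteration on files shorter than the header.
        next(fin, None)
    return list(fin)

fin = io.StringIO("h1\nh2\nh3\n----\nh5\nfirst data line\nsecond data line\n")
remaining = lines_after_header(fin)

# A file with fewer lines than the header simply yields nothing.
short = lines_after_header(io.StringIO("only one line\n"))
```

Since every access goes through the iterator, there is no second read path that could disagree with it about the current position.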
Answer 1 (score: 1)
Skip the first 5 lines:
from itertools import islice

with open('yourfile') as fin:
    data = list(islice(fin, 5, None))

# or loop line by line instead
with open('yourfile') as fin:
    for line in islice(fin, 5, None):
        print(line, end='')  # line already ends with a newline
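If the header length is not known in advance but the header ends with a separator line, such as the `----` in the question's sample, `itertools.dropwhile()` can skip up to the separator instead of counting lines. This is a different technique from the fixed `islice` offset, sketched here under the assumption that the separator is exactly `----` on its own line; the function name `lines_after_separator` is hypothetical.

```python
import io
from itertools import dropwhile

def lines_after_separator(fin, separator='----'):
    """Return the lines that follow the first line equal to separator."""
    # Drop lines until the separator is reached ...
    tail = dropwhile(lambda line: line.rstrip('\n') != separator, fin)
    # ... then discard the separator line itself.
    next(tail, None)
    return list(tail)

sample = io.StringIO(
    "header1 lines: somehting something\n"
    "more headers then\n"
    "somehting something\n"
    "----\n"
    "this is where the data starts\n"
    "yes data... lots of foo barring bar fooing data.\n"
)
data = lines_after_separator(sample)
```

If no separator line is found, the function returns an empty list, so callers may want to treat that case as an error depending on the file format.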