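I am trying to use "from itertools import islice" to read several lines at a time from a *.las file with the liblas module. (My goal is to read the file chunk by chunk.)
Following the question: Python how to read N number of lines at a time
islice() can be used to get the next n items of an iterator. Thus, list(islice(f, n)) will return a list of the next n lines of the file f. Using this inside a loop will give you the file in chunks of n lines. At the end of the file, the list might be shorter, and finally the call will return an empty list.
I used the following code: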
from numpy import nonzero
from liblas import file as lasfile
from itertools import islice
chunkSize = 1000000
f = lasfile.File(inFile, None, 'r')  # open LAS
while True:
    chunk = list(islice(f, chunkSize))
    if not chunk:
        break
    # do other stuff
But I have this problem:
>>> len(f)
2866390
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
1000000
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
1000000
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
866390
>>> chunk = list(islice(f, 1000000))
>>> len(chunk)
1000000
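When the file f reaches its end, islice starts reading the file again from the beginning instead of returning an empty list.
Thanks for any suggestions and help. Much appreciated.
Answer 0: (score: 2)
It seems easy enough to write a generator that yields n lines at a time: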
def n_line_iterator(fobj, n):
    if n < 1:
        raise ValueError("Must supply a positive number of lines to read")
    out = []
    num = 0
    for line in fobj:
        if num == n:
            yield out  # yield 1 full chunk
            num = 0
            out = []
        out.append(line)
        num += 1
    if out:
        yield out  # yield the remaining lines (skipped when the input is empty)
Answer 1: (score: 2)
Change the source code of file.py, which belongs to the liblas package. Currently __iter__ is defined as (src on github):
def __iter__(self):
    """Iterator support (read mode only)
      >>> points = []
      >>> for i in f:
      ...   points.append(i)
      ...   print i # doctest: +ELLIPSIS
      <liblas.point.Point object at ...>
    """
    if self.mode == 0:
        self.at_end = False
        p = core.las.LASReader_GetNextPoint(self.handle)
        while p and not self.at_end:
            yield point.Point(handle=p, copy=True)
            p = core.las.LASReader_GetNextPoint(self.handle)
            if not p:
                self.at_end = True
        else:
            self.close()
            self.open()
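You can see that when the end of the file is reached, the file is closed and opened again, so the next iteration starts over from the beginning of the file.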
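To see why this produces the lengths reported in the question, here is a small pure-Python simulation. `RestartingReader` is a hypothetical stand-in and not part of liblas; it only imitates the while/else structure of the `__iter__` quoted above, with a list position playing the role of the reader handle. Each `islice(f, n)` call asks for a fresh iterator through `iter(f)`, but the read position lives on the object, so successive calls continue where the previous one stopped until one of them runs off the end and triggers the reset.

```python
from itertools import islice


class RestartingReader:
    """Hypothetical stand-in for the reader; NOT the liblas API."""

    def __init__(self, items):
        self._items = list(items)
        self._pos = 0  # read position shared by every iterator

    def _next_item(self):
        # Plays the role of "get next point": returns None at the end.
        if self._pos >= len(self._items):
            return None
        item = self._items[self._pos]
        self._pos += 1
        return item

    def __iter__(self):
        p = self._next_item()
        while p is not None:
            yield p
            p = self._next_item()
        else:
            # Runs when the loop ends normally, i.e. at end of data.
            # Imitates close() followed by open(): rewind to the start.
            self._pos = 0


f = RestartingReader(range(1, 8))  # 7 items, read in chunks of 3
sizes = [len(list(islice(f, 3))) for _ in range(4)]
print(sizes)  # [3, 3, 1, 3]
```

The fourth chunk is full again because the third call hit the end of the data and rewound the position, which mirrors the 1000000, 1000000, 866390, 1000000 sequence in the question.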
Try removing the final else block that follows the while loop, so the corrected code for the method should be:
def __iter__(self):
    """Iterator support (read mode only)
      >>> points = []
      >>> for i in f:
      ...   points.append(i)
      ...   print i # doctest: +ELLIPSIS
      <liblas.point.Point object at ...>
    """
    if self.mode == 0:
        self.at_end = False
        p = core.las.LASReader_GetNextPoint(self.handle)
        while p and not self.at_end:
            yield point.Point(handle=p, copy=True)
            p = core.las.LASReader_GetNextPoint(self.handle)
            if not p:
                self.at_end = True
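If editing the installed package is not an option, a possible workaround is to call `iter()` only once and slice that single iterator, so that running off the end simply exhausts it. The helper below is a minimal sketch: `iter_chunks` is a hypothetical name, the demo uses `range` as stand-in data, and it has not been tested against liblas. Whether it avoids the restart there depends on `File.__iter__` behaving like the generator quoted above.

```python
from itertools import islice


def iter_chunks(iterable, n):
    """Yield lists of up to n items taken from a single iterator."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    it = iter(iterable)  # create the iterator once, not once per chunk
    while True:
        chunk = list(islice(it, n))
        if not chunk:
            return  # iterator exhausted: stop instead of starting over
        yield chunk


# Stand-in data; with liblas one would pass the opened file object instead.
chunk_sizes = [len(chunk) for chunk in iter_chunks(range(7), 3)]
print(chunk_sizes)  # [3, 3, 1]
```

Because the loop keeps hold of one iterator, an exhausted generator keeps returning nothing, and the empty chunk ends the loop as the question originally expected.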