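I'm trying to use Python 3 to read data from some fixed-width format files (from here).

If I preselect a few lines it works fine, but if I try to go through the whole file (about 1000 lines with 611 fields per line, 4 characters each = 2444 characters), Python tells me that `struct.Struct(bytes).unpack_from(bytes)` requires a buffer of at least 2444 bytes, and at the moment I don't know why it doesn't get a buffer that large.

In case it helps: I'm running 64-bit Linux with 4 GB of RAM and 20 GB of swap.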
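For reference, the message is about the length of the input, not about available memory: `unpack_from` needs at least `Struct.size` bytes from the buffer it is given, and any shorter line raises `struct.error`. A minimal reproduction, assuming synthetic data (`b"-999"` repeated) rather than the actual file:

```python
import struct

# 611 fields of 4 bytes each: the struct needs 2444 bytes per call
row = struct.Struct("4s" * 611)
print(row.size)  # 2444

full_line = b"-999" * 611   # a complete 2444-byte record
fields = row.unpack_from(full_line)
print(len(fields))          # 611

short_line = b"-999" * 10   # e.g. a footer or a truncated last line (40 bytes)
try:
    row.unpack_from(short_line)
except struct.error as exc:
    # the exact wording of the message differs between Python versions
    print("struct.error:", exc)
```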
The code looks like this:
```python
import struct

# edit: rowMask is "4s" repeated 611 times
rowMask = "4s" * 611

def readUsableFields(filename, stdPath):
    usableFields = []
    with open(stdPath + filename, "r") as f:
        count_line = 0
        for line in f:
            count_col = 0
            fields = struct.Struct(bytes(rowMask, "UTF-8")).unpack_from(bytes(line, "UTF-8"))
            for field in fields:
                if field != -999:
                    usableFields.append([count_line, count_col])
                count_col += 1
            count_line += 1
    return usableFields
```
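Before changing the parsing, it may help to confirm which lines actually have the wrong length. A small diagnostic sketch, assuming every data line should be exactly 2444 bytes once the line terminator is removed; `find_bad_lines` is a hypothetical helper, and the demo writes a synthetic temporary file instead of reading the real one:

```python
import os
import tempfile

RECORD_LEN = 611 * 4  # 2444 bytes per data line, excluding the newline

def find_bad_lines(path, record_len=RECORD_LEN):
    """Return (line_number, length) for every line that is not record_len bytes long."""
    bad = []
    with open(path, "rb") as f:  # binary mode: lengths are in bytes, as struct sees them
        for number, raw in enumerate(f):
            line = raw.rstrip(b"\r\n")  # drop the line terminator before measuring
            if len(line) != record_len:
                bad.append((number, len(line)))
    return bad

# Usage with a small synthetic file (hypothetical data, not the real file)
with tempfile.NamedTemporaryFile("wb", delete=False) as tmp:
    tmp.write(b"-999" * 611 + b"\n")
    tmp.write(b"  12" * 611 + b"\n")
    tmp.write(b"END OF FILE\n")
    path = tmp.name

bad_lines = find_bad_lines(path)
print(bad_lines)  # [(2, 11)]
os.remove(path)
```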
Any help would be appreciated. If my question is a duplicate (I didn't find one), please let me know.
Answer 0 (score: 0)
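Many fixed-width files have a footer (or a header), and your code fails on it because that line does not have the expected length: `unpack_from` needs at least 2444 bytes, so any shorter line raises exactly the error you see.
So you have to check that each line has the correct length before unpacking it: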
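```python
import struct

rowMask = "4s" * 611
rowStruct = struct.Struct(rowMask)  # compile the format once, not once per line

def readUsableFields(filename, stdPath):
    usableFields = []
    # binary mode: struct works on bytes, and lengths are counted in bytes
    with open(stdPath + filename, "rb") as f:
        count_line = 0
        for line in f:
            # a data line is 611 * 4 = 2444 bytes plus a trailing '\n' (not '\0');
            # strip the terminator so a last line without '\n' also matches
            line = line.rstrip(b"\r\n")
            if len(line) != rowStruct.size:
                continue  # skip header/footer lines that have a different length
            count_col = 0
            fields = rowStruct.unpack_from(line)
            for field in fields:
                # fields are bytes such as b"-999"; comparing them with the int -999
                # would always be True, so compare against the bytes marker instead
                if field.strip() != b"-999":
                    usableFields.append([count_line, count_col])
                count_col += 1
            count_line += 1
    # no f.close() needed: the with-block closes the file
    return usableFields
```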
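If `struct` is not a requirement, plain slicing does the same split. The sketch below assumes the same 4-character fields and the `-999` missing-value marker from the question, keeps the length check so header/footer lines are still skipped (and, as in the code above, not counted as rows), and takes any iterable of bytes lines, such as a file opened with `"rb"`. `usable_fields` and the constants are hypothetical names chosen for illustration:

```python
FIELD_WIDTH = 4      # characters per field (from the question)
N_FIELDS = 611       # fields per data line (from the question)
MISSING = b"-999"    # missing-value marker used in the question

def usable_fields(lines, width=FIELD_WIDTH, n_fields=N_FIELDS):
    """Return [row, col] pairs for every field that is not the missing marker.

    Lines whose length is not width * n_fields are skipped and not counted as rows.
    """
    result = []
    row = 0
    for raw in lines:
        line = raw.rstrip(b"\r\n")
        if len(line) != width * n_fields:
            continue  # header, footer, or truncated line
        for col in range(n_fields):
            field = line[col * width:(col + 1) * width]
            if field.strip() != MISSING:
                result.append([row, col])
        row += 1
    return result

demo = [
    b"-999" * 610 + b"  17\n",  # one real value in the last column
    b"FOOTER\n",                 # wrong length: skipped
    b"   5" + b"-999" * 610,     # last line without a trailing newline
]
print(usable_fields(demo))  # [[0, 610], [1, 0]]
```

Slicing never raises on a short line, so the explicit length check is what keeps footer text from being reported as data.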