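I have a simple problem. I navigate to a particular line in a file and then delete everything after it, using the appropriate file.truncate() call. However, the two pieces of code below behave differently.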
1)
with open(file, "a+b", 1) as f:
    # Navigate to the MARKER
    while True:
        line = f.readline()
        if MARKER in line:
            f.truncate()
            f.write(stuff)
            break
2)
with open(file, "a+b", 1) as f:
    # Navigate to the MARKER
    for line in f:
        if MARKER in line:
            f.truncate()
            f.write(stuff)
            break
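(1) behaves as expected. In case (2), however, the file is truncated a few lines after the line where MARKER is found. I suspect some buffering is going on, but as you can see, I explicitly set the buffering behavior to "line buffered" in the open() call.
Any ideas? I would like to use the more intuitive "for line in file" syntax...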
Answer 0 (score: 3)
The clue seems to be in this bit of Python's C source: Python 2.7 appears to use an 8 KB read-ahead buffer when iterating with for line in file:.
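If the iterator's read-ahead really is the cause, one way to keep a for loop while still reading through readline() is the two-argument form of iter(). This is only a minimal sketch, written for Python 3 (the b"" sentinel is also accepted by 2.7). The function name, the marker value and the sample data are made up for illustration.

```python
import os
import tempfile


def truncate_after_marker(path, marker, stuff):
    """Cut the file just past the first line containing marker, then append stuff."""
    with open(path, "a+b") as f:
        f.seek(0)  # make sure reading starts at the beginning
        # iter(callable, sentinel) calls f.readline() until it returns b"",
        # so the loop does not go through the file object's own iterator.
        for line in iter(f.readline, b""):
            if marker in line:
                f.truncate()    # cuts at the current position, right after this line
                f.write(stuff)  # append mode: written at the (new) end of the file
                break


# Hypothetical sample file: 200 lines of 100 bytes, i.e. larger than 8 KB.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    for i in range(200):
        f.write(b"x" * 90 + str(i).zfill(8).encode("ascii") + b"\r\n")

truncate_after_marker(path, b"00000007", b"stuff\r\n")
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

Because each line comes from readline(), the position seen by truncate() is the end of the marker line rather than the end of a buffer fill.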
Answer 1 (score: 2)
In general, a statement of the form for x in y requires that y not be changed inside the loop. You broke that contract.
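Answer 2 (score: 2)
From the Python documentation, 5. Built-in Types / 5.9. File Objects:
In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.
By the way: using a built-in name (such as file) as a variable name is generally discouraged.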
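Answer 3 (score: 0)
This is because of the 'a' mode:
a
Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.
a+
Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.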
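The quoted description of a+ on its own can be checked with a small experiment. The sketch below assumes Python 3 and a scratch file created just for the test. It only illustrates that writes in append mode land at the end of the file even after a seek, and says nothing about truncate().

```python
import os
import tempfile

# Hypothetical scratch file, used only for this illustration.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    f.write(b"0123456789")

with open(path, "a+b") as f:
    f.seek(0)
    head = f.read(4)  # reading honours the seek: the first four bytes
    f.seek(0)         # try to position the next write at the start
    f.write(b"XY")    # in append mode the data still goes to the end

with open(path, "rb") as f:
    data = f.read()
os.remove(path)
print(repr(head), repr(data))
```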
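My answer above (blaming the 'a' mode) is wrong.
I already knew that looping over the lines of a file uses a read-ahead buffer, but I believed that truncate() would move the file pointer to the end of the file. As far as I understood, truncating a file consisted of writing a short byte sequence called EOF, meaning end of file, and 'a' mode always forces writes to the end of the file, whatever the position of the file pointer before the write.
Well, it doesn't work that way, and I should have checked by running code. So my answer deserved a downvote.
But a downvote without any explanation is shabby and frustrating in a case like this one, where the error in the answer is not obvious.
The following code shows that the file pointer is not moved to the end of the file before truncate() acts.
For clarity, the file 'fileA' consists of lines 100 characters long ('\r\n' included), whose ends look like this ('\r\n' is not visible here):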
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000300
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000400
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000500
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000600
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000700
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000800
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000900
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001000
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001300
............................
Code:
print '\n===================== 1 ==================\n'
from os.path import getsize
# length of ecr is 90 :
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # Length of each written line is 100 :
        # 90 (ecr) + 8 (str(i).zfill(8)) + 2 ('\r\n')
        # File's length will be 53000
print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000800' in line:
            print repr(line[78:]),' g.tell()==',g.tell()
            # at this point, 800 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
Result:
===================== 1 ==================
size of fileA before truncating : 53000
'hhiiiiiiiiii00000800\r\n' g.tell()== 8192
size of fileA after truncating : 8192
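So AKX and Fenisko were right to invoke the buffer (although they did not test that hypothesis any more than I did), because opening the file in 'a' mode has no effect on how truncate() operates. I think that is what the upper-case sentence in the following excerpt from the documentation says:
file.truncate([size])
Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. THE CURRENT FILE POSITION IS NOT CHANGED.
Until now, I had never understood that sentence.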
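Since truncate() accepts an explicit size, another way around the read-ahead position is to track the byte offset yourself and pass it in. The sketch below is only that, a sketch: it assumes Python 3, opens the file in r+b instead of a+b, and uses a made-up function name and sample data.

```python
import os
import tempfile


def truncate_at_tracked_offset(path, marker, stuff):
    """Keep `for line in f`, but cut at an offset computed from the lines read."""
    offset = 0
    with open(path, "r+b") as f:
        for line in f:
            offset += len(line)         # bytes consumed so far, line by line
            if marker in line:
                f.seek(offset)          # put the file position at the end of this line
                f.truncate(offset)      # explicit size: independent of any read-ahead
                f.write(stuff)
                break


# Hypothetical sample file: 200 lines of 100 bytes each.
fd, path = tempfile.mkstemp()
os.close(fd)
with open(path, "wb") as f:
    for i in range(200):
        f.write(b"y" * 90 + str(i).zfill(8).encode("ascii") + b"\r\n")

truncate_at_tracked_offset(path, b"00000011", b"stuff\r\n")
with open(path, "rb") as f:
    result = f.read()
os.remove(path)
```

This relies on the file being opened in binary mode, so that len(line) is the number of bytes actually read.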
As AKX pointed out, the buffer size is 8192... for the first read.
But for the following reads, the buffer is apparently 10240 characters:
print '\n=================== 2 ====================\n'
from os.path import getsize
# length of ecr is 90 :
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000
print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008100' in line:
            print repr(line[78:]),' g.tell()==',g.tell()
            # at this point, 8100 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
# -----------
print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000
print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008200' in line:
            print repr(line[78:]),' g.tell()==',g.tell()
            # at this point, 8200 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
# -----------
print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000
print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018400' in line:
            print repr(line[78:]),' g.tell()==',g.tell()
            # at this point, 18400 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
# -----------
print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000
print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018500' in line:
            print repr(line[78:]),' g.tell()==',g.tell()
            # at this point, 18500 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')
Result:
=================== 2 ====================
size of fileA before truncating : 53000
'hhiiiiiiiiii00008100\r\n' g.tell()== 8192
size of fileA after truncating : 8192
size of fileA before truncating : 53000
'hhiiiiiiiiii00008200\r\n' g.tell()== 18432
size of fileA after truncating : 18432
size of fileA before truncating : 53000
'hhiiiiiiiiii00018400\r\n' g.tell()== 18432
size of fileA after truncating : 18432
size of fileA before truncating : 53000
'hhiiiiiiiiii00018500\r\n' g.tell()== 28672
size of fileA after truncating : 28672
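The four g.tell() values in this output are consistent with a simple model: a first fill of 8192 bytes, then 10240 more bytes each time the requested line is not yet fully buffered. The arithmetic can be checked with the short sketch below. The model, the constant names and the helper function are my assumptions, not something read from Python's source, and the end of the file is ignored.

```python
LINE_LEN = 100      # every line of fileA.txt is 100 bytes long
FIRST_FILL = 8192   # position reported after the first read
NEXT_FILL = 10240   # assumed size of each later fill


def tell_after_line(line_number):
    """Predicted g.tell() once line number `line_number` (1-based) has been yielded."""
    end_of_line = line_number * LINE_LEN
    position = FIRST_FILL
    # keep filling until the whole line lies inside what has been read
    while position < end_of_line:
        position += NEXT_FILL
    return position


# Lines whose markers are 00008100, 00008200, 00018400 and 00018500.
predicted = {n: tell_after_line(n) for n in (81, 82, 184, 185)}
print(predicted)
```

Line 81 ends at byte 8100 and fits in the first 8192 bytes, while line 82 ends at byte 8200 and needs a second fill, which gives 8192 + 10240 = 18432.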
By the way, truncate() does not close the file:
print '\n=================== 3 ====================\n'
from os.path import getsize
# length of ecr is 90 :
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000
print 'size of fileA before truncating : ',getsize('fileA.txt')
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000200' in line:
            print repr(line[78:]),' g.tell()==',g.tell()
            # at this point, 200 characters should have been read
            # if there wasn't a buffer
            g.truncate()
            g.seek(6000,0)
            k = 0
            for li in g:
                k+=1
                print 'k==',k,' ',repr(li[-32:])
                if k==7:
                    break
print 'size of fileA after truncating : ',getsize('fileA.txt')
Result:
=================== 3 ====================
size of fileA before truncating : 53000
'hhiiiiiiiiii00000200\r\n' g.tell()== 8192
k== 1 'gghhhhhhhhhhiiiiiiiiii00006100\r\n'
k== 2 'gghhhhhhhhhhiiiiiiiiii00006200\r\n'
k== 3 'gghhhhhhhhhhiiiiiiiiii00006300\r\n'
k== 4 'gghhhhhhhhhhiiiiiiiiii00006400\r\n'
k== 5 'gghhhhhhhhhhiiiiiiiiii00006500\r\n'
k== 6 'gghhhhhhhhhhiiiiiiiiii00006600\r\n'
k== 7 'gghhhhhhhhhhiiiiiiiiii00006700\r\n'
size of fileA after truncating : 8192
But if a write instruction is placed after truncate(), the program's behavior becomes incoherent. Try it.