Different behaviour caused by different "looping styles"

Date: 2011-04-12 06:54:50

Tags: python file io

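I have a simple problem: navigate to a certain line in a file and delete everything after it. I use the appropriate file.truncate() call for that. However, the following two pieces of code behave differently.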

1)

with open(file, "a+b", 1) as f:
  #Navigate to the MARKER
  while True:
    line = f.readline()
    if not line:
      # EOF reached without finding MARKER; stop instead of looping forever
      break
    if MARKER in line:
      f.truncate()
      f.write(stuff)
      break

2)

with open(file, "a+b", 1) as f:
  #Navigate to the MARKER
  for line in f:
    if MARKER in line:
      f.truncate()
      f.write(stuff)
      break

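(1) behaves as expected. In case (2), however, the file is truncated a few lines after the line where MARKER is found. I suspect some buffering is going on, but as you can see, I explicitly set the buffering behaviour to "line buffered" in the open() call.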

Any ideas? I would like to use the more intuitive "for line in file" syntax...

4 answers:

Answer 0: (score: 3)

The clue seems to be in this bit of Python's C source: Python 2.7 appears to use an 8 KB read-ahead buffer when iterating with for line in file.
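If the read-ahead is the cause, one way to keep a for loop while still going through readline() is the two-argument form of iter(). The following is only a sketch, written so that it also runs on Python 3; the function name, the marker and the payload are hypothetical, and it opens the file in "r+b" rather than "a+b" because Python 3 positions an append-mode file at its end when it is opened.

```python
import os
import tempfile

MARKER = b"MARKER"        # hypothetical marker, for illustration only
STUFF = b"replacement\n"  # hypothetical payload, for illustration only


def truncate_after_marker(path, marker, stuff):
    """Cut the file right after the first line containing marker, then write stuff."""
    with open(path, "r+b") as f:
        # iter(callable, sentinel) calls f.readline() until it returns b"" (EOF),
        # so every line is fetched by readline() and not by the file iterator.
        for line in iter(f.readline, b""):
            if marker in line:
                pos = f.tell()   # position just past the marker line
                f.seek(pos)      # explicit seek before switching from reading to writing
                f.truncate(pos)  # explicit size, so nothing depends on buffering
                f.write(stuff)
                return True
    return False


def demo_readline_loop():
    """Build a small temporary file, cut it after the marker, return the new content."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"one\ntwo MARKER\nthree\nfour\n")
        truncate_after_marker(path, MARKER, STUFF)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

Calling demo_readline_loop() should leave only the lines up to and including the marker line, followed by the payload.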

Answer 1: (score: 2)

In general, a statement of the form for x in y requires that y is not changed inside the loop. You have violated that contract.
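One way to honour that contract is to keep the loop read-only and to modify the file only after the loop has finished. The sketch below counts bytes itself instead of relying on tell(), then truncates at an explicit size. The function names are hypothetical, and the file is opened in binary mode so that len(line) is a byte count.

```python
import os
import tempfile


def find_offset_after(path, marker):
    """Return the byte offset just past the first line containing marker, or None."""
    offset = 0
    with open(path, "rb") as f:
        for line in f:           # read-only loop: the file is not modified in here
            offset += len(line)  # count the bytes ourselves, independent of tell()
            if marker in line:
                return offset
    return None


def cut_and_append(path, marker, stuff):
    """Truncate the file after the marker line and write stuff at the new end."""
    offset = find_offset_after(path, marker)
    if offset is None:
        return False
    with open(path, "r+b") as f:
        f.truncate(offset)  # explicit size, unaffected by any read-ahead
        f.seek(offset)
        f.write(stuff)
    return True


def demo_two_pass():
    """Run cut_and_append on a small temporary file and return the new content."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "wb") as f:
            f.write(b"alpha\nbeta MARKER\ngamma\n")
        cut_and_append(path, b"MARKER", b"tail\n")
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.remove(path)
```

The cost is that the file is opened twice, which seems acceptable for a one-off edit like this.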

Answer 2: (score: 2)

From the Python documentation, 5. Built-in Types / 5.9. File Objects:

  

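In order to make a for loop the most efficient way of looping over the lines of a file (a very common operation), the next() method uses a hidden read-ahead buffer.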

By the way: using the names of built-ins (such as file) as variable names is generally discouraged.

Answer 3: (score: 0)

This is because of the 'a' mode:

  

     

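a

Open for appending (writing at end of file). The file is created if it does not exist. The stream is positioned at the end of the file.

a+

Open for reading and appending (writing at end of file). The file is created if it does not exist. The initial file position for reading is at the beginning of the file, but output is always appended to the end of the file.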

     

http://linux.die.net/man/3/fopen

Edit

My answer above is wrong.

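I already knew that iterating over the lines of a file uses a read-ahead buffer, but I believed that truncate() moved the file pointer to the end of the file. As I understood it, truncating a file consisted of writing a short byte sequence called EOF, meaning end of file, and 'a' mode always forces writes to the end of the file, wherever the file pointer is before the write.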

Well, that is not how it works, and I should have verified it by running code. So my answer deserved a downvote.

But a downvote without any explanation is shabby and frustrating in a case like this one, where the error in the answer is not obvious.

The following code shows that the file pointer is not moved to the end of the file before truncate() operates.

For clarity: the file 'fileA.txt' consists of lines 100 characters long ('\r\n' included), each ending like this ('\r\n' is not visible here):

....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000300
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000400
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000500
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000600
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000700
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000800
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00000900
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001000
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001100
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001200
....ffffffgggggggggghhhhhhhhhhiiiiiiiiii00001300
............................

Code:

print '\n===================== 1 ==================\n'

from os.path import getsize

# length of ecr is 90 :
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'


# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # Length of each written line is 100 :
        # 90 (ecr) + 8 (str(i).zfill(8)) + 2 ('\r\n')
        # File's length will be 53000


print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000800' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 800 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

Result

===================== 1 ==================

size of fileA before truncating :  53000
'hhiiiiiiiiii00000800\r\n'   g.tell()== 8192
size of fileA after truncating :  8192

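So AKX and Fenisko were right to bring up the buffer (although they did not test that hypothesis any more than I did), because opening the file in 'a' mode has no effect on what truncate() does. I think that is what the capitalized sentence in the following excerpt from the documentation says: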

  

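file.truncate([size]) Truncate the file's size. If the optional size argument is present, the file is truncated to (at most) that size. The size defaults to the current position. THE CURRENT FILE POSITION IS NOT CHANGED.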

     

http://docs.python.org/library/stdtypes.html#file.truncate

Until now, I had never understood that sentence.
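The statement about the file position is easy to check in isolation. The snippet below is a minimal sketch for Python 3, whose io documentation makes the same promise for truncate(); the function name and the sizes 100, 10 and 50 are arbitrary choices for illustration.

```python
import os
import tempfile


def truncate_keeps_position():
    """Return (position after truncate, file size after truncate)."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        with open(path, "r+b") as f:
            f.write(b"x" * 100)  # file is now 100 bytes long
            f.seek(10)           # move the position to byte 10
            f.truncate(50)       # shrink the file to 50 bytes
            pos = f.tell()       # position as seen after truncating
        size = os.path.getsize(path)
        return pos, size
    finally:
        os.remove(path)
```

The function should return (10, 50): the file shrinks to the requested size while the position stays where seek() left it.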

As AKX pointed out, the size of the buffer is 8192... for the first read.

But for the following reads, the buffer is apparently 10240 characters:

print '\n=================== 2 ====================\n'

from os.path import getsize

# length of ecr is 90 :
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'


# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000


print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008100' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 8100 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

# -----------

print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000


print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00008200' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 8200 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

# -----------

print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000


print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018400' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 18400 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

# -----------

print
# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000


print 'size of fileA before truncating : ',getsize('fileA.txt')
# truncating file at uncontrolled position
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00018500' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 18500 characters should have been read
            # in the file if there wasn't a reading buffer
            g.truncate()
print 'size of fileA after truncating : ',getsize('fileA.txt')

Result

=================== 2 ====================

size of fileA before truncating :  53000
'hhiiiiiiiiii00008100\r\n'   g.tell()== 8192
size of fileA after truncating :  8192

size of fileA before truncating :  53000
'hhiiiiiiiiii00008200\r\n'   g.tell()== 18432
size of fileA after truncating :  18432

size of fileA before truncating :  53000
'hhiiiiiiiiii00018400\r\n'   g.tell()== 18432
size of fileA after truncating :  18432

size of fileA before truncating :  53000
'hhiiiiiiiiii00018500\r\n'   g.tell()== 28672
size of fileA after truncating :  28672
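The four positions reported above fit a simple pattern: a first chunk of 8192 bytes, then 10240 more each time a line straddles the end of the current chunk (18432 - 8192 = 10240 and 28672 - 18432 = 10240). The following sketch only reproduces that arithmetic. It is a model fitted to these outputs, not a description of how CPython implements the buffer, and the names are hypothetical.

```python
FIRST_CHUNK = 8192   # first g.tell() value observed above
NEXT_CHUNK = 10240   # difference between consecutive observed g.tell() values


def modelled_tell(line_end):
    """Smallest chunk boundary at or past line_end, under the fitted model."""
    boundary = FIRST_CHUNK
    while boundary < line_end:
        boundary += NEXT_CHUNK
    return boundary


# Each line is 100 bytes long, so the line tagged 00008100 ends at offset 8100, etc.
observations = {8100: 8192, 8200: 18432, 18400: 18432, 18500: 28672}
model_matches = all(modelled_tell(end) == tell for end, tell in observations.items())
```

Under this model, model_matches is True for all four runs shown above.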

By the way, truncate() does not close the file:

print '\n=================== 3 ====================\n'

from os.path import getsize

# length of ecr is 90 :
ecr = 10*'a' + 10*'b' + 10*'c' + 10*'d' + 10*'e' +\
      10*'f' + 10*'g' + 10*'h' + 10*'i'


# creation of a file whose length exceeds the reading buffer's size
with open('fileA.txt','wb') as f:
    for i in xrange(100,53001,100): # 530 turns of iteration
        f.write(ecr + str(i).zfill(8) + '\r\n')
        # length of each written line is 100
        # file's length will be 53000


print 'size of fileA before truncating : ',getsize('fileA.txt')
with open('fileA.txt','a+b') as g:
    for line in g:
        if '00000200' in line:
            print repr(line[78:]),'  g.tell()==',g.tell()
            # at this point, 200 characters should have been read
            # if there wasn't a buffer
            g.truncate()
    g.seek(6000,0)
    k = 0
    for li in g:
        k+=1
        print 'k==',k,'   ',repr(li[-32:])
        if k==7:
            break
print 'size of fileA after truncating : ',getsize('fileA.txt')

Result

=================== 3 ====================

size of fileA before truncating :  53000
'hhiiiiiiiiii00000200\r\n'   g.tell()== 8192
k== 1     'gghhhhhhhhhhiiiiiiiiii00006100\r\n'
k== 2     'gghhhhhhhhhhiiiiiiiiii00006200\r\n'
k== 3     'gghhhhhhhhhhiiiiiiiiii00006300\r\n'
k== 4     'gghhhhhhhhhhiiiiiiiiii00006400\r\n'
k== 5     'gghhhhhhhhhhiiiiiiiiii00006500\r\n'
k== 6     'gghhhhhhhhhhiiiiiiiiii00006600\r\n'
k== 7     'gghhhhhhhhhhiiiiiiiiii00006700\r\n'
size of fileA after truncating :  8192

But if a write instruction is placed after truncate(), the behaviour of the program becomes incoherent. Try it.