Question

I'm working through Learn Python The Hard Way, and I'm curious why I can't check the size of the file I copied to.

Here is the script from the book, with my modifications added.

from sys import argv
from os.path import exists

script, from_file, to_file = argv

print "Copying from %s to %s" % (from_file, to_file)

# We could do these two lines of code in one line, how?
in_file = open(from_file)
in_data = in_file.read()

print "The input file is %d bytes long\n\n" % len(in_data)

print "Does the output file exist? %r \n\n" % exists(to_file)

print "Ready, hit RETURN to continue, CTRL-C to abort.\n"
raw_input()

out_file = open(to_file, 'w')
out_file.write(in_data)
out_file2 = open(to_file) # Added this line
out_data = out_file2.read() # Added this line


print "Completed. The copied file is %r bytes long." % len(out_data)

in_file.close()

Thanks.

Answer 1

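"Files", or rather the in-memory classes and functions used to work with files, usually buffer data. This means the data is held in a temporary piece of memory before it is written to disk. This lets you make many small calls to write; the program buffers those calls and then writes them to disk in one large write. That is faster, because sending one larger write to the OS is more efficient. (The reverse is also true: file classes usually buffer reads, because pulling in a large chunk of data at once is faster than repeatedly asking the OS for 1 byte at a time.)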
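A minimal sketch of the buffering described above. It uses Python 3 syntax (the question's script is Python 2), and the helper sizes_during_write and the file name demo.txt are assumptions made up for illustration. For a small write with default buffering, the size on disk typically stays at 0 until the buffer is flushed or the file is closed:

```python
import os
import tempfile


def sizes_during_write(data):
    """Return the on-disk size before and after closing the writer."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "demo.txt")  # hypothetical file name
        out_file = open(path, "w")
        out_file.write(data)  # small write: held in the in-memory buffer
        size_before_close = os.path.getsize(path)
        out_file.close()  # closing flushes the buffer to disk
        size_after_close = os.path.getsize(path)
    return size_before_close, size_after_close


before, after = sizes_during_write("hello")
print(before, after)
```

Data larger than the buffer may be partly written before the close, so the "before" size is only guaranteed to be 0 for small writes like this one.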

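However, if you reopen the file before that buffer has had a chance to actually be written, you won't see the writes, because they haven't been written yet. That is your problem. Here: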

# You open the file:
out_file = open(to_file, 'w')
# You request a write; it gets buffered, but not yet written.
out_file.write(in_data)
# You re-open the file:
out_file2 = open(to_file) # Added this line
# And read. Nothing has been written, so out_data is empty.
out_data = out_file2.read() # Added this line

# Later, when your program shuts down, out_file is closed,
# and as part of closing, writes the data.

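If you want to use two file objects, you need to call .close on out_file before you read from out_file2. However, Python offers an easier way: the with statement will do all of this for you: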

with open(to_file, 'w') as out_file:
    # out_file will close when this with block is done.
    out_file.write(in_data)
# out_file now closed.

with open(to_file) as out_file2: # Added this line
    out_data = out_file2.read() # Added this line
# out_file2 now closed.

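Note that with is good practice even for files opened for reading: every open file (for reading or writing) uses a "file descriptor" (a reference to an open file) on pretty much all operating systems, and those file descriptors are limited.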
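To check that with really does close the file, and so releases its descriptor, you can inspect the file object's closed attribute after the block. This is a small sketch in Python 3 syntax; the file name copy.txt and the sample data are placeholders, not part of the original script:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "copy.txt")  # hypothetical file name

    with open(path, "w") as out_file:
        out_file.write("some data")
    # The with block has ended, so the file has been flushed and closed.
    closed_after_block = out_file.closed

    with open(path) as out_file2:
        out_data = out_file2.read()

print(closed_after_block, len(out_data))
```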

Note that you can see that open is buffered by looking it up in the documentation:

open(name[, mode[, buffering]])

[...]

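The optional buffering argument specifies the file's desired buffer size: 0 means unbuffered, 1 means line buffered, any other positive value means use a buffer of (approximately) that size (in bytes). A negative buffering means to use the system default, which is usually line buffered for tty devices and fully buffered for other files. If omitted, the system default is used.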
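As a rough illustration of the buffering argument, the sketch below opens a file with buffering=0, so each write goes straight to the OS. It is written for Python 3, where unbuffered mode is only allowed for binary files (the signature quoted above is the Python 2 one); the file name unbuffered.bin is an assumption for the example:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "unbuffered.bin")  # hypothetical file name
    out_file = open(path, "wb", buffering=0)  # 0 = no buffering (binary only)
    out_file.write(b"hello")  # handed to the OS immediately
    size_while_open = os.path.getsize(path)  # measured before any close/flush
    out_file.close()

print(size_while_open)
```

Unbuffered writes trade speed for immediacy, so for a simple copy script closing or flushing the file is usually the better fix.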

Answer 2

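That piece of code doesn't really make sense. You have just written the file's contents, so the size of those contents is exactly the length of what you wrote, i.e. len(in_data) rather than len(out_data). That saves you from reopening the file and reading it again.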

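By the way, the reason you get a size of 0 is that you have to call out_file.close(), or at least flush(), before reading the file. Otherwise, with your code, you are reading while the write is still in the buffer and not actually on the file system.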
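A short sketch of the flush() route, again in Python 3 syntax and with a made-up file name and sample data. Calling flush() pushes the buffered data out, so a second file object can read it even though the first one is still open:

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "copy.txt")  # hypothetical file name
    out_file = open(path, "w")
    out_file.write("some data")
    out_file.flush()  # push the buffered data out to the OS

    with open(path) as out_file2:
        out_data = out_file2.read()  # the flushed data is now visible

    out_file.close()

print(len(out_data))
```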

After copying a file, why can't I use len(var_name) to get the size of the copied file?
