Consider this use case: I create a huge file (for the sake of clarity) and then read it into two different lists.
import csv
import time

TEMP_FILE_NAME = '/tmp/foo.csv'

def write_huge_file():
    with open(TEMP_FILE_NAME, 'wb') as f:
        writer = csv.writer(f)
        writer.writerows((i, i + 100) for i in xrange(29999999))

def get_file_iterator():
    with open(TEMP_FILE_NAME, 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            yield row

def make_2_list_from_object():
    file_iterator = get_file_iterator()
    main_list = [(i, j) for i, j in file_iterator]
    list1 = [i[0] for i in main_list]
    list2 = [i[1] for i in main_list]

def make_2_list_from_file():
    list1 = list(i[0] for i in get_file_iterator())
    list2 = list(i[1] for i in get_file_iterator())

if __name__ == '__main__':
    # write_huge_file()  # Uncomment this to write the file once
    print 'wrote_file'
    a = time.time()
    make_2_list_from_file()
    b = time.time()
    print b - a
    make_2_list_from_object()
    c = time.time()
    print 'Time taken using file: ', str(b - a)
    print 'Time taken using object: ', str(c - b)
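To get a feel for why `main_list` is so expensive, note that every parsed CSV row becomes a tuple of two small strings, and tens of millions of those carry substantial per-object overhead. A rough sketch of the per-row cost (shallow sizes only, so it understates the true footprint; the 30-million row count mirrors the one in the script):

```python
import sys

# One parsed CSV row as stored in main_list: a tuple of two short strings.
row = ('12345', '12445')

# Shallow sizes only; the real footprint also includes the outer list's
# pointer array and per-object allocator overhead.
tuple_size = sys.getsizeof(row)
string_sizes = sum(sys.getsizeof(s) for s in row)
per_row = tuple_size + string_sizes
print('approximate bytes per row: {0}'.format(per_row))

# Scaled to roughly 30 million rows, as in the question:
total_gib = per_row * 29999999 / float(2 ** 30)
print('approximate total: {0:.1f} GiB'.format(total_gib))
```

On a 64-bit CPython this already lands in the multiple-GiB range, which is why a 4 GB machine would start swapping.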
Now, when I run it, I get this output:
Time taken using file: 49.212211132 s
Time taken using object: 1018.5052530766 s
Can someone explain this to me? My guess is that it is caused by Python swapping to disk once RAM runs out.

Also note that I had 4 GB of RAM when running this. If you have more RAM, you can reproduce this by increasing the number of rows written to the file.
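One way to test the swap hypothesis is to watch the process's peak resident memory while the list is built. A minimal sketch using the standard-library `resource` module (Unix only; `ru_maxrss` is reported in kilobytes on Linux, bytes on macOS, so the unit here assumes Linux), with a much smaller list standing in for `main_list`:

```python
import resource

def peak_rss_mib():
    # ru_maxrss is the peak resident set size; kilobytes on Linux.
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss / 1024.0

before = peak_rss_mib()
# Small stand-in for main_list; scale this up to approach the real case.
data = [(i, i + 100) for i in range(1000000)]
after = peak_rss_mib()
print('peak RSS grew by about %.1f MiB' % (after - before))
```

If the peak approaches physical RAM during `make_2_list_from_object`, the slowdown is swapping rather than anything in the csv module.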