Question

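I am trying to apply an XOR operation to a number of files, some of which are very large.
Basically I take a file and XOR it byte by byte (or at least that is what I think I am doing). When it hits a larger file (around 70 MB), I get an out-of-memory error and my script crashes.
My computer has 16 GB of RAM with more than 50% of it available, so I would not attribute this to my hardware.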

def xor3(source_file, target_file):
    b = bytearray(open(source_file, 'rb').read())
    for i in range(len(b)):
        b[i] ^= 0x71
    open(target_file, 'wb').write(b)

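I tried reading the file in chunks, but it seems I am not very experienced with this, because the output is not what I want. The first function returns what I want, of course :)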

def xor(data):
    b = bytearray(data)
    for i in range(len(b)):
        b[i] ^= 0x41
    return data


def xor4(source_file, target_file):
    with open(source_file,'rb') as ifile:
        with open(target_file, 'w+b') as ofile:
            data = ifile.read(1024*1024)
            while data:
                ofile.write(xor(data))
                data = ifile.read(1024*1024)

What would be an appropriate solution for this kind of operation? What am I doing wrong?

Answer 1

Use the seek function to read the file in chunks, and append each processed chunk to the output file.
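A minimal sketch of what this answer describes, assuming Python 3. The function name xor_file_with_seek, the key parameter, and the 1 MiB default chunk size are illustrative and do not come from the answer. Note that sequential read calls already advance the file position, so the explicit seek is shown only to mirror the answer's wording.

```python
def xor_file_with_seek(source_file, target_file, key=0x71, chunk_size=1024 * 1024):
    """XOR source_file with key one chunk at a time, appending to target_file."""
    # Truncate the output first so that appending starts from an empty file.
    open(target_file, 'wb').close()
    offset = 0
    with open(source_file, 'rb') as src, open(target_file, 'ab') as dst:
        while True:
            src.seek(offset)              # jump to the start of the next chunk
            chunk = src.read(chunk_size)
            if not chunk:                 # empty result means end of file
                break
            buf = bytearray(chunk)        # mutable copy of this chunk only
            for i in range(len(buf)):
                buf[i] ^= key
            dst.write(buf)
            offset += len(chunk)
```

Only one chunk is held in memory at a time, so memory use stays bounded regardless of the file size.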

Answer 2

Iterate lazily over the large file.

from operator import xor
from functools import partial
def chunked(file, chunk_size):
    return iter(lambda: file.read(chunk_size), b'')
myoperation = partial(xor, 0x71)

with open(source_file, 'rb') as source, open(target_file, 'ab') as target:
    processed = (map(myoperation, bytearray(data)) for data in chunked(source, 65536))
    for data in processed:
        target.write(bytearray(data))

Answer 3

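Unless I am mistaken, in your second example you create a copy of data by calling bytearray and assigning the result to b. You then modify b but return data. The modifications made to b have no effect on data itself.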
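A corrected version of the helper might look like the sketch below. The name xor_chunk and the key parameter are illustrative and not from the question. Note also that the question's xor3 uses 0x71 while its xor uses 0x41, so the two outputs can only match once the same key is used in both.

```python
def xor_chunk(data, key=0x71):
    """Return a copy of data with every byte XORed with key."""
    b = bytearray(data)          # mutable copy; data itself is left untouched
    for i in range(len(b)):
        b[i] ^= key
    return b                     # return the modified copy, not the input
```

With this change, ofile.write(xor_chunk(data)) in xor4 writes the transformed bytes instead of the original ones.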

Answer 4

This probably only works in Python 2, which shows once again that Python 2 is much nicer to use for byte streams:

def xor(infile, outfile, val=0x71, chunk=1024):
    # Binary mode, so the bytes are not altered by newline translation.
    with open(infile, 'rb') as inf:
        with open(outfile, 'wb') as outf:
            c = inf.read(chunk)
            while c != '':
                s = "".join([chr(ord(cc) ^val) for cc in c])
                outf.write(s)
                c = inf.read(chunk)
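For comparison, a rough Python 3 equivalent of the function above. This is a sketch under the assumption that both files are opened in binary mode; the name xor_py3 is illustrative. In Python 3, iterating over a bytes object yields integers, so the chr/ord round trip is not needed.

```python
def xor_py3(infile, outfile, val=0x71, chunk=1024):
    """Python 3 sketch: XOR infile with val chunk by chunk into outfile."""
    with open(infile, 'rb') as inf, open(outfile, 'wb') as outf:
        c = inf.read(chunk)
        while c != b'':                              # b'' signals end of file
            outf.write(bytes(cc ^ val for cc in c))  # each cc is already an int
            c = inf.read(chunk)
```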

XOR a large file in Python

4 answers: