Question

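I wrote code that adds numbers from two different text files. For very large data (2-3 GB) I get a MemoryError, so I am writing new code that uses a few functions to avoid loading all the data into memory.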

The code opens an input file 'd.txt' and reads the numbers that follow certain lines in the large data, as shown below:

SCALAR
ND    3
ST    0
TS    1000
1.0
1.0
1.0
SCALAR
ND    3
ST    0
TS    2000
3.3
3.4
3.5
SCALAR
ND    3
ST    0
TS    3000
1.7
1.8
1.9

and adds to them the numbers read from a smaller text file 'e.txt', shown below:

SCALAR
ND    3
ST    0
TS    0
10.0
10.0
10.0

The result is written to the text file 'output.txt', as follows:

SCALAR
ND    3
ST    0
TS    1000
11.0
11.0
11.0
SCALAR
ND    3
ST    0
TS    2000
13.3
13.4
13.5
SCALAR
ND    3
ST    0
TS    3000
11.7
11.8
11.9

The code I prepared:

def add_list_same(list1, list2):
    """
    list2 has the same size as list1
    """
    c = [a+b for a, b in zip(list1, list2)]
    print(c)
    return c


def list_numbers_after_ts(n, f):
    result = []
    for line in f:
        if line.startswith('TS'):
            for node in range(n):
                result.append(float(next(f)))
    return result


def writing_TS(f1):
    TS = []
    ND = []
    for line1 in f1:
        if line1.startswith('ND'):
            ND = float(line1.split()[-1])
        if line1.startswith('TS'):
            x = float(line1.split()[-1])
            TS.append(x)
    return TS, ND


with open('d.txt') as depth_dat_file, \
     open('e.txt') as elev_file, \
     open('output.txt', 'w') as out:
    m = writing_TS(depth_dat_file)
    print('number of TS', m[1])
    for j in range(0,int(m[1])-1):
        i = m[1]*j
        out.write('SCALAR\nND  {0:2f}\nST   0\nTS  {0:2f}\n'.format(m[1], m[0][j]))
        list1 = list_numbers_after_ts(int(m[1]), depth_dat_file)
        list2 = list_numbers_after_ts(int(m[1]), elev_file)
        Eh = add_list_same(list1, list2)
        out.writelines(["%.2f\n" % item  for item in Eh])

output.txt looks like this:

SCALAR
ND    3.000000
ST    0
TS    3.000000
SCALAR
ND    3.000000
ST    0
TS    3.000000
SCALAR
ND    3.000000
ST    0
TS    3.000000

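The list addition does not work, even though the functions work when I test them individually. I could not find the error. I have changed many things, but it still does not work. What is the problem? Any help is much appreciated!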
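One thing worth checking in the code above: a file object is an iterator that can be read through only once. writing_TS() loops over depth_dat_file to the end, so the later call to list_numbers_after_ts() on the same handle finds nothing left and returns an empty list, and zip() with an empty list produces no sums. Separately, the format string uses {0:2f} twice, so both ND and TS print the first argument. The sketch below reproduces only the exhaustion effect; io.StringIO stands in for a real file and ts_values is a hypothetical helper, not part of the original code.

```python
import io

# io.StringIO stands in for an open text file; iterating over it
# behaves the same way as iterating over a file opened with open().
f = io.StringIO("TS    1000\n1.0\n1.0\n1.0\n")


def ts_values(handle):
    """Collect the number at the end of every TS line (hypothetical helper)."""
    return [float(line.split()[-1]) for line in handle if line.startswith("TS")]


first = ts_values(f)    # reads the handle to the end
second = ts_values(f)   # nothing is left to read
f.seek(0)               # rewind to the start of the data
third = ts_values(f)

print(first, second, third)
```

Calling seek(0) before the second pass, or doing all the work in a single pass, avoids the empty result.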

Answer 1

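You can use grouper to read the file in groups of a fixed number of lines. The following code should work as long as the order of the lines within each group does not change.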

from itertools import zip_longest

# Iterate over an iterable in fixed-size groups
# See http://stackoverflow.com/questions/434287/what-is-the-most-pythonic-way-to-iterate-over-a-list-in-chunks
def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)

add_numbers = []

with open("e.txt") as f:
    # Read data 7 lines at a time
    for lines in grouper(f, 7):
        # Skip the 4 header lines (SCALAR, ND, ST, TS) and keep the 3 values
        for line in lines[4:]:
            add_numbers.append(float(line))

# Template for the 3 values of every group
template = '{:.2f}\n{:.2f}\n{:.2f}\n'

with open("d.txt") as f, open('output.txt', 'w') as out:
    # As before, 7 lines at a time, so only one group is held in memory
    for lines in grouper(f, 7):
        data_numbers = [float(line) for line in lines[4:]]
        # Sum the elements of the two lists pairwise (3 elements)
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        # Copy the header lines unchanged so ND, ST and TS are not summed
        out.writelines(lines[:4])
        # * unpacks result_numbers as 3 arguments of format
        out.write(template.format(*result_numbers))
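To see what grouper does, here is a small sketch on an in-memory list instead of a file (the sample list is made up for illustration). Because the same iterator is repeated n times, each tuple takes n consecutive items, and a short final group is padded with padvalue.

```python
from itertools import zip_longest


def grouper(iterable, n, padvalue=None):
    # The same iterator object is passed n times, so every output
    # tuple pulls n consecutive items from it.
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


# Illustrative data: one full 7-line group followed by a partial one.
lines = ["SCALAR", "ND 3", "ST 0", "TS 1000", "1.0", "1.0", "1.0",
         "SCALAR", "ND 3"]

groups = list(grouper(lines, 7))
for group in groups:
    print(group)
```

The second tuple ends with five None entries. If the file may end with a blank line or an incomplete group, check for None before calling float() or split() on an item.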

Answer 2

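I had to change a few small things in the code. It works now, but only for small input files, because many variables are loaded into memory. Could you tell me how to do this with yield?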

from itertools import zip_longest


def grouper(iterable, n, padvalue=None):
    return zip_longest(*[iter(iterable)] * n, fillvalue=padvalue)


def writing_ND(f1):
    # Return the node count from the first ND line
    for line1 in f1:
        if line1.startswith('ND'):
            return float(line1.split()[-1])


def writing_TS(f):
    # Collect every TS value; this list grows with the size of the file
    TS = []
    for line2 in f:
        if line2.startswith('TS'):
            TS.append(float(line2.split()[-1]))
    return TS


add_numbers = []

with open("e.txt") as f, open("d.txt") as f1, \
     open('output.txt', 'w') as out:
    ND = writing_ND(f)
    TS = writing_TS(f1)
    n = int(ND) + 4
    # Both files were read above, so rewind them before grouping
    f.seek(0)
    f1.seek(0)
    for lines in grouper(f, n):
        for item in lines[4:]:
            add_numbers.append(float(item))
    for i, l in enumerate(grouper(f1, n)):
        data_numbers = [float(line.split()[-1]) for line in l[4:]]
        result_numbers = [x + y for x, y in zip(data_numbers, add_numbers)]
        out.write('SCALAR\nND    %d\nST  0\nTS      %0.2f\n' % (ND, TS[i]))
        for item in result_numbers:
            out.write('%s\n' % item)
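On the yield question, one possible approach is a generator that hands back one block at a time, so neither the TS values nor the data lines of the whole file are collected in a list. This is only a sketch under assumptions: read_blocks and add_offsets are hypothetical names, every block is assumed to start with the four header lines SCALAR, ND, ST, TS, and the ND line is assumed to give the number of values that follow, as in the sample data. io.StringIO stands in for the real files here.

```python
import io

HEADER_LINES = 4  # SCALAR, ND, ST, TS (assumed fixed, as in the sample data)


def read_blocks(handle):
    """Yield (header_lines, values) for one block at a time."""
    while True:
        header = [handle.readline() for _ in range(HEADER_LINES)]
        if not header[0].strip():
            return  # end of file or trailing blank line
        nd = int(header[1].split()[-1])  # number of values in this block
        values = [float(handle.readline()) for _ in range(nd)]
        yield header, values


def add_offsets(data, offsets, out):
    """Copy each header unchanged and write values plus offsets."""
    for header, values in read_blocks(data):
        out.writelines(header)
        out.writelines("%.2f\n" % (v + o) for v, o in zip(values, offsets))


# Small in-memory stand-ins for d.txt, e.txt and output.txt.
d = io.StringIO("SCALAR\nND    3\nST    0\nTS    1000\n1.0\n1.0\n1.0\n"
                "SCALAR\nND    3\nST    0\nTS    2000\n3.3\n3.4\n3.5\n")
e = io.StringIO("SCALAR\nND    3\nST    0\nTS    0\n10.0\n10.0\n10.0\n")
out = io.StringIO()

_, offsets = next(read_blocks(e))  # e.txt holds a single block
add_offsets(d, offsets, out)
print(out.getvalue())
```

With real files, the three StringIO objects would be replaced by open("d.txt"), open("e.txt") and open("output.txt", "w"). Only the small offsets list and the current block are held in memory, and the TS line is copied from the header instead of being looked up in a separate list.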

Function does not return the list correctly
