Question

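I would like to understand why a MemoryError occurs, seemingly more or less at random. I am running a Python 3 program under Docker on an Azure VM (2 CPUs and 7 GB of RAM).

To keep it simple, the program processes binary files that are read by a specific library (no problem there), then I merge them in pairs of files, and finally I insert the data into a database.
The dataset obtained after the merge (and before the database insert) is a Pandas dataframe with about 2.8 million rows and 36 columns.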

To insert into the database, I am using a REST API that requires me to insert the file in chunks. Before that, I convert the dataframe into a StringIO buffer with this function:

# static method from the Utils class (the module needs `import io`)
@staticmethod
def df_to_buffer(my_df):
    count_row, count_col = my_df.shape
    buffer = io.StringIO()  # create an empty buffer
    my_df.to_csv(buffer, index=False)  # fill the buffer with CSV text
    LOGGER.info('Current data contains %d rows and %d columns, for a total '
                'buffer size of %d bytes.', count_row, count_col, buffer.tell())
    buffer.seek(0)  # rewind to the start of the stream
    return buffer

So in my "main" program, the behaviour is:

# transform the dataframe to a StringIO buffer
file_data = Utils.df_to_buffer(file_df)

buffer_chunk_size = 32000000  # 32MB
while True:
    data = file_data.read(buffer_chunk_size)
    if data:
        ...
        # do the insert stuff
        ...
    else:
        # whole file has been loaded
        break

# loop is over, close the buffer before processing a new file
file_data.close()

The problem:
Sometimes I can insert 2 or 3 files in a row. Sometimes a MemoryError occurs at a random moment (but always when a new file is about to be inserted).
The error occurs on the first iteration of a file insert (never in the middle of a file). Specifically, it crashes on the line that reads a chunk: file_data.read(buffer_chunk_size)

I am monitoring memory during the process (using htop): usage never goes above 5.5 GB, and when the crash occurs it is only at around 3.5 GB of used memory...
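Since htop only samples system-wide usage, a short allocation spike during a single read could be missed. The following is a minimal sketch of measuring the peak from inside the process with the standard-library tracemalloc module. The function name read_in_chunks_with_peak and the small demo buffer are hypothetical stand-ins, not part of the original program, and tracemalloc only sees memory allocated through Python's allocators.

```python
import io
import tracemalloc


def read_in_chunks_with_peak(buffer, chunk_size):
    """Read `buffer` chunk by chunk and return (chunk count, peak traced bytes).

    Hypothetical helper: the real program would send each chunk to the
    REST API where the counter is incremented.
    """
    tracemalloc.start()
    n_chunks = 0
    while True:
        data = buffer.read(chunk_size)
        if not data:
            # an empty string means the whole buffer has been consumed
            break
        n_chunks += 1
    # get_traced_memory() returns (current, peak) in bytes since start()
    _, peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return n_chunks, peak


# Small stand-in for the CSV buffer built by df_to_buffer
demo = io.StringIO('x' * 1_000_000)
n_chunks, peak = read_in_chunks_with_peak(demo, 250_000)
demo.close()
print('chunks read: %d, peak traced memory: %d bytes' % (n_chunks, peak))
```

Logging the peak once per file would show whether the first read of a file allocates noticeably more than the later ones.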

Any information or advice would be appreciated, thanks. :)

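Edit
I was able to debug and narrow down where the problem comes from, but I have not solved it yet.
It happens when I read the StringIO buffer in chunks. The data chunk increases RAM consumption a lot, because it is a big str containing 32000000 characters of the file. I tried reducing the chunk size from 32000000 to 16000000. I could insert some files, but after a while the MemoryError appeared again... I am now trying to reduce it to 8000000.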
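To put these chunk sizes in perspective, here is a rough sketch of how much memory a str of a given length takes. It assumes CPython 3.3 or later, where a str stores 1, 2 or 4 bytes per character depending on the widest character it contains (PEP 393). The helper name str_footprint is hypothetical, and the demo uses 1,000,000 characters instead of 32,000,000 to stay cheap.

```python
import sys


def str_footprint(n_chars, sample_char):
    """Return the size in bytes CPython reports for sample_char repeated n_chars times."""
    return sys.getsizeof(sample_char * n_chars)


n = 1_000_000
ascii_bytes = str_footprint(n, 'x')            # ASCII only: 1 byte per character
bmp_bytes = str_footprint(n, '\u20ac')         # euro sign: 2 bytes per character
astral_bytes = str_footprint(n, '\U0001F600')  # emoji: 4 bytes per character
print(ascii_bytes, bmp_bytes, astral_bytes)
```

If the CSV text is pure ASCII, a 32000000-character chunk is about 32 MB as a str. A single character outside Latin-1 anywhere in the chunk makes the whole chunk use 2 or 4 bytes per character.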

Python "random" MemoryError
