
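I have a very large CSV file that I want to load into a DataFrame chunk by chunk. Before processing a chunk, I want to check whether any of the IDs I am interested in appear in it. Because the CSV is sorted by ID, my plan is to keep loading chunks until I reach one that contains an ID of interest. This is the code that does it: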

ids_to_check_for_in_measurement = [...]  # placeholder: a list of ids
next_batch = csv_reader.get_chunk(1000)
# Keep reading 1000-row chunks until one contains an ID of interest
while not any(wanted_id in next_batch["id"].unique() for wanted_id in ids_to_check_for_in_measurement):
    next_batch = csv_reader.get_chunk(1000)
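
For reference, here is a minimal, self-contained sketch of the scan described above. It is not a confirmed fix for the memory error. It assumes the ID column is named `id`, and it uses the `chunksize` iterator of `read_csv` in place of repeated `get_chunk()` calls. The helper name `first_chunk_with_ids`, its parameters, and the tiny in-memory CSV are hypothetical and only there for illustration.

```python
import io

import pandas as pd


def first_chunk_with_ids(csv_source, wanted_ids, chunk_rows=1000, id_column="id"):
    """Return the first chunk that contains any wanted ID, or None if none does."""
    wanted = list(wanted_ids)
    # With chunksize set, read_csv returns an iterator that yields DataFrames
    for chunk in pd.read_csv(csv_source, chunksize=chunk_rows):
        # Vectorised membership test on the ID column of this chunk
        if chunk[id_column].isin(wanted).any():
            return chunk
    # The whole file was read without finding any of the wanted IDs
    return None


# Hypothetical stand-in for the large file: ids 1..10, sorted by id
demo_csv = "id,value\n" + "\n".join(f"{i},{i * 10}" for i in range(1, 11))
match = first_chunk_with_ids(io.StringIO(demo_csv), wanted_ids=[7], chunk_rows=3)
```

Only one chunk is referenced at a time inside the loop, and the function returns `None` instead of raising when the file ends without a match.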

This works up to a point, and then I get the following error:

File "pandas/_libs/parsers.pyx", line 921, in pandas._libs.parsers.TextReader._read_low_memory

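I assumed that if I call get_chunk() and overwrite the variable, the garbage collector would dispose of the previous chunk, so I could walk through the rows of the CSV without hitting the memory limit. Am I missing something here?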

pandas read_csv get_chunk() method raises a memory error when iterating over a large CSV file

0 answers: