pandas: an 800 MB CSV causes a MemoryError (32 GB RAM)

Time: 2019-11-24 21:21:00

Tags: python pandas csv out-of-memory

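I have a machine with 32 GB of RAM and a CSV file of 1 million rows by 4 columns (800 MB). When I run my code, Python uses only about 1 GB of memory, yet I get a memory error: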

MemoryError: Unable to allocate array with shape (23459822,) and data type int64

Note: the problem occurs only on Windows; running exactly the same code on Ubuntu, it goes away.
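
For scale, the failed allocation is 23459822 int64 values, which is roughly 188 MB and small next to 32 GB. One thing that may be worth ruling out is whether the Windows interpreter is a 32-bit build. This is an assumption, not something confirmed here. A minimal sketch of that check follows; the helper names interpreter_bits and allocation_bytes are hypothetical and used only for illustration.

```python
import struct


def interpreter_bits() -> int:
    """Return the pointer width of the running interpreter, in bits."""
    # struct.calcsize("P") is the size of a C pointer in bytes:
    # 4 on a 32-bit build, 8 on a 64-bit build.
    return struct.calcsize("P") * 8


def allocation_bytes(n_elements: int, itemsize: int = 8) -> int:
    """Bytes needed for a flat array of n_elements items (int64 is 8 bytes each)."""
    return n_elements * itemsize


if __name__ == "__main__":
    print("interpreter bits:", interpreter_bits())
    print("bytes requested:", allocation_bytes(23459822))
```

If this reports 32 on the Windows machine and 64 on Ubuntu, the two runs are not directly comparable, since a 32-bit process has a far smaller address space than the installed RAM.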

Relevant code:

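elif light in entry:
    df = pandas.read_csv('maps_android_light_raw_20190909.csv')

    # Write each device's rows to <path><device_id>/LIGHT/<device_id>.csv
    for i, g in df.groupby('device_id'):
        output_file2 = path + f'{i}/LIGHT/'

        if not os.path.exists(output_file2):
            os.makedirs(output_file2)

        g.to_csv(output_file2 + f'{i}.csv', index=False)

    # Free the frame only after the loop; a second `del df` inside the loop raises NameError
    del df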

Full traceback:

Traceback (most recent call last):
  File "light.py", line 49, in <module>
    main()
  File "light.py", line 33, in main
    for i,g in df2:
  File "C:\Python37\lib\site-packages\pandas\core\groupby\ops.py", line 164, in get_iterator
    for key, (i, group) in zip(keys, splitter):
  File "C:\Python37\lib\site-packages\pandas\core\groupby\ops.py", line 899, in __iter__
    sdata = self._get_sorted_data()
  File "C:\Python37\lib\site-packages\pandas\core\groupby\ops.py", line 918, in _get_sorted_data
    return self.data.take(self.sort_idx, axis=self.axis)
  File "pandas/_libs/properties.pyx", line 34, in pandas._libs.properties.CachedProperty.__get__
  File "C:\Python37\lib\site-packages\pandas\core\groupby\ops.py", line 896, in sort_idx
    return get_group_index_sorter(self.labels, self.ngroups)
  File "C:\Python37\lib\site-packages\pandas\core\sorting.py", line 349, in get_group_index_sorter
    sorter, _ = algos.groupsort_indexer(ensure_int64(group_index), ngroups)
  File "pandas/_libs/algos.pyx", line 173, in pandas._libs.algos.groupsort_indexer
MemoryError: Unable to allocate array with shape (23459822,) and data type int64
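
The traceback shows the failure inside groupsort_indexer, where groupby builds a sort index over every row of the frame at once. A possible workaround, sketched below, is to stream the CSV in chunks and append each device's rows to its own file, so no single step needs an array covering the whole file. This is an untested sketch and does not explain the error; the function name split_csv_by_device and the default chunksize are hypothetical, and the output layout (<out_root>/<device_id>/LIGHT/<device_id>.csv) is assumed from the code above.

```python
import os

import pandas as pd


def split_csv_by_device(csv_path, out_root, chunksize=1_000_000):
    """Stream csv_path in chunks and append each device's rows to its own CSV.

    Returns the set of device ids that were written.
    """
    seen = set()  # device ids whose output file already has a header row
    for chunk in pd.read_csv(csv_path, chunksize=chunksize):
        # groupby now only sorts the rows of one chunk, not the whole file
        for device_id, group in chunk.groupby('device_id'):
            out_dir = os.path.join(out_root, str(device_id), 'LIGHT')
            os.makedirs(out_dir, exist_ok=True)
            out_file = os.path.join(out_dir, f'{device_id}.csv')
            first = device_id not in seen
            # First write creates the file with a header; later writes append rows only
            group.to_csv(out_file, mode='w' if first else 'a',
                         header=first, index=False)
            seen.add(device_id)
    return seen
```

Calling split_csv_by_device('maps_android_light_raw_20190909.csv', path) would stand in for the read_csv and groupby loop. A smaller chunksize lowers peak memory at the cost of more file appends.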

0 answers:

No answers.