I am using Python to extract some data to a CSV file, and the data runs to more than 1 million records. My script appears to have a memory problem: after roughly 5 hours and a little over 190k records written, the process gets killed.
Here is my terminal output:
(.venv)[cv1@mdecv01 maidea]$ python common_scripts/script_tests/ben-test-extract.py BEN
Generating CSV file. Please wait ...
Preparing to write file: BEN-data-20170731.csv
Killed
(.venv)[cv1@mdecv01 maidea]$
Is it possible to extract this data with proper memory management?
Here is my script:
Answer 0 (score: 1)
You are not taking advantage of select_related or prefetch_related. Without them, a separate database query is executed every time you access a related field (a ForeignKey or ManyToManyField):
for beneficiary in Beneficiary.objects.all():
    if beneficiary.is_active:
        household = beneficiary.household
        if len(beneficiary.enrolments) > 0 and len(beneficiary.interventions) > 1:
It should be something like this:
for beneficiary in Beneficiary.objects.select_related(
    'household'
).prefetch_related(
    'enrolments',
    'interventions'
):
    if beneficiary.is_active:
        household = beneficiary.household
        if len(beneficiary.enrolments.all()) > 0 and len(beneficiary.interventions.all()) > 1:
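Note that select_related / prefetch_related removes the per-row queries, but it does not by itself bound memory: iterating over Beneficiary.objects.all() caches every constructed object in the queryset for the lifetime of the loop, which for a million-plus rows is the likely reason the kernel kills the process. Below is a minimal sketch of a memory-bounded version of the export, assuming only the model and related names quoted above; the function name, file path and column list are placeholders, not taken from the asker's script. The enrolment/intervention checks are pushed into Count annotations because, before Django 4.1, prefetch_related is silently ignored when combined with iterator().

import csv

from django.db.models import Count
# from your_app.models import Beneficiary  # hypothetical import; adjust to your app

def export_beneficiaries(path):
    # Do the filtering and counting in SQL so Python never materialises
    # rows it will only discard.
    queryset = (
        Beneficiary.objects
        .filter(is_active=True)
        .select_related('household')  # one JOIN instead of a query per row
        .annotate(
            enrolment_count=Count('enrolments', distinct=True),
            intervention_count=Count('interventions', distinct=True),
        )
        .filter(enrolment_count__gt=0, intervention_count__gt=1)
    )
    with open(path, 'w', newline='') as handle:
        writer = csv.writer(handle)
        writer.writerow(['beneficiary_id', 'household_id'])  # placeholder columns
        # iterator() streams results from the database cursor instead of
        # caching all million-plus objects in the queryset for the whole loop.
        for beneficiary in queryset.iterator():
            writer.writerow([beneficiary.pk, beneficiary.household_id])

On Django 2.0+ iterator() also accepts a chunk_size argument to tune how many rows are fetched at a time, and from Django 4.1 it can honour prefetch_related when chunk_size is given. If you keep the plain loop from the answer instead, select_related('household') alone is enough to make the household access query-free.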
Answer 1 (score: 0)