@profile
def get_duplicates_by_polis(filename):
    dbf_file = dbf.Table(filename, codepage="cp866")
    dbf_file.open(mode=dbf.READ_ONLY)
    dbf_out = dbf.Table(''.join(("polis_", filename)), field_specs=dbf_file.structure(), codepage="cp866")
    dbf_out.open()
    dict_with_human = {'|'.join((rec.f_dpfs_s.strip(), rec.f_dpfs_n.strip())).lower(): list() for rec in dbf_file if rec.f_dpfs_n}
    dbf_size = len(dbf_file)
    for rec in dbf_file:
        human = '|'.join((rec.f_dpfs_s.strip(), rec.f_dpfs_n.strip())).lower()
        dict_with_human[human].append(rec)
    if '|' in dict_with_human:
        del dict_with_human['|']
    for key, val in dict_with_human.items():
        if len(val) > 1:
            for v in val:
                dbf_out.append(v)
    dbf_out.close()
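This code is very slow. For a table with 70k records it runs for 600 seconds, and with 170k records it takes 45 minutes. I profiled it with cProfile, which gives me this result: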
 ncalls   tottime   percall   cumtime   percall  filename:lineno(function)
 151982  531.5004    0.0035  618.2307    0.0041  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:4366(__getitem__)
 151988   20.5445    0.0001   20.5445    0.0001  <method 'read' of '_io.BufferedRandom' objects>
 157597   17.3023    0.0001   33.9958    0.0002  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:4311(record_length)
1409630   10.4006    0.0000   10.4006    0.0000  <method 'tobytes' of 'array.array' objects>
 153851   10.3413    0.0001   18.1183    0.0001  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:2366(__new__)
 315190    9.7412    0.0000   14.8344    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:3351(unpack_short_int)
1335488    7.5174    0.0000   12.8380    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:2501(__setattr__)
 932632    6.8409    0.0000    6.8409    0.0000  <built-in method _struct.unpack>
 460322    6.4973    0.0000   13.5264    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:3421(retrieve_character)
 475274    6.0520    0.0000   27.0813    0.0001  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:2447(__getattr__)
 617289    5.6978    0.0000   12.9065    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:4300(record_count)
 475274    5.4385    0.0000   19.2105    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:2629(_retrieve_field_value)
 157593    4.9029    0.0000   10.6260    0.0001  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:4325(start)
      1    3.9478    3.9478  692.9940  692.9940  /home/vagrant/vwrapperhome/1/main.py:220(get_duplicates_by_polis)
 617289    3.9313    0.0000    5.6787    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:3360(unpack_long_int)
 857094    3.3691    0.0000   12.1412    0.0000  <built-in method builtins.len>
 151982    3.2310    0.0000  627.9202    0.0041  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:4639(__getitem__)
 159466    2.8492    0.0000    2.8492    0.0000  <method 'seek' of '_io.BufferedRandom' objects>
 151985    2.7708    0.0000  641.3214    0.0042  /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py:4066(__next__)
 460475    2.6225    0.0000    4.4969    0.0000  /home/vagrant/.virtualenvs/p35/lib/python3.5/encodings/cp866.py:14(decode)
How can I make this code faster? Thanks. I would really rather not change the library.
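For anyone who wants to reproduce this kind of measurement without my DBF file, here is a minimal sketch of how a profile sorted by tottime can be collected with the standard cProfile and pstats modules. The names find_duplicates, profile_call and sample are hypothetical stand-ins introduced only for illustration: the stand-in groups in-memory (series, number) tuples in a single pass instead of reading dbf records, so its timings say nothing about the dbf library itself.

```python
import cProfile
import io
import pstats
from collections import defaultdict


def find_duplicates(rows):
    """Hypothetical stand-in for the real function: works on plain tuples, not dbf records."""
    groups = defaultdict(list)
    for series, number in rows:
        # Skip rows without a policy number, like the `if rec.f_dpfs_n` filter above.
        if not number.strip():
            continue
        key = '|'.join((series.strip(), number.strip())).lower()
        groups[key].append((series, number))
    # Keep only the keys that occur more than once.
    return [row for group in groups.values() if len(group) > 1 for row in group]


def profile_call(func, *args):
    """Run func under cProfile and return its result plus a text report sorted by tottime."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats('tottime').print_stats(10)  # top 10 entries by own time
    return result, stream.getvalue()


sample = [("AB", "123"), ("ab", "123 "), ("CD", "456"), ("", "")]
duplicates, report = profile_call(find_duplicates, sample)
print(duplicates)
print(report)
```

The report has the same columns as the table above (ncalls, tottime, percall, cumtime, percall, filename:lineno(function)), so the two can be compared directly.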