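How to speed up reading a dbf file

Date: 2015-10-09 13:48:07

Tags: python dbf

Good day. I have some simple code that reads a dbf file. It groups the records by specific columns and writes the records that occur more than once to another file.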

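import dbf

# @profile is assumed to be injected by the profiling tool in use; it is not defined in this snippet
@profile
def get_duplicates_by_polis(filename):
    # Open the source table read-only
    dbf_file = dbf.Table(filename, codepage="cp866")
    dbf_file.open(mode=dbf.READ_ONLY)
    # Create the output table with the same structure as the source
    dbf_out = dbf.Table(''.join(("polis_", filename)), field_specs=dbf_file.structure(), codepage="cp866")
    dbf_out.open()

    # First pass over the table: build one empty list per "series|number" key
    dict_with_human = {'|'.join((rec.f_dpfs_s.strip(), rec.f_dpfs_n.strip())).lower(): list() for rec in dbf_file if rec.f_dpfs_n}
    dbf_size = len(dbf_file)

    # Second pass over the table: put every record into the list for its key
    for rec in dbf_file:
        human = '|'.join((rec.f_dpfs_s.strip(), rec.f_dpfs_n.strip())).lower()
        dict_with_human[human].append(rec)

    # Drop the group of records whose key fields are both empty
    if '|' in dict_with_human:
        del dict_with_human['|']

    # Write out only the records whose key occurs more than once
    for key, val in dict_with_human.items():
        if len(val) > 1:
            for v in val:
                dbf_out.append(v)

    dbf_out.close()
    dbf_file.close()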
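
The grouping step itself does not need two passes over the table. As a minimal sketch of the same logic, assuming the records can be treated as dict-like rows (the `group_duplicates` helper and the sample rows are hypothetical stand-ins, not part of the `dbf` library), each row is read exactly once:

```python
from collections import defaultdict


def group_duplicates(rows, key_fields):
    """Return the rows whose normalized key occurs more than once.

    `rows` is any iterable of dict-like records; it is consumed exactly once.
    """
    groups = defaultdict(list)
    for row in rows:
        parts = [str(row[name]).strip() for name in key_fields]
        if not any(parts):
            continue  # skip rows whose key fields are all empty (the '|' key)
        groups['|'.join(parts).lower()].append(row)
    # Keep only the keys that were seen at least twice
    return [row for members in groups.values() if len(members) > 1 for row in members]


# Hypothetical stand-in data; real records would come from the dbf table.
sample = [
    {"f_dpfs_s": "AB ", "f_dpfs_n": "123"},
    {"f_dpfs_s": "ab", "f_dpfs_n": "123 "},
    {"f_dpfs_s": "CD", "f_dpfs_n": "999"},
    {"f_dpfs_s": "", "f_dpfs_n": ""},
]
duplicates = group_duplicates(sample, ("f_dpfs_s", "f_dpfs_n"))
```

Whether halving the number of record reads helps with the real table is untested here; the sketch only shows that the dictionary can be filled while iterating once.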

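This code is very slow. For a table with 70k records it runs for 600 seconds, and with 170k records it takes 45 minutes. I profiled it with cProfile, which gave this result: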

ncalls   tottime     percall   cumtime    percall     filename:lineno(function)
151982   531.5004   0.0035     618.2307    0.0041     /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 4366 ( __getitem__ )
151988   20.5445    0.0001     20.5445     0.0001      <method 'read' of '_io.BufferedRandom' objects>
157597   17.3023    0.0001     33.9958     0.0002      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 4311 ( record_length )
1409630  10.4006    0.0000     10.4006     0.0000      <method 'tobytes' of 'array.array' objects>
153851   10.3413    0.0001     18.1183     0.0001      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 2366 ( __new__ )
315190   9.7412     0.0000     14.8344     0.0000      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 3351 ( unpack_short_int )
1335488  7.5174     0.0000     12.8380     0.0000      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 2501 ( __setattr__ )
932632   6.8409     0.0000     6.8409      0.0000       <built-in method _struct.unpack>
460322   6.4973     0.0000     13.5264     0.0000      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 3421 ( retrieve_character )
475274   6.0520     0.0000     27.0813     0.0001      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 2447 ( __getattr__ )
617289   5.6978     0.0000     12.9065     0.0000      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 4300 ( record_count )
475274   5.4385     0.0000     19.2105     0.0000      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 2629 ( _retrieve_field_value )
157593   4.9029     0.0000     10.6260     0.0001      /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 4325 ( start )
1        3.9478     3.9478     692.9940    692.9940   /home/vagrant/vwrapperhome/1/main.py : 220 ( get_duplicates_by_polis )
617289   3.9313     0.0000     5.6787      0.0000       /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 3360 ( unpack_long_int )
857094   3.3691     0.0000     12.1412     0.0000      <built-in method builtins.len>
151982   3.2310     0.0000     627.9202    0.0041     /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 4639 ( __getitem__ )
159466   2.8492     0.0000     2.8492      0.0000       <method 'seek' of '_io.BufferedRandom' objects>
151985   2.7708     0.0000     641.3214    0.0042     /home/vagrant/.virtualenvs/p35/lib/python3.5/site-packages/dbf/ver_33.py : 4066 ( __next__ )
460475   2.6225     0.0000     4.4969      0.0000       /home/vagrant/.virtualenvs/p35/lib/python3.5/encodings/cp866.py : 14 ( decode )
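
In the listing above, the first `__getitem__` entry accounts for about 531.5 of the 693.0 seconds in total, roughly 3.5 ms of its own time per call. For reference, a listing sorted this way can be produced with the standard `cProfile` and `pstats` modules. The following is a sketch only: `profile_call` and `busy_work` are hypothetical names, and `busy_work` merely stands in for `get_duplicates_by_polis`.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=20, **kwargs):
    """Run func under cProfile; return (result, report sorted by tottime)."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)
    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by time spent inside each function itself and keep the top entries
    stats.sort_stats("tottime").print_stats(top)
    return result, buffer.getvalue()


def busy_work(n):
    # Hypothetical stand-in for the function being profiled
    return sum(i * i for i in range(n))


result, report = profile_call(busy_work, 10000)
print(report)
```

Sorting by `tottime` rather than `cumtime` shows where the time is actually spent instead of which callers wrap the slow call.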

How can I make this code faster? Thanks. I really don't want to switch to a different library.

0 answers:

No answers