I implemented this code example from the Python Record Linkage Toolkit (recordlinkage) documentation:
import recordlinkage
from recordlinkage.datasets import load_febrl4
dfA, dfB = load_febrl4()
# Indexation step
indexer = recordlinkage.BlockIndex(on='given_name')
pairs = indexer.index(dfA, dfB)
# Comparison step
compare_cl = recordlinkage.Compare()
compare_cl.exact('given_name', 'given_name', label='given_name')
compare_cl.string('surname', 'surname', method='jarowinkler', threshold=0.85, label='surname')
compare_cl.exact('date_of_birth', 'date_of_birth', label='date_of_birth')
compare_cl.exact('suburb', 'suburb', label='suburb')
compare_cl.exact('state', 'state', label='state')
compare_cl.string('address_1', 'address_1', threshold=0.85, label='address_1')
features = compare_cl.compute(pairs, dfA, dfB)
# Classification step
matches = features[features.sum(axis=1) > 3]
print(len(matches))
but I get the following error:
Error: ValueError: Duplicated level name: "rec_id", assigned to level 1, is already used for level 0.
Answer 0 (score: 0)
(In Eclipse or Visual Studio Code) Python record linkage is an algorithm, so it is worth implementing it with the Template design pattern.
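As a rough illustration of what the Template pattern could look like here, the sketch below fixes the order of the three steps from the question (indexing, comparison, classification) in a base class and leaves each step to a subclass. It is plain Python and does not use recordlinkage; the class names `LinkagePipeline` and `BlockOnGivenName`, the toy records, and the threshold are all hypothetical.

```python
from abc import ABC, abstractmethod
from itertools import product


class LinkagePipeline(ABC):
    """Template Method: link() fixes the step order, subclasses supply the steps."""

    def link(self, records_a, records_b):
        pairs = self.index(records_a, records_b)                  # indexation step
        scored = [(a, b, self.compare(a, b)) for a, b in pairs]   # comparison step
        return [(a, b) for a, b, s in scored if self.classify(s)] # classification step

    @abstractmethod
    def index(self, records_a, records_b):
        """Return candidate (a, b) pairs."""

    @abstractmethod
    def compare(self, a, b):
        """Return a similarity score for one pair."""

    @abstractmethod
    def classify(self, score):
        """Return True if the score counts as a match."""


class BlockOnGivenName(LinkagePipeline):
    """Hypothetical concrete pipeline: block on given_name, count exact agreements."""

    fields = ("given_name", "surname", "suburb")

    def index(self, records_a, records_b):
        # Keep only pairs that agree on the blocking key.
        return [(a, b) for a, b in product(records_a, records_b)
                if a["given_name"] == b["given_name"]]

    def compare(self, a, b):
        # Number of fields that agree exactly.
        return sum(a[f] == b[f] for f in self.fields)

    def classify(self, score):
        # Arbitrary threshold chosen for this toy example.
        return score >= 2


# Made-up records, only for demonstration.
records_a = [{"given_name": "anna", "surname": "smith", "suburb": "kew"},
             {"given_name": "ben", "surname": "jones", "suburb": "ryde"}]
records_b = [{"given_name": "anna", "surname": "smith", "suburb": "kew"},
             {"given_name": "ben", "surname": "brown", "suburb": "perth"}]

matches = BlockOnGivenName().link(records_a, records_b)
print(len(matches))
```

Swapping in a different blocking key or scoring rule then only means writing another subclass; `link()` itself stays unchanged.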
Also, before installing the requirements in the virtualenv, make sure pip is up to date for python2:
Upgrade pip for python2
curl https://bootstrap.pypa.io/get-pip.py | python
After that:
pip install -r requirements.txt
With this, the ValueError went away.
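The error message itself says that both levels of the candidate-pair index are named "rec_id", which means dfA and dfB share the same index name. If upgrading does not help, one workaround to try is to give the two frames distinct index names before the indexation step. This assumes the clash comes only from that shared name. The sketch below uses pandas alone with made-up rows; the names `rec_id_a` and `rec_id_b` are arbitrary.

```python
import pandas as pd

# Toy stand-ins for dfA and dfB (made-up rows); like the frames in the
# question, both are indexed by a column called "rec_id".
dfA = pd.DataFrame({"given_name": ["anna", "ben"]},
                   index=pd.Index(["a-0", "a-1"], name="rec_id"))
dfB = pd.DataFrame({"given_name": ["anna", "carl"]},
                   index=pd.Index(["b-0", "b-1"], name="rec_id"))

# Give each frame its own index name so the two levels of the pair index differ.
dfA.index.name = "rec_id_a"
dfB.index.name = "rec_id_b"

# Build all candidate pairs by hand, naming the levels after the two indexes.
pairs = pd.MultiIndex.from_product([dfA.index, dfB.index],
                                   names=[dfA.index.name, dfB.index.name])
print(list(pairs.names))
```

With the real data, the same two renaming lines would go right after load_febrl4() and before indexer.index(dfA, dfB).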