When clustering, I get the following warning:
UserWarning: A component contained 77760 elements.
Components larger than 30000 are re-filtered.
The threshold for this filtering is 4.08109134074e-15
What does this mean?
The threshold I originally specified was 0.191, as shown here:
clustered_dupes = deduper.match(data, threshold=0.191)