Clustering components

Time: 2018-04-06 15:15:28

Tags: python-dedupe dedupeplugin

When clustering, I get the following warning:

UserWarning: A component contained 77760 elements. 
Components larger than 30000 are re-filtered. 
The threshold for this filtering is 4.08109134074e-15

What does this mean?

The threshold I originally specified was 0.191, as shown below:

clustered_dupes = deduper.match(data,threshold=0.191)

1 answer:

Answer 0: (score: 0)

The threshold applies to the cophenetic similarity of a cluster, not to pairwise similarity.
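
To make the distinction concrete, here is a minimal pure-Python sketch of how a cophenetic distance can differ from a pairwise one. It is not dedupe's code: the helper names (`average_distance`, `cophenetic_distances`), the three-record distance table, and the choice of average linkage are all assumptions made for illustration, and the linkage method dedupe actually uses may differ. The cophenetic distance between two records is the height in the dendrogram at which they first end up in the same cluster.

```python
from itertools import combinations


def average_distance(a, b, dist):
    """Mean pairwise distance between the members of clusters a and b."""
    total = sum(dist[frozenset((i, j))] for i in a for j in b)
    return total / (len(a) * len(b))


def cophenetic_distances(dist, n):
    """Average-linkage agglomerative clustering over n items.

    dist maps frozenset({i, j}) to the pairwise distance of items i and j.
    Returns a dict mapping frozenset({i, j}) to the merge height at which
    i and j first share a cluster (their cophenetic distance).
    """
    clusters = [frozenset([i]) for i in range(n)]
    coph = {}
    while len(clusters) > 1:
        # Merge the two clusters that are closest on average.
        a, b = min(combinations(clusters, 2),
                   key=lambda pair: average_distance(pair[0], pair[1], dist))
        height = average_distance(a, b, dist)
        # Every cross-cluster pair is joined for the first time at this height.
        for i in a:
            for j in b:
                coph[frozenset((i, j))] = height
        clusters = [c for c in clusters if c != a and c != b] + [a | b]
    return coph


# Hypothetical pairwise distances (1 - similarity) for three records.
pairwise = {
    frozenset((0, 1)): 0.125,
    frozenset((1, 2)): 0.25,
    frozenset((0, 2)): 0.875,
}

coph = cophenetic_distances(pairwise, 3)
for pair in sorted(pairwise, key=sorted):
    i, j = sorted(pair)
    print(f"records {i},{j}: pairwise={pairwise[pair]}, cophenetic={coph[pair]}")
```

In this toy data, records 0 and 2 are far apart pairwise (0.875), yet their cophenetic distance is 0.5625 because record 1 links them. A cutoff applied to the cophenetic value can therefore group records whose direct pairwise score would not pass the same cutoff, which is why a threshold of this kind should not be read as a per-pair score.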