Question

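My product column contains values that are very similar to each other, for example: fish oil capsule, fish oil capsule 1x30, fish oil capsule.

These all refer to the same product, but I want to collapse them into a single value, "fish oil cap", so the data is easier to analyze.

There are records for about 400 different products. I tried the 'jaccard' and 'jw' distance methods from stringdist, but they are very time-consuming.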

Answer 1

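Running the stringdistmatrix function on 400 random strings of length 15 takes 0.12 seconds of user time, so the distance computation itself should not be the bottleneck at this size.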

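library(stringdist)

# 400 random strings, each 15 characters long
strings <- stringi::stri_rand_strings(400, 15)
# time the full 400 x 400 Jaro-Winkler distance matrix
system.time(stringdistmatrix(strings, strings, method = "jw"))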
   user  system elapsed 
   0.12    0.01    0.05
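
Once the distance matrix is cheap to compute, one possible way to merge near-duplicate names is to cluster on it and pick one label per cluster. The following is only a sketch: the sample names, the cutoff `h = 0.3`, the average linkage, and the "shortest name wins" rule are all assumptions for illustration, not something taken from the question's data.

```r
library(stringdist)

# Hypothetical sample values; the real product column is not shown in the question.
products <- c("fish oil capsule", "fish oil capsule 1x30", "fish oil cap",
              "vitamin c tablet", "vitamin c tablets 60")

# With a single argument, stringdistmatrix returns a 'dist' object
# of pairwise Jaro-Winkler distances (0 = identical strings).
d <- stringdistmatrix(products, method = "jw")

# Hierarchical clustering, then cut the tree at an assumed distance cutoff.
# h = 0.3 is a guess and needs tuning against real data.
clusters <- cutree(hclust(d, method = "average"), h = 0.3)

# Use the shortest name in each cluster as its representative label.
rep_by_cluster <- tapply(products, clusters,
                         function(x) x[which.min(nchar(x))])
canonical <- as.vector(rep_by_cluster[as.character(clusters)])

result <- data.frame(product = products, cluster = clusters,
                     canonical = canonical)
print(result)
```

The cutoff controls how aggressively names are merged, so it is worth inspecting the resulting groups by hand before replacing values in the real column.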
