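Finding how closely related a set of strings is

Time: 2012-03-23 21:49:55

Tags: php javascript algorithm

I have an array of strings (shown in the example below). I just want to find out which one is the most common among them. The most common string is defined as follows: if "Apple Ipod touch" appears, say, 10 times and "apple ipod" appears 8 times, then I would say "Apple Ipod Touch" is the dominant/common string among all the elements.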

Apple iPod touch, 8GB (with FaceTime Camera and Retina Display)
Aple Ipod Clasic 80gb 6th Generation Black
iPod classic 160GB - Silver
Apple 8GB iPod Touch
Apple Ipod Touch 8gb 4th Generation Mc540ll/a 8 Gb Newest Model
Apple iPod touch Black 4th Generation 8GB Touch Screen Wi-Fi MP3
Apple 8GB iPod touch
Apple 8GB iPod touch MC540LL/A
Apple MC540LL/A - 8GB iPod Touch w/ Camera (4th Gen) (Newest Model)
Apple iPod Touch - 8 GB - Electronics
Apple iPod 8GB 4th Generation Black Touch
Apple iPod touch 8GB 4th Gen (Refurbished)
Apple Ipod Touch Digital Player - Apple Ios 5
Apple Ipod Touch 8G - White (4Th Gen)
Apple MC540LL/A iPod Touch 8GB (4th Generation)
(refurbished) Apple Ipod Touch 8gb (4th Generation)
Apple Ipod Touch 8Gb 4Th Generation
iPod Touch 8GB (4th Gen)
Apple Ipod Touch 32G - White (4Th Gen)
Apple iPod touch 8GB (4th Gen), White
Apple iPod touch White 4th Generation 8GB Touch Screen Wi-Fi MP3
Apple 32GB Black 4th Generation iPod Touch - MC544LL/A
Apple 8GB iPod touch
Apple iPod touch 8GB - White - Electronics
Apple MC544LL/A - 32GB iPod Touch w/ Camera (4th Gen) (Newest Model)

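So, can anyone suggest a good algorithm for this? The problem is that I don't have any standard/benchmark to compare against. I just need to compare all the elements with each other and find the most common one. This has to be implemented in PHP or JavaScript.

I hope my question is clear. If I am unclear anywhere, please leave a comment.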

1 answer:

Answer 0: (score: 2)

I'm not sure whether you have looked at PHP's similar_text function, or whether there is a similar JavaScript function. A quick Google search also turned up http://cambiatablog.wordpress.com/2011/03/25/algorithm-for-string-similarity-better-than-levenshtein-and-similar_text/

Edit: a similar_text JavaScript function! http://phpjs.org/functions/similar_text:902
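As a rough illustration of how a similar_text-style score could be used here, the sketch below compares every string with every other one and picks the string with the highest total similarity. It is not the phpjs code linked above: `similarText` is a hand-written approximation of the longest-common-substring idea behind PHP's similar_text, and `normalize`, `similarityPercent` and `mostRepresentative` are hypothetical helper names introduced only for this example. Lowercasing and stripping punctuation before comparing is also an assumption, made so that "Ipod" and "iPod" count as the same word.

```javascript
// Assumption: case and punctuation should not affect the comparison.
function normalize(s) {
  return s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
}

// Sketch of a similar_text-style count: find the longest common substring,
// then recurse on the pieces to its left and to its right.
function similarText(a, b) {
  if (a.length === 0 || b.length === 0) return 0;
  let best = 0, posA = 0, posB = 0;
  for (let i = 0; i < a.length; i++) {
    for (let j = 0; j < b.length; j++) {
      let k = 0;
      while (i + k < a.length && j + k < b.length && a[i + k] === b[j + k]) k++;
      if (k > best) { best = k; posA = i; posB = j; }
    }
  }
  if (best === 0) return 0;
  return best +
    similarText(a.slice(0, posA), b.slice(0, posB)) +
    similarText(a.slice(posA + best), b.slice(posB + best));
}

// Matching characters as a percentage of the combined length.
function similarityPercent(a, b) {
  const total = a.length + b.length;
  if (total === 0) return 0; // avoid dividing by zero
  return (2 * similarText(a, b) / total) * 100;
}

// Hypothetical helper: returns the string whose summed similarity to all
// other strings is highest (the first one wins on a tie).
function mostRepresentative(strings) {
  const cleaned = strings.map(normalize);
  let bestIndex = -1, bestScore = -1;
  for (let i = 0; i < cleaned.length; i++) {
    let score = 0;
    for (let j = 0; j < cleaned.length; j++) {
      if (i !== j) score += similarityPercent(cleaned[i], cleaned[j]);
    }
    if (score > bestScore) { bestScore = score; bestIndex = i; }
  }
  return { index: bestIndex, value: strings[bestIndex], score: bestScore };
}
```

Calling `mostRepresentative(titles)` on the array from the question would return the title that is, on average, closest to all the others. The comparison is quadratic in the number of strings and the substring search is brute force, so this is only reasonable for small lists like the one shown.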