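Suppose I have a list of documents in a Lucene index for which I have configured keywords/phrases.
Title: FaceBook
Content: Associated list of rules, notification, facebook.
Here Title is a field and Content is a field. Now my input is
This is a notification message from Facebook
So the result for this message should be
No. of hits: 2
Match percentage: 100% (because both configured keywords match exactly)
Now my other input is
Notification misconfigured
Here the configured keyword "notification" is 12 characters long, and only 6 of its characters match in the message, so the match should be (6/12 * 100) = 50%.
So I want output like this: a partial match occurred, and the match percentage is 50%.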
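The arithmetic described above can be sketched in a few lines of Java. This is only an illustration under one assumption: that "partial match" means the longest prefix of the configured keyword found at the start of any word in the message. The class `PartialKeywordMatch`, its methods, and the sample token `notifi` are hypothetical and not part of Lucene.

```java
import java.util.Locale;

class PartialKeywordMatch {

    /** Length of the longest common prefix of two strings. */
    static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    /**
     * Percentage of the keyword's characters matched by the best word in the
     * message, i.e. (matched characters / keyword length) * 100.
     * Assumption: a match is a shared prefix, compared case-insensitively.
     */
    static double matchPercent(String keyword, String message) {
        String kw = keyword.toLowerCase(Locale.ROOT);
        if (kw.isEmpty()) {
            return 0.0;
        }
        int best = 0;
        // Split the message on runs of non-word characters.
        for (String token : message.toLowerCase(Locale.ROOT).split("\\W+")) {
            best = Math.max(best, commonPrefixLength(kw, token));
        }
        return 100.0 * best / kw.length();
    }

    public static void main(String[] args) {
        // Full keyword present: 12 of 12 characters match.
        System.out.println(matchPercent("notification",
                "This is a notification message from Facebook")); // prints 100.0
        // Hypothetical truncated word: 6 of 12 characters match.
        System.out.println(matchPercent("notification",
                "notifi wrongly configured")); // prints 50.0
    }
}
```

A prefix rule is the simplest reading of the 6/12 example; other definitions of "matching characters" (such as edit distance) would give different percentages for the same input.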
Answer 0 (score: 3)
Here is the solution I ended up with for the percentage match:
public class lab1 {

    // Returns a similarity in [0.0, 1.0]: 1.0 means identical (ignoring case),
    // 0.0 means every character of the longer string would have to change.
    public static double similarity(String s1, String s2) {
        String longer = s1, shorter = s2;
        if (s1.length() < s2.length()) { // longer should always have greater length
            longer = s2;
            shorter = s1;
        }
        int longerLength = longer.length();
        if (longerLength == 0) {
            return 1.0; // both strings are zero length
        }
        /* // If you have Apache Commons Text
           // you can use it to calculate the edit distance:
        LevenshteinDistance levenshteinDistance = new LevenshteinDistance();
        return (longerLength - levenshteinDistance.apply(longer, shorter)) / (double) longerLength; */
        return (longerLength - editDistance(longer, shorter)) / (double) longerLength;
    }

    // Case-insensitive Levenshtein edit distance, computed with a single
    // rolling row of costs instead of a full matrix.
    public static int editDistance(String s1, String s2) {
        s1 = s1.toLowerCase();
        s2 = s2.toLowerCase();

        int[] costs = new int[s2.length() + 1];
        for (int i = 0; i <= s1.length(); i++) {
            int lastValue = i;
            for (int j = 0; j <= s2.length(); j++) {
                if (i == 0) {
                    // First row: distance from the empty prefix is j insertions.
                    costs[j] = j;
                } else if (j > 0) {
                    int newValue = costs[j - 1];
                    if (s1.charAt(i - 1) != s2.charAt(j - 1)) {
                        // Cheapest of substitution, insertion, and deletion.
                        newValue = Math.min(Math.min(newValue, lastValue), costs[j]) + 1;
                    }
                    costs[j - 1] = lastValue;
                    lastValue = newValue;
                }
            }
            if (i > 0) {
                costs[s2.length()] = lastValue;
            }
        }
        return costs[s2.length()];
    }

    public static void printSimilarity(String s, String t) {
        System.out.println(String.format(
            "%.3f Percent is the similarity between \"%s\" and \"%s\"", similarity(s, t) * 100, s, t));
    }

    public static void main(String[] args) {
        printSimilarity("", "");
        printSimilarity("1234567890", "1");
        printSimilarity("1234567890", "123");
        printSimilarity("1234567890", "1234567");
        printSimilarity("1234567890", "1234567890");
        printSimilarity("1234567890", "1234567980");
        printSimilarity("47/2010", "472010");
        printSimilarity("47/2010", "472011");
        printSimilarity("47/2010", "AB.CDEF");
        printSimilarity("47/2010", "4B.CDEFG");
        printSimilarity("47/2010", "AB.CDEFG");
        printSimilarity("The quick fox jumped", "The jumped fox");
        printSimilarity("The quick fox jumped", "The fox");
        printSimilarity("kitten", "sitting");
    }
}
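To connect this back to the question, one possible way to use such a similarity score is to compare each configured keyword against every word of the incoming message and keep the best score. The sketch below is self-contained, so it re-implements the edit distance with two rows rather than reusing the class above. The names `KeywordSimilarity` and `bestSimilarity`, and the sample token `notifi`, are assumptions made for illustration only.

```java
import java.util.Locale;

class KeywordSimilarity {

    /** Classic Levenshtein distance using two rolling rows. */
    static int levenshtein(String a, String b) {
        int[] prev = new int[b.length() + 1];
        int[] curr = new int[b.length() + 1];
        for (int j = 0; j <= b.length(); j++) {
            prev[j] = j; // distance from the empty prefix of a
        }
        for (int i = 1; i <= a.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= b.length(); j++) {
                int substitution = prev[j - 1] + (a.charAt(i - 1) == b.charAt(j - 1) ? 0 : 1);
                int insertOrDelete = Math.min(prev[j], curr[j - 1]) + 1;
                curr[j] = Math.min(substitution, insertOrDelete);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[b.length()];
    }

    /** Similarity in [0, 1], normalised by the longer string's length. */
    static double similarity(String a, String b) {
        int longer = Math.max(a.length(), b.length());
        if (longer == 0) {
            return 1.0; // both strings are empty
        }
        return (longer - levenshtein(a, b)) / (double) longer;
    }

    /** Best similarity between the keyword and any word of the message. */
    static double bestSimilarity(String keyword, String message) {
        String kw = keyword.toLowerCase(Locale.ROOT);
        double best = 0.0;
        for (String token : message.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (!token.isEmpty()) {
                best = Math.max(best, similarity(kw, token));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String message = "This is a notification message from Facebook";
        for (String keyword : new String[] {"notification", "facebook"}) {
            System.out.printf("%s: %.1f%%%n", keyword, bestSimilarity(keyword, message) * 100);
        }
        // Hypothetical truncated word: 6 deletions out of 12 characters.
        System.out.printf("%.1f%%%n", bestSimilarity("notification", "notifi wrongly configured") * 100);
    }
}
```

Scoring per word rather than against the whole message keeps a long message from diluting the percentage for a short keyword. Whether that matches the scoring the question expects is a judgment call.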