        int queryVector = 1;
        double similarity = 0.0;
        int wordPower;
        String[][] arrays = new String[filename][2];
        int row;
        int col;
        for (a = 0; a < filename; a++) {
            int totalwordPower = 0;
            int totalWords = 0;
            try {
                System.out
                        .println(" _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ ");
                System.out.println("\n");
                System.out.println("The word inputted : " + word2);
                File file = new File(
                        "C:\\Users\\user\\fypworkspace\\TextRenderer\\abc" + a
                                + ".txt");
                System.out.println(" _________________");
                System.out.print("| File = abc" + a + ".txt | \t\t \n");
                for (int i = 0; i < array2.length; i++) {
                    totalCount = 0;
                    wordCount = 0;
                    Scanner s = new Scanner(file);
                    {
                        while (s.hasNext()) {
                            totalCount++;
                            if (s.next().equals(array2[i]))
                                wordCount++;
                        }
                        System.out.print(array2[i] + " --> Word count = "
                                + "\t " + "|" + wordCount + "|");
                        System.out.print(" Total count = " + "\t " + "|"
                                + totalCount + "|");
                        System.out.printf(" Term Frequency = | %8.4f |",
                                (double) wordCount / totalCount);
                        System.out.println("\t ");
                        double inverseTF = Math.log10((float) numDoc
                                / (numofDoc[i]));
                        System.out.println(" --> IDF = " + inverseTF);
                        double TFIDF = (((double) wordCount / totalCount) * inverseTF);
                        System.out.println(" --> TF/IDF = " + TFIDF + "\n");
                        totalWords += wordCount;
                        wordPower = (int) Math.pow(wordCount, 2);
                        totalwordPower += wordPower;
                        System.out.println("Document Vector : " + wordPower);
                        similarity = (totalWords * queryVector)
                                / ((Math.sqrt((totalwordPower)) * (Math
                                        .sqrt(((queryVector * 3))))));
                    }
                }
            } catch (FileNotFoundException e) {
                System.out.println("File is not found");
            }
            System.out.println("The total query frequency for this file is "
                    + totalWords);
            System.out.println("The total document vector : " + totalwordPower);
            System.out.println("The similarity is " + similarity);
        }
    }
}
Hi, I would like to sort the SIMILARITY SCORE produced by the code above. Below is the sample output for 2 text files. I have 10 text files in total.
The word inputted : how are you
| File = abc0.txt |
how --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
 --> IDF = 1.0413926851582251
 --> TF/IDF = 0.0
Document Vector : 0
are --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
 --> IDF = 0.43933269383026263
 --> TF/IDF = 0.0
Document Vector : 0
you --> Word count = |0| Total count = |1289| Term Frequency = | 0.0000 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.0
Document Vector : 0
The total query frequency for this file is 0
The total document vector : 0
The similarity is NaN
The word inputted : how are you
| File = abc1.txt |
how --> Word count = |0| Total count = |426| Term Frequency = | 0.0000 |
 --> IDF = 1.0413926851582251
 --> TF/IDF = 0.0
Document Vector : 0
are --> Word count = |0| Total count = |426| Term Frequency = | 0.0000 |
 --> IDF = 0.43933269383026263
 --> TF/IDF = 0.0
Document Vector : 0
you --> Word count = |3| Total count = |426| Term Frequency = | 0.0070 |
 --> IDF = 0.1962946357308887
 --> TF/IDF = 0.0013823565896541458
Document Vector : 9
The total query frequency for this file is 3
The total document vector : 9
The similarity is 0.5773502691896257
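Note: this is a sample run for two text files. I have 10 text files in total.
How can I sort the SIMILARITY scores from highest to lowest? Any suggestions?
Answer 0 (score: 1)
Add each SIMILARITY score to a list and sort it with a library method. Collections.sort orders the list ascending, so you can read it from the end to get highest first.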
ArrayList<Double> arrayList = new ArrayList<Double>();
Collections.sort(arrayList);
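To make the "read it from the end" part concrete, here is a minimal sketch. The class name AscendingThenReverseRead and the helper highestFirst are made up for illustration, and two of the scores are invented values rather than output from the question. In the question's code you would call arrayList.add(similarity) once per file, after the inner loop has finished.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class AscendingThenReverseRead {
    // Hypothetical helper: sorts a copy ascending, then walks it from the
    // last index to the first so the result is ordered highest first.
    static List<Double> highestFirst(List<Double> scores) {
        List<Double> ascending = new ArrayList<Double>(scores);
        Collections.sort(ascending);
        List<Double> result = new ArrayList<Double>();
        for (int i = ascending.size() - 1; i >= 0; i--) {
            result.add(ascending.get(i));
        }
        return result;
    }

    public static void main(String[] args) {
        List<Double> scores = new ArrayList<Double>();
        scores.add(0.5773502691896257); // abc1.txt in the sample run
        scores.add(0.25);               // invented value for illustration
        scores.add(0.9);                // invented value for illustration
        for (double s : highestFirst(scores)) {
            System.out.println(s);
        }
    }
}
```

One caveat: Double ordering treats NaN as larger than every other value, so a NaN score such as the one for abc0.txt would end up first when read this way.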
Or you can declare a comparator and use it, as shown below.
ArrayList<Double> arrayList = new ArrayList<Double>();
Comparator<Double> comparator = Collections.reverseOrder();
Collections.sort(arrayList,comparator);
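A plain list of doubles loses track of which file each score belongs to. If you need that, one option is to sort small score objects instead. This is only a sketch: RankedFiles, FileScore, sortHighestFirst and rankValue are names I made up, and ranking a NaN score as 0.0 is my assumption about how a file with no matching words should be treated.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

class RankedFiles {
    // Hypothetical holder pairing a file name with its similarity score.
    static class FileScore {
        final String fileName;
        final double similarity;

        FileScore(String fileName, double similarity) {
            this.fileName = fileName;
            this.similarity = similarity;
        }
    }

    // Assumption: a NaN similarity (no query word found) ranks like 0.0.
    static double rankValue(FileScore fs) {
        return Double.isNaN(fs.similarity) ? 0.0 : fs.similarity;
    }

    // Sorts the list in place, highest similarity first.
    static void sortHighestFirst(List<FileScore> scores) {
        Collections.sort(scores, new Comparator<FileScore>() {
            @Override
            public int compare(FileScore x, FileScore y) {
                // Arguments swapped to get descending order.
                return Double.compare(rankValue(y), rankValue(x));
            }
        });
    }

    public static void main(String[] args) {
        List<FileScore> scores = new ArrayList<FileScore>();
        scores.add(new FileScore("abc0.txt", Double.NaN));         // from the sample run
        scores.add(new FileScore("abc1.txt", 0.5773502691896257)); // from the sample run
        sortHighestFirst(scores);
        for (FileScore fs : scores) {
            System.out.println(fs.fileName + " : " + fs.similarity);
        }
    }
}
```

In the question's loop, the equivalent would be adding new FileScore("abc" + a + ".txt", similarity) once per file and sorting after the outer loop ends.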
HTH