Question

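For a homework assignment, we are to turn the basicCompare method into something that compares two text documents and checks whether they are about similar topics. Basically, the program strips out every word that is five characters or shorter and leaves us with lists of the remaining words. We are supposed to compare the lists, and if enough words are used in both documents (say, 80% similarity), the method returns true and reports a "match".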

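However, I'm stuck at the point where all the comments sit at the bottom of the method. I can't think of or find a way to compare the two lists and work out what percentage of the words appear in both. Maybe I'm thinking about it the wrong way and need to filter out the words that aren't in both lists, then just count how many words are left. The parameters that decide whether the input documents match are entirely up to us, so they can be set however we like. If you kind ladies and gentlemen could point me in the right direction, even just to the Java doc page for a particular function, I'm sure I can do the rest. I just need to know where to start.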

import java.util.Collections;
import java.util.List;

public class MyComparator implements DocumentComparator {

    public static void main(String[] args) {
        MyComparator mc = new MyComparator();

        if (mc.basicCompare("C:\\Users\\Quinncuatro\\Desktop\\MatchLabJava\\LabCode\\match1.txt", "C:\\Users\\Quinncuatro\\Desktop\\MatchLabJava\\LabCode\\match2.txt")) {
            System.out.println("match1.txt and match2.txt are similar!");
        } else {
            System.out.println("match1.txt and match2.txt are NOT similar!");
        }
    }

    //In the basicCompare method, since the bottom returns false, it results in the else statement in the calling above, saying they're not similar
    //Need to implement a thing that if so many of the words are shared, it returns as true

    public boolean basicCompare(String f1, String f2) {
            List<String> wordsFromFirstArticle = LabUtils.getWordsFromFile(f1);
            List<String> wordsFromSecondArticle = LabUtils.getWordsFromFile(f2);

            Collections.sort(wordsFromFirstArticle);
            Collections.sort(wordsFromSecondArticle);//sort list alphabetically

            for(String word : wordsFromFirstArticle){
                    System.out.println(word);
            }

            for(String word2 : wordsFromSecondArticle){
                    System.out.println(word2);
            }

            //Find a way to use common_words to strip out the "noise" in the two lists, so you're ONLY left with unique words
            //Get rid of words not in both lists, if above a certain number, return true
            //If word1 = word2 more than 80%, return true

            //Then just write more whatever.basicCompare modules to compare 2 to 3, 1 to 3, 1 to no, 2 to no, and 3 to no

            //Once you get it working, you don't need to print the words, just say whether or not they "match"

            return false;

    }


    public boolean mapCompare(String f1, String f2) {

            return false;
    }

}

Answer 1

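Try to come up with an algorithm by working through the steps on paper or in your head. Once you understand what you need to do, translate it into code. That is how all algorithms are invented.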

Answer 2

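First, change the Lists to Sets to remove duplicates.

Then iterate over one of the sets and use the contains method to check whether the other set contains the same word.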

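// Fragment meant to sit inside basicCompare(String f1, String f2).
// Needs: import java.util.HashSet; import java.util.Iterator; import java.util.Set;
int count = 0; // distinct words from the first file that also occur in the second
Set<String> set1 = new HashSet<String>(LabUtils.getWordsFromFile(f1));
Set<String> set2 = new HashSet<String>(LabUtils.getWordsFromFile(f2));

Iterator<String> it = set1.iterator();

while (it.hasNext()){
    String s = it.next();

    // A HashSet lookup takes constant time on average
    if (set2.contains(s)){
        count++;
    }

}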

Then use the counter to calculate the percentage, (count / total) * 100. If it is greater than 80%, return true; otherwise return false.
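One thing to watch for in the percentage step: count and total are both ints, so count / total is integer division and truncates to 0 whenever count is smaller than total. Here is a minimal sketch of the whole idea that does the division in floating point. The class name WordOverlap and the helpers percentShared and isMatch are made up for illustration, and I am assuming "total" means the number of distinct words in the first document; you may prefer a different denominator. It takes plain lists so it runs without LabUtils.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical demo class, not part of the lab code.
class WordOverlap {

    // Percentage of the distinct words in `first` that also appear in `second`.
    // Assumption: "total" is the number of distinct words in the first list.
    static double percentShared(List<String> first, List<String> second) {
        Set<String> set1 = new HashSet<>(first);
        Set<String> set2 = new HashSet<>(second);
        if (set1.isEmpty()) {
            return 0.0; // nothing to compare, and avoids dividing by zero
        }
        int count = 0;
        for (String word : set1) {
            if (set2.contains(word)) {
                count++;
            }
        }
        // Multiplying by 100.0 first forces floating-point division.
        return 100.0 * count / set1.size();
    }

    // True when the shared percentage is strictly above the threshold.
    static boolean isMatch(List<String> first, List<String> second, double thresholdPercent) {
        return percentShared(first, second) > thresholdPercent;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("galaxy", "planet", "rocket", "orbits", "planet");
        List<String> b = Arrays.asList("planet", "rocket", "orbits", "comets");
        System.out.println(percentShared(a, b)); // 3 of 4 distinct words shared: 75.0
        System.out.println(isMatch(a, b, 80.0)); // false
    }
}
```

In basicCompare you would pass in the two lists returned by LabUtils.getWordsFromFile and return the result of isMatch with whatever threshold you settle on.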

It is always good to understand the difference between Lists, Sets, and Queues. I hope this points you in the right direction.
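If it helps, here is a small sketch of how the three behave with the same input. The class name CollectionKinds and its helper methods are invented for the demo; ArrayList, TreeSet, and ArrayDeque are just one possible implementation of each interface.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Queue;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical demo class showing List vs. Set vs. Queue behavior.
class CollectionKinds {

    // A List keeps every element, duplicates included, in insertion order.
    static List<String> toList(String... words) {
        return new ArrayList<>(Arrays.asList(words));
    }

    // A Set keeps each distinct element once (a TreeSet also keeps them sorted).
    static Set<String> toSet(String... words) {
        return new TreeSet<>(Arrays.asList(words));
    }

    // A Queue hands elements back in first-in, first-out order via poll().
    static Queue<String> toQueue(String... words) {
        return new ArrayDeque<>(Arrays.asList(words));
    }

    public static void main(String[] args) {
        System.out.println(toList("planet", "rocket", "planet"));         // [planet, rocket, planet]
        System.out.println(toSet("planet", "rocket", "planet"));          // [planet, rocket]
        System.out.println(toQueue("planet", "rocket", "planet").poll()); // planet
    }
}
```

For this assignment the Set is the useful one: duplicates disappear on their own, and contains is cheap.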

How do I find out whether two lists are similar in Java?
