Question

I am working on a Java project in which I want to find the similarity between two words. The similarity score should be between 0 and 1. I have looked at several APIs such as JWI and WS4J, and WS4J looks like the best fit for my problem. I have written code with it and it returns a score, but it does not handle verb forms, plurals, and so on. For example, if I compare 'admission' and 'admissions', it does not match 'admissions'. I then added the Porter stemmer provided at 'https://ws4j.googlecode.com/svn-history/r3/trunk/edu.cmu.lti.ws4j/src/main/java/edu/cmu/lti/ws4j/util/PorterStemmer.java'

This stems the words to their root, but the stemmed words are still not found. For example, it stems both 'admission' and 'admissions' to 'admiss', and the score stays at '-1' (no match), which my code then prints as 0.0.

Here is the code:

private static ILexicalDatabase db = new NictWordNet();

private static void findSimilarityOfWords(String word1, String word2){

    WS4JConfiguration.getInstance().setMFS(true);
    RelatednessCalculator rc = new Lin(db);

    List<POS[]> posPairs = rc.getPOSPairs();
    double maxScore = -1D;

    for(POS[] posPair: posPairs) {
        List<Concept> synsets1 = (List<Concept>)db.getAllConcepts(word1, posPair[0].toString());
        List<Concept> synsets2 = (List<Concept>)db.getAllConcepts(word2, posPair[1].toString());

        for(Concept synset1: synsets1) {
            for (Concept synset2: synsets2) {
                Relatedness relatedness = rc.calcRelatednessOfSynset(synset1, synset2);
                double score = relatedness.getScore();
                if (score > maxScore) { 
                    maxScore = score;
                }
            }
        }
    }

    if (maxScore == -1D) {
        maxScore = 0.0;
    }

    System.out.println("sim('" + word1 + "', '" + word2 + "') =  " + maxScore);
}

public static void main(String[] args) {
    PorterStemmer stemmer = new PorterStemmer();
    String word1 = "admission";
    String word2 = "admissions";

    // Reduce both words to their Porter stems before the WordNet lookup.
    String stemmedWord1 = stemmer.stemWord(word1);
    String stemmedWord2 = stemmer.stemWord(word2);

    System.out.println("Words before stemming: " + word1 + ", " + word2);
    System.out.println("Words after stemming: " + stemmedWord1 + ", " + stemmedWord2);
    findSimilarityOfWords(stemmedWord1, stemmedWord2);
}

Output:

Words before stemming: admission, admissions
Words after stemming: admiss, admiss
sim('admiss', 'admiss') = 0.0
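The output above suggests that the stem 'admiss' is simply not a dictionary entry, so no synsets are found for it. One possible direction, shown here only as a minimal sketch, is to try a few suffix-detachment rules and keep the first candidate that the dictionary actually contains. The class name LemmaSketch, the rule table, and the inDictionary predicate are all hypothetical and not part of WS4J; with WS4J the predicate could perhaps be built on whether db.getAllConcepts(word, pos) returns a non-empty list, but that is an assumption.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Predicate;

// Sketch of dictionary-checked suffix detachment for English nouns.
// The rule table and the inDictionary predicate are assumptions made for
// illustration; they are not taken from WS4J.
class LemmaSketch {

    // {suffix to strip, replacement}; more specific endings come first.
    private static final String[][] NOUN_RULES = {
        {"ches", "ch"}, {"shes", "sh"}, {"ses", "s"}, {"xes", "x"},
        {"zes", "z"}, {"ies", "y"}, {"men", "man"}, {"s", ""}
    };

    // Returns the first candidate accepted by the dictionary, or the
    // original word unchanged when no candidate is accepted.
    static String lemma(String word, Predicate<String> inDictionary) {
        String lower = word.toLowerCase();
        if (inDictionary.test(lower)) {
            return lower;
        }
        for (String[] rule : NOUN_RULES) {
            String suffix = rule[0];
            if (lower.length() > suffix.length() && lower.endsWith(suffix)) {
                String candidate =
                    lower.substring(0, lower.length() - suffix.length()) + rule[1];
                if (inDictionary.test(candidate)) {
                    return candidate;
                }
            }
        }
        return word;
    }

    public static void main(String[] args) {
        // Toy stand-in for a real WordNet lookup.
        Set<String> toyDictionary =
            new HashSet<>(Arrays.asList("admission", "box", "city"));

        for (String w : new String[] {"admission", "admissions", "boxes", "cities"}) {
            System.out.println(w + " -> " + lemma(w, toyDictionary::contains));
        }
    }
}
```

Unlike a Porter stem, every value returned by a successful rule here is a word the dictionary knows, so it can be passed on to the similarity lookup. The rules cover only a few regular noun plurals; verb forms and irregular words would need additional rules or an exception list.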

Finding Similarity between two words

0 answers: