Question

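I am currently working on a WordNet-based semantic similarity measurement project. As I understand it, these are the steps for computing the semantic similarity between two sentences: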

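1. Each sentence is split into a list of tokens.
2. Stemming.
3. Part-of-speech disambiguation (or tagging).
4. Finding the most appropriate sense for each word in the sentence (Word Sense Disambiguation).
5. Computing the similarity of the sentences from the similarity of the word pairs.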
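
For the last step in the list above, here is a minimal sketch of one common way to turn word-pair similarities into a sentence score: for each word, take its best match in the other sentence, average those, and do the same in the opposite direction. This is an assumption about how that step could be done, not a prescribed method. The names directional_score, sentence_similarity and word_sim are hypothetical; in practice word_sim would wrap a WordNet measure such as Synset.path_similarity, which can return None when no path exists (treated as 0.0 here).

```python
from typing import Callable, Optional, Sequence

# A word-pair similarity function; may return None when no score is available.
WordSim = Callable[[str, str], Optional[float]]


def directional_score(source: Sequence[str], target: Sequence[str],
                      word_sim: WordSim) -> float:
    """Average, over the words in source, of the best score against target."""
    if not source or not target:
        return 0.0
    total = 0.0
    for word in source:
        # A None score (no similarity available) is counted as 0.0.
        total += max((word_sim(word, other) or 0.0) for other in target)
    return total / len(source)


def sentence_similarity(first: Sequence[str], second: Sequence[str],
                        word_sim: WordSim) -> float:
    """Symmetric score: mean of the two directional averages."""
    forward = directional_score(first, second, word_sim)
    backward = directional_score(second, first, word_sim)
    return (forward + backward) / 2.0
```

With an exact-match word_sim, two sentences that share one of their two content words score 0.5.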

I am now at step 3, but I cannot get the correct output. I am not very familiar with Python, so I would appreciate your help.

Here is my code.


import nltk
from nltk.corpus import stopwords


def get_tokens():

    test_sentence = open("D:/test/resources/AnswerEvaluation/Sample.txt", "r")

    try:
        for item in test_sentence:
            stop_words = set(stopwords.words('english'))

            token_words = nltk.word_tokenize(item)

            sentence_tokenization = [word for word in token_words if word not in stop_words]
            print (sentence_tokenization)
            return sentence_tokenization

    except Exception as e:
        print (str(e))


def get_stems():

    tokenized_sentence = get_tokens()

    for tokens in tokenized_sentence:
        sentence_stemming = nltk.PorterStemmer().stem(tokens)
        print (sentence_stemming)
        return sentence_stemming


def get_tags():

    stemmed_sentence = get_stems()

    tag_words = nltk.pos_tag(stemmed_sentence)

    print (tag_words)
    return tag_words

get_tags()


Sample.txt contains the sentences: I am riding in a car. I am sitting in the car.
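
A likely cause, judging only from the code above: in get_stems() the return statement sits inside the for loop, so the function returns the stem of the first token as a single string, and nltk.pos_tag() then walks over that string character by character. get_tokens() has the same shape and returns after the first line of the file. Separately, the stop-word check is case-sensitive, so a capitalised token such as "I" is not filtered out. The sketch below is one way to restructure steps 1 to 3 so that each stage returns a full list. The helper names (remove_stop_words, stem_all, process_sentence, run_with_nltk) are hypothetical, not NLTK functions, and run_with_nltk assumes NLTK is installed with its tokenizer, stop-word and tagger data downloaded.

```python
from typing import Callable, List, Sequence, Set, Tuple

Tagged = List[Tuple[str, str]]


def remove_stop_words(tokens: Sequence[str], stop_words: Set[str]) -> List[str]:
    """Drop stop words, comparing in lower case so that 'I' matches 'i'."""
    return [t for t in tokens if t.lower() not in stop_words]


def stem_all(tokens: Sequence[str], stem: Callable[[str], str]) -> List[str]:
    """Stem every token and return the whole list, not just the first stem."""
    return [stem(t) for t in tokens]


def process_sentence(sentence: str,
                     tokenize: Callable[[str], List[str]],
                     stop_words: Set[str],
                     stem: Callable[[str], str],
                     tag: Callable[[List[str]], Tagged]) -> Tagged:
    """Steps 1-3 for one sentence: tokenize, stem, then POS-tag the list."""
    tokens = remove_stop_words(tokenize(sentence), stop_words)
    stems = stem_all(tokens, stem)
    # The tagger receives a list of words, never a single string.
    return tag(stems)


def run_with_nltk(path: str) -> List[Tagged]:
    """Assumed wiring to NLTK; processes every sentence on every line."""
    import nltk
    from nltk.corpus import stopwords

    stop_words = set(stopwords.words("english"))
    stemmer = nltk.PorterStemmer()
    results = []
    with open(path, "r") as handle:
        for line in handle:
            for sentence in nltk.sent_tokenize(line):
                results.append(process_sentence(
                    sentence, nltk.word_tokenize, stop_words,
                    stemmer.stem, nltk.pos_tag))
    return results
```

The sketch keeps the order from the question (stem, then tag). Taggers are usually trained on inflected words, so tagging the unstemmed tokens and stemming afterwards may give better tags.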

WordNet-based semantic similarity measurement

0 answers: