How to remove duplicate bigrams using nltk

Asked: 2019-06-28 19:58:56

Tags: python nltk

I am working on a project that involves bigrams, but I don't know how to remove duplicate bigrams.

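import nltk

# Read the whole corpus as one string.
with open("corpus.txt") as f:
    file_content = f.read()

# Split into word tokens, then pair adjacent tokens into bigrams.
nltk_tokens = nltk.word_tokenize(file_content)
bigrams = nltk.bigrams(nltk_tokens)

# Keep each bigram only the first time it appears.
# (Looping over file_content itself would iterate over single characters.)
seen_bigrams = set()
result = []
for bigram in bigrams:
    if bigram not in seen_bigrams:
        seen_bigrams.add(bigram)
        result.append(bigram)

print(result)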

Output:

[('the', 'first'): 3, ('first', 'Secretary'): 3, ('Secretary', 'the'): 1,]

I need to remove/hide the duplicate bigrams. The final result should be

[('the', 'first'): 1, ('first', 'Secretary'): 1, ('Secretary', 'the'): 1,]
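One possible way to get from the counted output to the desired one is sketched below. It is only a sketch under some assumptions: the `tokens` list is a hypothetical sample standing in for `nltk.word_tokenize(file_content)`, and `zip(tokens, tokens[1:])` is used in place of `nltk.bigrams(tokens)` (which yields the same tuples) so the snippet runs without the corpus file or tokenizer data.

```python
from collections import Counter

# Hypothetical sample tokens, standing in for nltk.word_tokenize(file_content).
tokens = ["the", "first", "Secretary", "the", "first", "Secretary"]

# Adjacent token pairs; nltk.bigrams(tokens) would produce the same tuples.
pairs = list(zip(tokens, tokens[1:]))

# Raw frequencies, duplicates included.
counts = Counter(pairs)

# dict.fromkeys keeps the first occurrence of each bigram and preserves
# insertion order (guaranteed for dicts since Python 3.7).
unique_pairs = list(dict.fromkeys(pairs))

# Report every distinct bigram exactly once.
deduped_counts = {pair: 1 for pair in unique_pairs}

print(counts)
print(deduped_counts)
```

If the original order does not matter, `set(pairs)` would also remove the duplicates; `dict.fromkeys` is used here only because it keeps the bigrams in the order they first appear in the text.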

0 answers:

No answers yet