I'm working on a project that involves bigrams, but I don't know how to remove duplicate bigrams.
import nltk
from nltk.tokenize import word_tokenize

# Read the corpus and split it into word tokens.
with open("corpus.txt") as f:
    file_content = f.read()
nltk_tokens = word_tokenize(file_content)

# Keep each token only the first time it appears, preserving order.
# (Iterating over file_content itself would loop over single characters.)
ordered_tokens = set()
result = []
for word in nltk_tokens:
    if word not in ordered_tokens:
        ordered_tokens.add(word)
        result.append(word)
print(result)
Output:
[('the', 'first'): 3, ('first', 'Secretary'): 3, ('Secretary', 'the'): 1,]
I need to remove/hide the duplicate bigrams. The final result should be:
[('the', 'first'): 1, ('first', 'Secretary'): 1, ('Secretary', 'the'): 1,]
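For reference, here is a minimal sketch of one way the desired result could be produced. It assumes the tokens are already in a list; the function name `unique_bigrams` and the sample token list are made up for illustration and are not from my project. It pairs each token with its successor using plain `zip`, which should give the same pairs as `nltk.bigrams(tokens)`, and keeps each pair only the first time it is seen.

```python
def unique_bigrams(tokens):
    """Return each distinct bigram once, in order of first appearance."""
    seen = set()
    unique = []
    # zip(tokens, tokens[1:]) pairs every token with the one that follows it.
    for bigram in zip(tokens, tokens[1:]):
        if bigram not in seen:
            seen.add(bigram)
            unique.append(bigram)
    return unique


# Hypothetical sample input standing in for the tokenized corpus.
sample_tokens = ["the", "first", "Secretary", "the", "first", "Secretary"]

deduplicated = unique_bigrams(sample_tokens)

# Map every distinct bigram to 1, since each now appears exactly once.
counts = {bigram: 1 for bigram in deduplicated}
print(counts)
```

With real data, `sample_tokens` would be replaced by the output of `word_tokenize(file_content)`. A set is used for the membership check so the loop stays fast on a large corpus, while the separate list preserves the original order.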