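If I have a set of words and want to find a pattern among them, then look for that pattern in a long text, what should I use: machine learning, text analysis, or pattern recognition?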
Answer 0 (score: 0)
I would build character n-grams for all of the words and count which ones occur most often.
from collections import Counter

from nltk import ngrams

words = ["aim", "aid", "bail", "bait"]


def build_ngrams(words, from_size, to_size):
    """Collect every character n-gram of each size in [from_size, to_size]."""
    word_ngrams = []
    for word in words:
        for ngram_size in range(from_size, to_size + 1):
            # ngrams() on a string yields tuples of characters, e.g. ('a', 'i')
            ng = ngrams(word, ngram_size)
            word_ngrams.extend(ng)
    return word_ngrams


# construct all bigrams and trigrams
word_ngrams = build_ngrams(words, 2, 3)

# find the most common n-grams
counter = Counter(word_ngrams)
print(counter.most_common(3))
This gives you the most common patterns, which you can then use to search the longer text.
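The answer does not show the search step itself, so here is a minimal sketch of one way it could look. It assumes that "searching" means finding whitespace-separated tokens that contain the most common n-gram as a substring. The helper names `char_ngrams`, `top_patterns`, and `find_matching_tokens` and the sample text are hypothetical, and the n-grams are built with plain string slicing rather than nltk so the example runs without extra dependencies.

```python
from collections import Counter


def char_ngrams(word, n):
    """Return all contiguous character n-grams of `word` as strings."""
    return [word[i:i + n] for i in range(len(word) - n + 1)]


def top_patterns(words, from_size, to_size, k):
    """Count n-grams across all words and return the k most common ones."""
    counter = Counter()
    for word in words:
        for n in range(from_size, to_size + 1):
            counter.update(char_ngrams(word, n))
    return [pattern for pattern, _ in counter.most_common(k)]


def find_matching_tokens(text, patterns):
    """Return the tokens of `text` that contain at least one pattern."""
    # Assumption: lowercase, whitespace-split tokens are good enough here.
    tokens = text.lower().split()
    return [t for t in tokens if any(p in t for p in patterns)]


words = ["aim", "aid", "bail", "bait"]
patterns = top_patterns(words, 2, 3, 1)  # keep only the single top n-gram

# Hypothetical sample text, used only for illustration.
text = "The maid paid the bail and did not complain about the rain"
matches = find_matching_tokens(text, patterns)

print(patterns)
print(matches)
```

A plain substring test keeps the sketch short. For larger texts or several patterns at once, compiling the patterns into a single regular expression would be a reasonable alternative.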