Question

How can I find a bigram in a list of words? For example, suppose I want to find

bigram = list(nltk.bigrams("New York"))

in the word list

words = nltk.corpus.brown.words(fileids=["ca44"])

I tried

for t in bigram:
    if t in words:
        ...  # do something

and

if bigram in words:
    ...  # do something

Answer 1

.bigrams() returns a generator of tuples. You should first convert the tuples to strings. For example:

bigram_strings = [''.join(t) for t in bigram]

Then you can do

for t in bigram_strings:
    if t in words:
        ...  # do something
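
One caveat, as far as I can tell: `t in words` only tests whether a single token equals the joined string, so it does not check that two words sit next to each other. Also, iterating over the bare string "New York" yields characters, so the phrase needs to be split into tokens first. A minimal sketch of an adjacency check, assuming a small made-up word list in place of the Brown corpus and using zip(words, words[1:]) as a standard-library stand-in for word-level bigrams:

```python
# Hypothetical sample data standing in for the Brown corpus word list.
words = ["I", "moved", "to", "New", "York", "last", "year"]

# Split the phrase into tokens first; iterating a bare string yields characters.
target = tuple("New York".split())  # ('New', 'York')

# zip(words, words[1:]) pairs each word with its successor (word-level bigrams).
word_bigrams = set(zip(words, words[1:]))

# True only when "New" is immediately followed by "York".
found = target in word_bigrams
print(found)
```

Building a set makes repeated lookups cheap, at the cost of holding every pair in memory.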

Answer 2

You can write a generator that produces bigrams for your word list:

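from itertools import tee

def pairwise(iterable):
    """Iterate over consecutive pairs of an iterable."""
    i, j = tee(iterable)  # independent iterators, so one-shot iterables also work
    next(j, None)  # advance the second iterator by one; safe on empty input
    yield from zip(i, j)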

(For example, list(pairwise(["this", "is", "a", "test"])) returns [('this', 'is'), ('is', 'a'), ('a', 'test')].)

Then compare each pair against the result of .bigrams():

for pair in pairwise(words):
    # Split the phrase first; a bare string would yield character pairs.
    for bigram in nltk.bigrams("New York".split()):
        if bigram == pair:
            pass  # found
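
When only one target bigram is needed, the inner loop can be dropped and each pair compared directly. A rough sketch, where find_bigram and the sample list are hypothetical names introduced purely for illustration (the sample stands in for the Brown corpus words):

```python
from itertools import tee


def find_bigram(words, first, second):
    """Return every index i where words[i] == first and words[i + 1] == second."""
    a, b = tee(words)
    next(b, None)  # shift the second iterator forward by one item
    return [i for i, pair in enumerate(zip(a, b)) if pair == (first, second)]


# Hypothetical sample standing in for nltk.corpus.brown.words(fileids=["ca44"]).
sample = ["New", "York", "is", "not", "New", "Jersey", "but", "New", "York"]
positions = find_bigram(sample, "New", "York")
print(positions)  # [0, 7]
```

Returning positions rather than a boolean lets you inspect the context around each match. On Python 3.10 and later, itertools.pairwise performs the same pairing as the tee-based code here.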

Finding bigrams in a list of words
