Bigram of words in list

Posted: 2017-11-08 21:57:19

Tags: python

I have a sentence:

 "this is a test string for bigram pair generation"

I need to form bigram pairs in Python and store them in a variable. Condition: a word may only be paired with the next 3 words that follow it.

Here is what I want:

 [["this", "is"], ["this", "a"], ["this", "test"], ["is", "a"], ["is", "test"], ["is", "string"], ["a", "test"], ["a", "string"], ["a", "for"], ["test", "string"], ["test", "for"], ["test", "bigram"], ["string", "for"], ["string", "bigram"], ["string", "pair"], ["for", "bigram"], ["for", "pair"], ["for", "generation"], ["bigram", "pair"], ["bigram", "generation"], ["pair", "generation"]]

1 answer:

Answer 0 (score: 1)

Use the .split() method to get a list of all the words in the sentence, then loop over that list and append each valid pair to a result list:

sentence = "this is a test string for bigram pair generation"

words = sentence.split()
result = []

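for i in range(len(words)):
    # pair words[i] with each of the next 1 to 3 words
    for j in range(1, 4):
        # skip offsets that would run past the end of the sentence
        if i + j < len(words):
            result.append([words[i], words[i+j]])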

print(result)

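Because each word may only be paired with the 1 to 3 words that follow it, the inner for loop runs j over range(1, 4), i.e. offsets 1, 2 and 3. The check i + j < len(words) keeps the last few words from pairing with positions beyond the end of the sentence.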
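A more compact variant, sketched under the assumption that a reusable helper is wanted: slicing `words[i + 1 : i + 1 + window]` replaces both the inner loop and the bounds check, because a Python slice simply stops at the end of the list. The function name `skip_bigrams` and the `window` parameter are hypothetical illustrations, not part of the original answer.

```python
def skip_bigrams(sentence, window=3):
    """Hypothetical helper: pair each word with each of the next `window` words."""
    words = sentence.split()
    # The slice is truncated automatically near the end of the list,
    # so no explicit i + j < len(words) check is needed.
    return [[w, nxt]
            for i, w in enumerate(words)
            for nxt in words[i + 1 : i + 1 + window]]


pairs = skip_bigrams("this is a test string for bigram pair generation")
print(len(pairs))  # 21
print(pairs[:3])   # [['this', 'is'], ['this', 'a'], ['this', 'test']]
```

With the default window=3 this yields the same 21 pairs, in the same order, as the nested loop above; passing window=1 reduces it to ordinary adjacent bigrams.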