I have a sentence:
"this is a test string for bigram pair generation"
I need to form bigram pairs in Python and store them in a variable. Condition: a word may only be paired with each of the next 3 words that follow it.
Here is what I want:
[["this", "is"], ["this", "a"], ["this", "test"], ["is", "a"], ["is", "test"], ["is", "string"], ["a", "test"], ["a", "string"], ["a", "for"], ["test", "string"], ["test", "for"], ["test", "bigram"], ["string", "for"], ["string", "bigram"], ["string", "pair"], ["for", "bigram"], ["for", "pair"], ["for", "generation"], ["bigram", "pair"], ["bigram", "generation"], ["pair", "generation"]]
Answer 0 (score: 1)
Use the .split() method to get a list of all the words in the sentence, then loop over that list, appending each valid pair to the result list:
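sentence = "this is a test string for bigram pair generation"
words = sentence.split()
result = []
for i in range(len(words)):
    # j is the forward offset: 1, 2 or 3 words ahead of position i
    for j in range(1, 4):
        # skip offsets that would run past the end of the list
        if i + j < len(words):
            result.append([words[i], words[i + j]])
print(result)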
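Because each word is paired only with the words 1 to 3 positions ahead of it, the variable j in the inner for loop enforces that limit, and the i + j < len(words) check keeps the index from running past the end of the list.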
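If the window size needs to vary, the same loop could be wrapped in a small function. This is only a sketch: the names window_pairs and window are made up for illustration and do not come from the answer above. It expresses the nested loop as a list comprehension and clips the inner range at the end of the word list instead of using an if check.

```python
def window_pairs(sentence, window=3):
    """Pair each word with up to `window` words that follow it.

    Hypothetical helper: the name and the `window` parameter are
    illustrative assumptions, not part of the original answer.
    """
    words = sentence.split()
    return [
        [words[i], words[j]]
        for i in range(len(words))
        # j covers the next `window` positions, clipped at the end of the list
        for j in range(i + 1, min(i + window + 1, len(words)))
    ]


pairs = window_pairs("this is a test string for bigram pair generation")
print(pairs)
```

With the default window of 3 this should produce the same 21 pairs as the loop above; passing window=1 would give ordinary adjacent bigrams.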