Question

I'm trying to get the trigrams from a sentence and store them in a dictionary, with their frequencies as the values. I wrote this:

trigrams = {}
sentence = ["What", "is", "happening", "right", "now"]

for word in sentence:
      if word != sentence[-1] or sentence[-2] and tuple((word, sentence[sentence.index(word) +1], sentence[sentence.index(word) +2])) not in trigrams:
             trigrams.update({tuple((word, sentence[sentence.index(word) +1], sentence[sentence.index(word) +2])):1})

The result should look like this: ("What", "is", "happening"): 1, ("is", "happening", "right"): 1, and so on.

But right now I keep getting an IndexError on the update line.

Answer 1

You could use lists instead, since the contents of each tuple are all the same data type (strings).

This might be easier:

trigrams = []
sentence = ["What", "is", "happening", "right", "now"]

for i in range(2,len(sentence)):
    trigrams.append([sentence[i-2],sentence[i-1],sentence[i]])
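
If the trigrams still need to end up as dictionary keys, as in the question, note that lists are unhashable and tuples are not. The following is only a sketch of the same sliding-window idea that yields tuples; the names `trigram_tuples`, `counts` and `list_key_ok` are made up for illustration.

```python
sentence = ["What", "is", "happening", "right", "now"]

# Slide a window of three words over the sentence. zip stops at the shortest
# slice, so no index ever runs past the end of the list.
trigram_tuples = list(zip(sentence, sentence[1:], sentence[2:]))

# Tuples are hashable, so they can be used as dictionary keys.
counts = {}
for trigram in trigram_tuples:
    counts[trigram] = counts.get(trigram, 0) + 1

# A list used as a key raises TypeError (unhashable type).
try:
    {["What", "is", "happening"]: 1}
    list_key_ok = True
except TypeError:
    list_key_ok = False
```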

Answer 2

I suspect that if word != sentence[-1] or sentence[-2] is not what you want. Did you mean if word != sentence[-1] and word != sentence[-2], that is, word is equal to neither sentence[-1] nor sentence[-2]?
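
To see the difference, here is a small sketch that evaluates both forms for every word in the sentence. The helper names `original_check` and `intended_check` are hypothetical and exist only for this comparison.

```python
sentence = ["What", "is", "happening", "right", "now"]

def original_check(word):
    # Parsed as (word != sentence[-1]) or sentence[-2].
    # sentence[-2] is a non-empty string, so the result is always truthy.
    return bool(word != sentence[-1] or sentence[-2])

def intended_check(word):
    # Compares the word against both of the last two values explicitly.
    return word != sentence[-1] and word != sentence[-2]

original_results = [original_check(w) for w in sentence]
intended_results = [intended_check(w) for w in sentence]
```

With the original form the test passes for every word, including "right" and "now", so the lookup at index + 2 runs past the end of the list. Comparing values like this would still misfire if one of the last two words also appeared earlier in the sentence.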

Answer 3

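Given that you want to keep the tuple-based structure and change your code as little as possible, you could do this (I'm not claiming this is a good way to solve your problem):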

trigrams = {}
sentence = ["What", "is", "happening", "right", "now"]

for index, word in enumerate(sentence):
    print(index, word)  # to see how the iteration proceeds
    if index < len(sentence)-2:
        if tuple((word, sentence[index+1], sentence[index+2])) not in trigrams:
            trigrams.update({tuple((word, sentence[index+1], sentence[index+2])):1})

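You get the IndexError because, inside tuple(), you are accessing elements that don't exist. That happens because your check for being near the end of the list (the last two elements) is not done correctly.

The code you used: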

if word != sentence[-1] or sentence[-2]

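is not right: you end up comparing strings rather than indices, and the index is what matters here! Compare the indices, not the values at those positions.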
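
As a rough sketch of the index-based approach that also counts frequencies, as the question asks, something like the following could work. The sentence with repeated words is a made-up example, chosen because `sentence.index(word)` always returns the first occurrence and would point at the wrong position for repeats.

```python
# Made-up example sentence with repeated words.
sentence = ["to", "be", "or", "not", "to", "be", "or", "not"]

trigrams = {}
# Stop two positions before the end so that index + 2 is always valid.
for index in range(len(sentence) - 2):
    key = tuple(sentence[index:index + 3])
    # Start at 0 for a new trigram, then add one for this occurrence.
    trigrams[key] = trigrams.get(key, 0) + 1

# sentence.index returns the first match only, not the current position.
first_to = sentence.index("to")
```

Here the loop bound does the end-of-list check, so no comparison against the words themselves is needed.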

Why am I getting an IndexError?
