Question

I want to create a table that shows the frequency of certain words in 3 texts, with the texts as columns and the words as rows.

In the table, I want to see which word appears in which text.

These are my texts and words:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']

To create a conditional frequency distribution, I want to build a list of tuples that looks like lot = [('text1', 'blood'), ('text1', 'young'), ..., ('text2', 'blood'), ...]

I tried to create lot like this:

lot = [(words, texts)
    for word in words
    for text in texts]

Instead of getting lot = [('text1', 'blood'), ...], I get the full text from the list in place of 'text1'.

How can I create the list of tuples for the conditional frequency distribution function?
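A minimal sketch of building the list of tuples in exactly the shape described above, assuming the labels 'text1', 'text2', 'text3' are derived from each text's position in texts (the placeholder strings are hypothetical stand-ins for the real texts):

```python
# Hypothetical placeholder texts standing in for text1, text2, text3
texts = ["first text", "second text", "third text"]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear',
         'night', 'happiness', 'heart', 'horse']

# One (label, word) tuple per word per text; the label comes from the
# position, so 'text1' refers to texts[0]. The text content is not used here.
lot = [('text' + str(i), word)
       for i, _ in enumerate(texts, start=1)
       for word in words]

print(lot[:3])  # [('text1', 'blood'), ('text1', 'young'), ('text1', 'mercy')]
```

This enumerates every label/word combination, matching the requested shape; it does not check whether a word actually occurs in a text, which is needed before the pairs carry any frequency information.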

Answer 1

I hope I understand your question correctly. I think you are putting the variables 'words' and 'texts' themselves into each tuple.

Try the following:

texts = [text1, text2, text3]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear', 'night', 'happiness', 'heart', 'horse']
lot = [(word, text)
    for word in words
    for text in texts]

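Edit: since the change is so subtle, I should elaborate. In your original code, you put the whole 'words' and 'texts' lists into every tuple, i.e., you used the entire list instead of each element of the list.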
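A small illustration of that difference, using hypothetical two-item lists:

```python
words = ['blood', 'young']
texts = ['first text', 'second text']

# Buggy version: the whole lists end up in every tuple
wrong = [(words, texts) for word in words for text in texts]

# Fixed version: the loop variables (single elements) go into each tuple
right = [(word, text) for word in words for text in texts]

print(wrong[0])  # (['blood', 'young'], ['first text', 'second text'])
print(right[0])  # ('blood', 'first text')
```

Note that text here is still the full string of each text, so labels such as 'text1' have to be generated separately (for example from the index).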

Answer 2

I think this nested list comprehension might be what you're trying to do?

# start=1 so the labels match text1, text2, text3
lot = [(word, 'text'+str(i))
    for i, text in enumerate(texts, start=1)
    for word in text.split()
    if word in words]
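Each pair produced this way records one occurrence of a word in a text, so counting identical pairs gives the conditional frequencies. A stdlib-only sketch, assuming whitespace tokenization and hypothetical sample texts (NLTK's ConditionalFreqDist also accepts an iterable of (condition, sample) pairs, but this avoids the dependency):

```python
from collections import Counter

# Hypothetical sample texts
texts = ["the young man and the young horse",
         "blood and fear in the night",
         "a young woman with a kind heart"]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear',
         'night', 'happiness', 'heart', 'horse']

# Same comprehension as above: one (word, label) pair per occurrence
lot = [(word, 'text' + str(i))
       for i, text in enumerate(texts, start=1)
       for word in text.split()
       if word in words]

# Counting identical pairs gives the frequency of each word in each text
pair_counts = Counter(lot)

print(pair_counts[('young', 'text1')])  # 2
print(pair_counts[('young', 'text2')])  # 0 (Counter returns 0 for missing keys)
```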

However, you may want to consider using Counter instead:

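from collections import Counter, defaultdict

# counts[word][label] -> frequency of word in that text;
# defaultdict(dict) avoids a KeyError on the first assignment per word
counts = defaultdict(dict)
for i, text in enumerate(texts, start=1):
    C = Counter(text.split())
    for word in words:
        # Counter returns 0 for missing keys, so no if/else is needed
        counts[word]['text'+str(i)] = C[word]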
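Since the goal is a table with words as rows and texts as columns, a plain-text rendering of counts could look like the sketch below. It assumes whitespace tokenization (punctuation attached to a word prevents a match) and hypothetical sample texts; labels and table are illustrative names, not part of the answer above.

```python
from collections import Counter, defaultdict

# Hypothetical sample texts
texts = ["the young man and the young horse",
         "blood and fear in the night",
         "a young woman with a kind heart"]
words = ['blood', 'young', 'mercy', 'woman', 'man', 'fear',
         'night', 'happiness', 'heart', 'horse']

# counts[word][label] -> frequency of word in that text
counts = defaultdict(dict)
labels = []
for i, text in enumerate(texts, start=1):
    label = 'text' + str(i)
    labels.append(label)
    c = Counter(text.split())
    for word in words:
        counts[word][label] = c[word]

# Words as rows, texts as columns, right-aligned counts
width = max(len(w) for w in words)
header = ' ' * width + ''.join(f'{label:>7}' for label in labels)
rows = [f'{word:<{width}}' + ''.join(f'{counts[word][label]:>7}' for label in labels)
        for word in words]
table = '\n'.join([header] + rows)
print(table)
```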

Create token and text tuples for a conditional frequency distribution
