Question

I have a question about the Python concordance command in NLTK. First, I worked through a simple example:

from nltk.book import *

text1.concordance("monstrous")

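This works fine. Now I have my own .txt file and would like to run the same command on it. I have a list called "textList" and want to find the word "CNA", so I ran the command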

textList.concordance('CNA')

However, I get the error

AttributeError: 'list' object has no attribute 'concordance'.

In the example, isn't text1 a list? I would like to know what is going on here.

Answer 1

.concordance() is a special nltk function, so you can't call it on just any Python object (such as a list).

More specifically: .concordance() is a method of the Text class of nltk.

Basically, if you want to use .concordance(), you have to instantiate a Text object first and then call the method on that object.
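As a minimal sketch of that point, the snippet below wraps a plain list of tokens in a Text object. The sample sentence and the variable names are made up for illustration, and the tokens come from a simple whitespace split rather than a real tokenizer, so no NLTK data download is assumed.

```python
from nltk.text import Text

# Hypothetical sample tokens; in practice these would come from your own file.
tokens = "The CNA report was filed . Later the CNA team met again .".split()

# A plain Python list has no concordance method, hence the AttributeError.
print(hasattr(tokens, "concordance"))

# Wrapping the list in a Text object makes the method available.
text = Text(tokens)
text.concordance("CNA")
```

This also suggests why text1 works in the book example: it is a Text object built from a token sequence, not a bare list.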

Text

A Text is typically initialized from a given document or corpus. E.g.:
import nltk.corpus  
from nltk.text import Text  
moby = Text(nltk.corpus.gutenberg.words('melville-moby_dick.txt'))

.concordance()

concordance(word, width=79, lines=25)

Print a concordance for word with the specified context window. Word matching is not case-sensitive.
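To illustrate the width and lines parameters and the case-insensitive matching described above, here is a small sketch. The sample sentence is invented, and the tokens again come from a whitespace split to keep the example free of tokenizer data.

```python
from nltk.text import Text

# Hypothetical sample with the search word in two different casings.
tokens = ("A monstrous whale surfaced . The crew called it Monstrous , "
          "and the monstrous size shocked them .").split()
text = Text(tokens)

# The query is uppercase, yet all three occurrences match.
# width narrows the context window; lines caps how many matches are printed.
text.concordance("MONSTROUS", width=40, lines=2)
```

With lines=2, only the first two of the three matches are displayed.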

So I imagine something like this would work (untested):

import nltk.corpus  
from nltk.text import Text  
textList = Text(nltk.corpus.gutenberg.words('YOUR FILE NAME HERE.txt'))
textList.concordance('CNA')
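Since nltk.corpus.gutenberg is a reader for the bundled Gutenberg corpus, one possible alternative for a file of your own is NLTK's PlaintextCorpusReader pointed at the folder that holds the file. The following is only a sketch: the temporary folder, the file name notes.txt and its contents are assumptions made up so the example is self-contained.

```python
import os
import tempfile

from nltk.corpus import PlaintextCorpusReader
from nltk.text import Text

# Hypothetical sample file, created here only so the sketch is self-contained.
folder = tempfile.mkdtemp()
with open(os.path.join(folder, "notes.txt"), "w", encoding="utf-8") as f:
    f.write("The CNA report was filed. Later the CNA team met again.")

# Point a corpus reader at that folder instead of the bundled Gutenberg corpus.
reader = PlaintextCorpusReader(folder, ["notes.txt"])

# words() yields the tokens of the file; Text wraps them for concordance().
textList = Text(reader.words("notes.txt"))
textList.concordance("CNA")
```

For a real file, replace the folder and file name with the directory and name of your own .txt file.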

Answer 2

I got it working with this code:

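import sys
from nltk.tokenize import word_tokenize
from nltk.text import Text

def main():
    # Exit quietly when no file path is given on the command line.
    if len(sys.argv) < 2:
        return
    # Read the whole file; word_tokenize needs NLTK's punkt tokenizer data.
    with open(sys.argv[1], "r") as f:
        text = f.read()
    tokens = word_tokenize(text)
    # Wrap the token list in a Text object so concordance() is available.
    textList = Text(tokens)
    textList.concordance('is')
    print(tokens)


if __name__ == '__main__':
    main()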

Based on this site

Answer 3

In a Jupyter notebook (or Google Colab notebook), the full process is: MS Word file -> text file -> NLTK object:

from nltk.tokenize import word_tokenize
from nltk.text import Text

import docx2txt

myTextFile = docx2txt.process("/mypath/myWordFile")
tokens = word_tokenize(myTextFile)
print(tokens)
textList = Text(tokens)
textList.concordance('contract')
