Question

I have successfully used NLTK's concordance() method with my own text file, which I read in "through" the Gutenberg corpus:

    bom = open('sentences-with-emoji.txt')
    from nltk.text import Text
    bom = Text(nltk.corpus.gutenberg.words('/my-own-text-file.txt'))
    bom.concordance('messiah')

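I say "through" because the concordance() method only reads words through a specified corpus (here, Gutenberg). The Gutenberg corpus contains no emoji. So when I try a different file that does contain emoji, like this: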

    bom = open('sentences-with-emoji.txt')
    from nltk.text import Text
    bom = Text(nltk.corpus.gutenberg.words('/sentences-with-emoji.txt'))
    bom.concordance('')

I get the response:

No matches

Do I have to create an annotated corpus from my /sentences-with-emoji.txt file (using the process described in "Creating a new corpus with NLTK") in order to use the concordance() method with emoji?

Answer 1

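nltk.text.Text expects to be passed a list of tokens. You also don't have to create a new corpus or make a detour through gutenberg.words. Loading the raw text file and tokenizing it is enough.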

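    import nltk
    from nltk.text import Text

    # raw = open('sentences-with-emoji.txt', encoding='utf-8').read()
    raw = 'word  word'
    # word_tokenize needs NLTK's Punkt tokenizer data to be installed
    tokens = nltk.word_tokenize(raw)

    text = Text(tokens)
    text.concordance('')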

Displaying 1 of 1 matches:
                                  word  word
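To illustrate why tokenization is the step that matters, here is a minimal plain-Python sketch of a keyword-in-context lookup. It is not NLTK's implementation: `tokenize` and `kwic` are hypothetical helper names, the regular expression is a simplification, and the emoji is an arbitrary stand-in. The point is only that a concordance matches whole tokens, so the emoji has to end up as a token of its own.

```python
import re

# Hypothetical helpers for illustration only; NLTK's tokenizers and
# Text.concordance() work differently internally.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")

def tokenize(raw):
    """Split text into runs of word characters or single non-space symbols.

    Caveat: emoji built from several code points would be split into
    pieces by this simple pattern.
    """
    return TOKEN_RE.findall(raw)

def kwic(tokens, query, window=3):
    """Return (left, match, right) tuples for every token equal to query."""
    hits = []
    for i, tok in enumerate(tokens):
        if tok == query:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            hits.append((left, tok, right))
    return hits

EMOJI = "\U0001F602"  # arbitrary stand-in emoji (assumption)
raw = "so funny" + EMOJI + " I cannot stop"  # emoji glued to the previous word

tokens = tokenize(raw)
hits = kwic(tokens, EMOJI)
print(hits)

# Whitespace splitting leaves the emoji attached to "funny", so nothing matches.
print(kwic(raw.split(), EMOJI))
```

With whitespace splitting the emoji stays attached to its neighbouring word and the lookup finds nothing, whereas a tokenizer that separates symbols makes it searchable. The same reasoning applies to the token list handed to Text.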

Is it possible to use the NLTK concordance functionality with emoji?
