Question

Hi, I started playing with Python a few days ago and it seems pretty easy, so I went looking and found the corpora that ship with nltk. When I tried

text1.concordance("Moby")

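it gave me the number of matching sentences and displayed the sentences containing the word Moby. Cool.

So I tried to test whether I could find all the sentences that contain both names, Moby and Ahab, but sadly I got an error instead.

What am I doing wrong, or is there a way to get all the sentences that contain both names? Should I be using a different nltk function? O.o

This is probably easy, but I just can't see it... I hope someone can help, thanks.

PS: If I need to write some code, an example would be great. ^^

Edit: Since someone asked for the error, I'll also include the code I wrote.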

import nltk
from nltk.book import *

text1.concordance("Moby","Ahab")

This gives me the error:

Traceback (most recent call last):
  File "<pyshell#2>", line 1, in <module>
    text1.concordance('Moby','Ahab')
  File "C:\Programmering\Python27\lib\site-packages\nltk\text.py", line 314, in concordance
    self._concordance_index.print_concordance(word, width, lines)
  File "C:\Programmering\Python27\lib\site-packages\nltk\text.py", line 174, in print_concordance
    half_width = (width - len(word) - 2) / 2
TypeError: unsupported operand type(s) for -: 'str' and 'int'

I expected to get some matches, the same way I do when I just run:

text1.concordance("Moby")

which gives me 84 matches.

Answer 1

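concordance can't do that. It accepts only a single word and prints the results. There is no (reasonable) way to get them back as a list, so you can't filter them any further. The problem is that Text, the object behind text1, is only meant for simple interactive exploration; I never understood why the nltk book starts with it. So forget Text, skip the rest of the chapter, and go straight to chapter 2. Moby Dick is part of the gutenberg corpus, so you can iterate over its sentences and get your answer: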

from nltk.corpus import gutenberg
for s in gutenberg.sents('melville-moby_dick.txt'):
    if 'Ahab' in s and 'Moby' in s:
        print(" ".join(s))
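
If you want to check for more than two names, the same test can be written once for any list of names. This is only a minimal sketch: the helper name sentences_with_all and the tiny sample_sents list are made up for illustration, and sample_sents stands in for gutenberg.sents('melville-moby_dick.txt') so that the snippet runs without downloading the corpus.

```python
def sentences_with_all(sents, names):
    """Return the sentences (lists of tokens) that contain every name in `names`."""
    wanted = set(names)
    # set.issubset accepts any iterable, so each tokenized sentence can be passed directly
    return [s for s in sents if wanted.issubset(s)]


# Hypothetical stand-in for gutenberg.sents('melville-moby_dick.txt'):
# a sequence of sentences, each one a list of word tokens.
sample_sents = [
    ["Ahab", "hunts", "Moby", "Dick", "."],
    ["Call", "me", "Ishmael", "."],
    ["Moby", "Dick", "is", "a", "whale", "."],
]

for s in sentences_with_all(sample_sents, ["Moby", "Ahab"]):
    print(" ".join(s))
```

With the real corpus you would pass gutenberg.sents('melville-moby_dick.txt') as the first argument. Note that the match is on exact tokens, so "Ahab" will not match a differently cased or otherwise altered token.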

Answer 2

You can make a list of all the names you want to find, for example:

name_list = ['Moby', 'Ahab']

The code to do this is:

import nltk
from nltk.book import *
name_list = ['Moby', 'Ahab']
for name in name_list: 
    text1.concordance(name)
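
As for why each name has to go in its own call: judging from the traceback in the question, the second positional argument to concordance is treated as the display width, so 'Ahab' ends up in an arithmetic expression. The sketch below reproduces just that one line of arithmetic. concordance_stub is a made-up stand-in, not nltk code, and its default values are placeholders I am assuming for illustration.

```python
def concordance_stub(word, width=79, lines=25):
    # Hypothetical stand-in: mirrors only the arithmetic shown in the traceback,
    # half_width = (width - len(word) - 2) / 2. The defaults are assumed placeholders.
    half_width = (width - len(word) - 2) // 2
    return half_width


# Passing a second name positionally binds it to `width`, which cannot be
# used in a subtraction with an integer.
try:
    concordance_stub("Moby", "Ahab")
except TypeError as err:
    print("TypeError:", err)

# One word per call works, which is what the loop over name_list does.
print(concordance_stub("Moby"))
```

Keep in mind that the loop prints a separate concordance for each name; it does not restrict the output to sentences that contain both.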

Extracting two names from the same sentence in nltk (Python)
