Question

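I am trying to extract all proper nouns from a tagged paragraph. In my code, I first split out the paragraphs and then check each one for proper nouns. The problem is that no proper nouns are ever extracted: my code never even enters the block that checks for the specific tag.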

My code:

def noun(sen):
    m=[]
    if (sen.split('/')[1].lower().startswith('np')&sen.split('/')[1].lower().endswith('np')):
        w=sen.strip().split('/')[0]
        m.append(w)
    return m


import nltk
rp = open("tesu.txt", 'r')
text = rp.read()
list = []
sentences = splitParagraph(text)
for s in sentences:
 list.append(s)

Sample input from 'tesu.txt':

Several/ap defendants/nns in/in the/at Summerdale/np police/nn burglary/nn trial/nn      made/vbd statements/nns indicating/vbg their/pp$ guilt/nn at/in the/at.... 

Bellows/np made/vbd the/at disclosure/nn when/wrb he/pps asked/vbd Judge/nn-tl Parsons/np to/to grant/vb his/pp$ client/nn ,/, Alan/np Clements/np ,/, 30/cd ,/, a/at separate/jj trial/nn ./.

How can I extract all the tagged proper nouns from the paragraphs?

Answer 1

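Thanks for providing the data sample.

You need to:

Read each paragraph/line
Split the line on whitespace to get each tagged word, e.g. Summerdale/np
Split that token on / to see whether it is tagged np
If it is, add the other half of the split (the actual word) to the list of nouns

Something like this (based on Bogdan's answer, thanks!):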

def noun(sentence):
    # Collect every word in the sentence whose tag is 'np' (proper noun).
    nouns = []
    for token in sentence.split():
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
    return nouns

if __name__ == '__main__':
    nouns = []
    with open('tesu.txt', 'r') as file_p:
        # Paragraphs are separated by a blank line.
        for sentence in file_p.read().split('\n\n'):
            result = noun(sentence)
            if result:
                nouns.extend(result)
    print(nouns)

For your sample data, this produces:

['Summerdale', 'Bellows', 'Parsons', 'Alan', 'Clements']

Update: in fact, you can shorten the whole thing to:

nouns = []
with open('tesu.txt', 'r') as file_p:
    for token in file_p.read().split():
        word, tag = token.split('/')
        if tag.lower() == 'np':
            nouns.append(word)
print(nouns)

if you don't care which paragraph the nouns came from.
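If you do care which paragraph each noun came from, one possible sketch keeps a separate list per paragraph. The function name `nouns_by_paragraph` and the shortened sample string are illustrative assumptions, not part of the original code; paragraphs are assumed to be separated by a blank line, as in the sample input.

```python
def nouns_by_paragraph(text):
    """Return one list of 'np'-tagged words per blank-line-separated paragraph."""
    grouped = []
    for paragraph in text.split('\n\n'):
        nouns = []
        for token in paragraph.split():
            word, tag = token.split('/')
            if tag.lower() == 'np':
                nouns.append(word)
        grouped.append(nouns)  # an empty list marks a paragraph with no proper nouns
    return grouped


# Shortened stand-in for the contents of 'tesu.txt' (assumption for illustration).
sample = (
    "the/at Summerdale/np police/nn trial/nn\n\n"
    "Bellows/np asked/vbd Judge/nn-tl Parsons/np ./."
)
print(nouns_by_paragraph(sample))
```

The index of each inner list matches the position of its paragraph in the file.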

If the tags are always lowercase, you can also drop the .lower() call.
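One caveat: unpacking the result of split('/') into exactly two names raises ValueError for any token that has no slash or more than one. If your file might contain such tokens, a more defensive sketch could split on the last slash only. The function name `proper_nouns` and the token 'AC/DC/np' are hypothetical, chosen only to illustrate the case.

```python
def proper_nouns(text):
    """Return all words tagged 'np', tolerating odd tokens."""
    nouns = []
    for token in text.split():
        # rpartition splits on the LAST slash, so slashes inside the word survive.
        word, sep, tag = token.rpartition('/')
        if not sep:
            continue  # no slash at all: not a word/tag token, skip it
        if tag.lower() == 'np':
            nouns.append(word)
    return nouns


print(proper_nouns("AC/DC/np played/vbd untagged ./."))
```

Whether to skip malformed tokens silently or report them is a design choice that depends on how clean the input is.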

Answer 2

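You should work on your code style. I think there are a lot of unnecessary loops in there. You also have an unnecessary method in splitParagraph, which basically just calls the existing split method, and you import re but never use it afterwards. Also, indent your code; it is hard to follow as it stands. You should provide a sample of the "tesu.txt" input so that we can help you more. Anyway, all of your code can be condensed into: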

def noun(sentence):
    # Assumes the string holds a single word/tag pair, e.g. 'Summerdale/np'.
    word, tag = sentence.split('/')
    if tag.lower().startswith('np') and tag.lower().endswith('np'):
        return word
    return False

if __name__ == '__main__':
    words = []
    with open('tesu.txt', 'r') as file_p:
        for sentence in file_p.read().split('\n\n'):
            result = noun(sentence)
            if result:
                words.append(result)
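Note that the noun function above unpacks exactly one word/tag pair, so it would raise ValueError if it were handed a whole paragraph containing many tagged words. A minimal sketch that keeps the same tag test but applies it token by token might look like this; the helper name `nouns_in` is an assumption added for illustration.

```python
def noun(token):
    """Return the word if a single 'word/tag' token is tagged np, else None."""
    word, tag = token.split('/')
    tag = tag.lower()
    if tag.startswith('np') and tag.endswith('np'):
        return word
    return None


def nouns_in(paragraph):
    """Apply noun() to every whitespace-separated token and keep the matches."""
    found = (noun(token) for token in paragraph.split())
    return [word for word in found if word]


print(nouns_in("Alan/np Clements/np ,/, 30/cd"))
```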

