I am referring to the link in this section: http://www.nltk.org/book/ch06.html#recognizing-textual-entailment
def rte_features(rtepair):
    extractor = nltk.RTEFeatureExtractor(rtepair)
    features = {}
    features['word_overlap'] = len(extractor.overlap('word'))
    features['word_hyp_extra'] = len(extractor.hyp_extra('word'))
    features['ne_overlap'] = len(extractor.overlap('ne'))
    features['ne_hyp_extra'] = len(extractor.hyp_extra('ne'))
    return features

rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])
extractor = nltk.RTEFeatureExtractor(rtepair)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-39-a7f96e33ba9e> in <module>()
----> 1 extractor = nltk.RTEFeatureExtractor(rtepair)
C:\Users\RAVINA\Anaconda2\lib\site-packages\nltk\classify\rte_classify.pyc in __init__(self, rtepair, stop, lemmatize)
65
66 #Get the set of word types for text and hypothesis
---> 67 self.text_tokens = tokenizer.tokenize(rtepair.text)
68 self.hyp_tokens = tokenizer.tokenize(rtepair.hyp)
69 self.text_words = set(self.text_tokens)
AttributeError: 'list' object has no attribute 'text'
This is the exact code mentioned in the book. Can anyone help me figure out what is going wrong here? Thanks, Ravina
Answer 0 (score: 0)
Look at the type signatures. In a Python shell, type:
import nltk
x = nltk.corpus.rte.pairs(['rte3_dev.xml'])
type(x)
This tells you that x is of type list.
Now type:
help(nltk.RTEFeatureExtractor)
This tells you:

:param rtepair: the RTEPair from which features should be extracted
So x clearly does not have the right type to be passed to nltk.RTEFeatureExtractor. Instead:
type(x[33])
<class 'nltk.corpus.reader.rte.RTEPair'>
An individual item from the list has the correct type.
UPDATE
As mentioned in the comments, extractor.text_words contains only empty strings. This appears to be due to changes made in NLTK since that documentation was written. Long story short: you will not be able to work around this without downgrading to an older version of NLTK or fixing the issue in NLTK yourself.
Inside the file nltk/classify/rte_classify.py, you will find the following code:
import nltk
from nltk.tokenize import RegexpTokenizer

class RTEFeatureExtractor(object):
    …
        tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')
        self.text_tokens = tokenizer.tokenize(rtepair.text)
        self.text_words = set(self.text_tokens)
If you run that same RegexpTokenizer on the exact text used by the extractor, it produces only empty strings:
import nltk
rtepair = nltk.corpus.rte.pairs(['rte3_dev.xml'])[33]
from nltk.tokenize import RegexpTokenizer
tokenizer = RegexpTokenizer('([A-Z]\.)+|\w+|\$[\d\.]+')
tokenizer.tokenize(rtepair.text)
This returns ['', '', …, ''] (that is, a list of empty strings).
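Why empty strings? One likely culprit (an observation about Python's re module, not something stated in the answer above) is the capturing group ([A-Z]\.)+ in that pattern: re.findall, which regexp-based tokenizers are commonly built on, returns the contents of group 1 instead of the whole match whenever the pattern contains a capturing group, so every token matched by the \w+ or \$[\d\.]+ branch comes back as ''. Making the group non-capturing with (?:…) restores the expected behavior:

```python
import re

text = "Mr. U.S. Smith paid $3.50"

# Capturing group: findall returns group 1, which is empty for
# tokens matched by the other alternatives
print(re.findall(r'([A-Z]\.)+|\w+|\$[\d\.]+', text))
# → ['', 'S.', '', '', '']

# Non-capturing group: findall returns the whole match
print(re.findall(r'(?:[A-Z]\.)+|\w+|\$[\d\.]+', text))
# → ['Mr', 'U.S.', 'Smith', 'paid', '$3.50']
```

This is consistent with the all-empty-strings output seen above, since the RTE texts rarely hit the ([A-Z]\.)+ branch.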