Question

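I need to write a program that lemmatizes text (reduces the different forms of a word to a base form). Since I will be using several lemmatization libraries and comparing them, I decided to use the Strategy pattern.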

My idea is to wrap everything in a single class and, depending on the lemmatization function, change only my lemmatize method.

Here is my class:

import re
import types

create_bound_method = types.MethodType

class Lemmatizator(object):
    def __init__(self, filename=None, lemmatization=None):
        if lemmatization and filename:
            self.filename = filename
            self.lemmatize = create_bound_method(lemmatization, self)

    def _get_text(self):
        with open(f'texts/{self.filename}.txt', 'r') as file:
            self.text = file.read()

    def _split_to_unique(self):
        text = re.sub(r'[^\w\s]', '', self.text)
        split_text = re.split(r'\s', text)

        self.unique_words = set(split_text)

        return self.unique_words

    def lemmatize(self):
        return 'Lemmatize function or text are not found'

Then I create my lemmatize method:

# Assumed setup: wnl is not defined in the snippet as posted
from nltk.stem import WordNetLemmatizer

wnl = WordNetLemmatizer()

def nltk_lemmatization(self):
    words = {}

    for word in self.unique_words:
        if word:
            words[word] = {
                'noun': wnl.lemmatize(word),
                'adverb': wnl.lemmatize(word, pos='r'),
                'adjective': wnl.lemmatize(word, pos='a'),
                'verb': wnl.lemmatize(word, pos='v')
            }

    return words

And I try to apply it:

nltk_lem = Lemmatizator('A Christmas Carol in Prose', nltk_lemmatization)
nltk_lem.lemmatize()

But I get the following error:

    for word in self.unique_words:
AttributeError: 'Lemmatizator' object has no attribute 'unique_words'

What is wrong?

Answer 1

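As far as I can tell, self.unique_words is only set on the instance inside _split_to_unique(self). Nothing in __init__ calls _get_text() or _split_to_unique(), so when you call nltk_lemmatization(self), _split_to_unique(self) has not run yet and the function tries to iterate over an attribute that does not exist.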
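
One possible fix is to populate unique_words in __init__ before binding the strategy. The following is only a minimal sketch under some assumptions: the text is passed in directly instead of being read from texts/<filename>.txt, and lowercase_strategy is a hypothetical stand-in for nltk_lemmatization so that the example runs without NLTK or any files.

```python
import re
import types


class Lemmatizator:
    def __init__(self, text, lemmatization):
        # Assumption: the text is passed in directly rather than loaded
        # from texts/<filename>.txt, so the sketch needs no files.
        self.text = text
        # Build unique_words up front so any strategy can rely on it.
        self._split_to_unique()
        # Bind the strategy function to this instance as its lemmatize method.
        self.lemmatize = types.MethodType(lemmatization, self)

    def _split_to_unique(self):
        # Strip punctuation, then split on whitespace.
        text = re.sub(r'[^\w\s]', '', self.text)
        self.unique_words = set(text.split())
        return self.unique_words


def lowercase_strategy(self):
    # Hypothetical stand-in for nltk_lemmatization: maps each word to
    # its lowercase form instead of calling a real lemmatizer.
    return {word: word.lower() for word in self.unique_words}


lem = Lemmatizator('Marley was dead, to begin with.', lowercase_strategy)
result = lem.lemmatize()
```

With the original file-based class, the equivalent change would be to call self._get_text() and then self._split_to_unique() in __init__ (or at the start of the strategy function) before unique_words is read.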
