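I am trying to write a class for text manipulation. The idea is that the class supports basic text preprocessing, but if someone wants to write a much more complex preprocessing function, they should be able to take the base class and override it. I tried the following, and although I could get it to work somehow, I don't think it is the right approach.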
class TextPreprocessor:
    def __init__(self, corpus):
        """Text Preprocessor base class.

        corpus: a list of sentences
        """
        self.corpus = corpus
        self.word_tokens = [self.preprocess(sentence) for sentence in corpus]

    def preprocess(self, sentence):
        """
        strip each sentence, lowercase it and split by space  # sentence.strip().lower().split()
        """
        return sentence.strip().lower().split()

    def preprocess_transform(self, sentence):
        return self.preprocess(sentence)
Now, if I want to write a new preprocessing function, what is the best way to do it? I tried the following:
class SubPreprocess(TextPreprocessor):
    def __init__(self, corpus):
        #### dummy preprocess function
        def preprocess(self, sentence):
            return sentence.strip().split() + ['HELLOOOOOOOOOOLLLL']
        super.__init__(corpus)
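It does not work. What I basically want is for the (modified) preprocess function to override the one in the base class TextPreprocessor, so that when __init__ is called, self.word_tokens is built with the new preprocess function.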
Answer 0 (score: 5)
The following will do it:
class SubPreprocess(TextPreprocessor):
    def preprocess(self, sentence):
        return sentence.strip().split() + ['HELLOOOOOOOOOOLLLL']
If you now call the constructor of SubPreprocess, the new preprocess method will be used:
proc = SubPreprocess(some_corpus)
# looks up any methods in the mro of SubPreprocess
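To see the dispatch in action, here is a minimal self-contained sketch. It repeats the base class from the question in condensed form; the sample corpus and the "<END>" marker are made up for illustration and are not part of the original post.

```python
class TextPreprocessor:
    def __init__(self, corpus):
        self.corpus = corpus
        # self.preprocess is looked up on the instance's actual class,
        # so a subclass override is picked up here automatically.
        self.word_tokens = [self.preprocess(sentence) for sentence in corpus]

    def preprocess(self, sentence):
        return sentence.strip().lower().split()


class SubPreprocess(TextPreprocessor):
    # No __init__ needed: the inherited one is used as-is.
    def preprocess(self, sentence):
        return sentence.strip().split() + ["<END>"]


some_corpus = [" Hello World ", "Foo bar"]  # illustrative data
base = TextPreprocessor(some_corpus)
sub = SubPreprocess(some_corpus)

print(base.word_tokens)  # [['hello', 'world'], ['foo', 'bar']]
print(sub.word_tokens)   # [['Hello', 'World', '<END>'], ['Foo', 'bar', '<END>']]
```

The subclass never touches __init__; overriding the method at class level is enough because the base initializer calls self.preprocess.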
Answer 1 (score: 2)
class SubPreprocess(TextPreprocessor):
    def __init__(self, corpus):
        # this is how you initialise the superclass
        super(SubPreprocess, self).__init__(corpus)

    # the overridden function should be within the scope of the class, not under the initializer
    def preprocess(self, sentence):
        return sentence.strip().split() + ['HELLOOOOOOOOOOLLLL']
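A small sketch of why the placement matters, assuming Python 3. The class names Base, NestedDef and ClassLevel and the "MARK" token are hypothetical, chosen only for this comparison.

```python
class Base:
    def __init__(self, corpus):
        self.word_tokens = [self.preprocess(s) for s in corpus]

    def preprocess(self, sentence):
        return sentence.lower().split()


class NestedDef(Base):
    def __init__(self, corpus):
        # This only creates a local variable named `preprocess`. It is never
        # attached to the class or the instance, so Base.preprocess still runs.
        def preprocess(self, sentence):
            return sentence.split() + ["MARK"]
        super().__init__(corpus)  # Python 3 zero-argument form


class ClassLevel(Base):
    def __init__(self, corpus):
        super().__init__(corpus)

    # Defined in the class body, so it replaces Base.preprocess.
    def preprocess(self, sentence):
        return sentence.split() + ["MARK"]


nested = NestedDef(["A b"])
class_level = ClassLevel(["A b"])

print(nested.word_tokens)       # [['a', 'b']]
print(class_level.word_tokens)  # [['A', 'b', 'MARK']]
```

In Python 3, super().__init__(corpus) is equivalent to the two-argument super(SubPreprocess, self).__init__(corpus) shown above.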
Answer 2 (score: 0)
If you want to inject behaviour, just use a function:
class TheAlgorithm:
    def __init__(self, preprocess):
        self.preprocess = preprocess

    def process(self, corpus):
        # part_a and part_b are the fixed steps of the algorithm (not shown)
        after_a = self.part_a(corpus)
        preprocessed = self.preprocess(after_a)
        return self.part_b(preprocessed)
Usage is very simple:
# the suffix must be a list: list + str raises a TypeError
p = TheAlgorithm(lambda c: c.strip().split() + ['helllol'])
p.process('the corpus')
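Applied to the class from the question, the same idea might look like the sketch below. The preprocess keyword argument and the default_preprocess helper are assumptions introduced here; they are not part of the original TextPreprocessor.

```python
def default_preprocess(sentence):
    # Same behaviour as the base class in the question.
    return sentence.strip().lower().split()


class TextPreprocessor:
    def __init__(self, corpus, preprocess=default_preprocess):
        self.corpus = corpus
        # A plain function stored on the instance: it is called without an
        # implicit `self`, so it takes only the sentence.
        self.preprocess = preprocess
        self.word_tokens = [self.preprocess(s) for s in corpus]


corpus = [" Hello World "]  # illustrative data
plain = TextPreprocessor(corpus)
custom = TextPreprocessor(corpus, preprocess=lambda s: s.strip().split() + ["helllol"])

print(plain.word_tokens)   # [['hello', 'world']]
print(custom.word_tokens)  # [['Hello', 'World', 'helllol']]
```

The trade-off is that callers pass behaviour in rather than subclassing, which keeps one class but moves the customisation to the call site.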
In fact, if your class only stores a few functions, you can go fully functional:
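def part_b(data):
    # placeholder for the rest of the algorithm; assumed to be the identity
    # here so that the assertion below holds
    return data

def processor(preprocess):
    def algorithm(corpus):
        return part_b(preprocess(corpus))
    return algorithm  # return the closure so it can be called later

p = processor(lambda c: "-".join(c.split(",")))
assert "a-b-c" == p("a,b,c")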
Answer 3 (score: 0)
Try changing super.__init__(corpus) to super().__init__(corpus).
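A quick way to see the difference, assuming Python 3. The classes below are condensed stand-ins for the ones in the question, and the one-sentence corpus is illustrative.

```python
class TextPreprocessor:
    def __init__(self, corpus):
        self.corpus = corpus


class SubPreprocess(TextPreprocessor):
    def __init__(self, corpus):
        # Zero-argument super() returns a proxy bound to this instance.
        super().__init__(corpus)


corpus = ["a b"]  # illustrative data

try:
    # `super` without parentheses is the built-in type itself, so this passes
    # the list where a super object is expected.
    super.__init__(corpus)
    failed = False
except TypeError:
    failed = True

print(failed)                        # True
print(SubPreprocess(corpus).corpus)  # ['a b']
```

This only repairs the call to the base initializer; a preprocess function defined inside __init__ would still be a local and would not override the base method.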