How to refactor duplicated code

Time: 2017-04-30 23:25:10

Tags: python gensim

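I have two functions that differ in only one line. To avoid duplicating the code, I want to create a base class that holds the general form of the function and then have each class inherit from it.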

Function 1:

def top_similar_traces(self, stack_trace, top=10):
        words_to_test = StackTraceProcessor.preprocess(stack_trace)
        words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

        # Cos-similarity
        all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
            [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

        for i, (doc_id, rwmd_distance) in enumerate(distances):

            doc_words_clean = [w for w in self.corpus[doc_id] if w in model]
            wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

        return sorted(similarities, key=lambda v: v[1])[:top]

Function 2:

def top_similar_traces(self, stack_trace, top=10):
        words_to_test = StackTraceProcessor.preprocess(stack_trace)
        words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

        # Cos-similarity
        all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
            [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

        for i, (doc_id, rwmd_distance) in enumerate(distances):

            doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
            wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

        return sorted(similarities, key=lambda v: v[1])[:top]

As you can see, the only difference is between these two lines:

        doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
        doc_words_clean = [w for w in self.corpus[doc_id] if w in model]

3 answers:

Answer 0 (score: 1)

You can define the function in the superclass, like this:

def top_similar_traces(self, stack_trace, t, top=10):
    words_to_test = StackTraceProcessor.preprocess(stack_trace)
    words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

    # Cos-similarity
    all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
        [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

    for i, (doc_id, rwmd_distance) in enumerate(distances):

        if t == "something":
            doc_words_clean = [w for w in self.corpus[doc_id] if w in model]
        else:
            doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
        wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

    return sorted(similarities, key=lambda v: v[1])[:top]

Here t is a string that selects which variant to run. You then call this method from each subclass, like this:

def top_similar_traces(self, stack_trace, top=10):
    return super().top_similar_traces(stack_trace, "option", top)

A solution like this should work. t can be a value of any type (an integer, a string, and so on).
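For illustration, here is a minimal runnable sketch of the flag approach described above, reduced to the one line that differs. The names `TraceIndex`, `ListTraceIndex`, `DocTraceIndex`, `Doc`, `vocab`, and the flag values `"plain"` and `"words"` are hypothetical, and a plain set stands in for the gensim model.

```python
from collections import namedtuple

# Hypothetical document type exposing a .words attribute.
Doc = namedtuple("Doc", ["words"])


class TraceIndex:
    """Superclass holding the shared logic; `t` selects the variant."""

    def __init__(self, corpus, vocab):
        self.corpus = corpus
        self.vocab = vocab  # assumption: a set standing in for the gensim model

    def doc_words_clean(self, doc_id, t):
        doc = self.corpus[doc_id]
        # Only this choice differs between the two original functions.
        words = doc if t == "plain" else doc.words
        return [w for w in words if w in self.vocab]


class ListTraceIndex(TraceIndex):
    """Subclass whose corpus entries are plain lists of words."""

    def doc_words_clean(self, doc_id):
        return super().doc_words_clean(doc_id, "plain")


class DocTraceIndex(TraceIndex):
    """Subclass whose corpus entries expose their words via .words."""

    def doc_words_clean(self, doc_id):
        return super().doc_words_clean(doc_id, "words")


vocab = {"error", "timeout"}
plain = ListTraceIndex([["error", "foo", "timeout"]], vocab)
wrapped = DocTraceIndex([Doc(words=["error", "foo"])], vocab)

print(plain.doc_words_clean(0))
print(wrapped.doc_words_clean(0))
```

As in the answer's code, each subclass method takes fewer arguments than the superclass method it overrides, so the two signatures are not interchangeable.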

Answer 1 (score: 1)

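Just extract the part that changes into a separate method. That way, each subclass can override only that part and change the behavior of the shared method without copying the whole function.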

Something like this:

# Base class
def top_similar_traces(self, stack_trace, top=10):
    words_to_test = StackTraceProcessor.preprocess(stack_trace)
    words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

    # Cos-similarity
    all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[
        [model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

    for i, (doc_id, rwmd_distance) in enumerate(distances):
        # call another method here
        doc_words_clean = self.top_similar_traces_filter_words(doc_id)
        wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

    return sorted(similarities, key=lambda v: v[1])[:top]

# Subclass A
def top_similar_traces_filter_words(self, doc_id):
    return [w for w in self.corpus[doc_id].words if w in model]

# Subclass B
def top_similar_traces_filter_words(self, doc_id):
    return [w for w in self.corpus[doc_id] if w in model]

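By the way, I don't know where your model comes from, but it appears to be a global variable. You should avoid that and instead store it on your class (or pass it in).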
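The two suggestions above (an overridable method for the differing line, and a model held by the instance rather than a global) can be combined. The following is only a sketch under assumptions: the class names, the `doc_words` hook, and `TaggedDoc` are hypothetical, and a plain set stands in for the gensim model, since all the filter needs is `in` membership.

```python
from collections import namedtuple

# Hypothetical document type with a .words attribute.
TaggedDoc = namedtuple("TaggedDoc", ["words", "tags"])


class TraceIndexBase:
    def __init__(self, model, corpus):
        # The model is passed in instead of being read from a global.
        self.model = model
        self.corpus = corpus

    def doc_words(self, doc_id):
        """Hook: return the raw words of a document. Subclasses override it."""
        raise NotImplementedError

    def doc_words_clean(self, doc_id):
        # Shared logic lives here once and calls the hook for the varying part.
        return [w for w in self.doc_words(doc_id) if w in self.model]


class ListCorpusIndex(TraceIndexBase):
    def doc_words(self, doc_id):
        return self.corpus[doc_id]


class TaggedCorpusIndex(TraceIndexBase):
    def doc_words(self, doc_id):
        return self.corpus[doc_id].words


model = {"null", "pointer"}  # assumption: stand-in for the trained model
list_index = ListCorpusIndex(model, [["null", "pointer", "xyz"]])
tagged_index = TaggedCorpusIndex(model, [TaggedDoc(["xyz", "null"], [0])])

print(list_index.doc_words_clean(0))
print(tagged_index.doc_words_clean(0))
```

Because the base class raises `NotImplementedError` in the hook, forgetting to override it fails loudly instead of silently using the wrong corpus layout.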

Answer 2 (score: 1)

You mentioned: "... I want to create a base class that holds the general form of the function and then have each class inherit from it."

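I'd like to point out that you don't need to create a class for this. A single function works fine. In the following example I added a fourth parameter named words with a default value of True. If you leave it as True, the function uses the line that reads self.corpus[doc_id].words. If you call the function with False, it uses the line that reads self.corpus[doc_id].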

def top_similar_traces(self, stack_trace, top=10, words=True):
    words_to_test = StackTraceProcessor.preprocess(stack_trace)
    words_to_test_clean = [w for w in np.unique(words_to_test).tolist() if w in model]

    # Cos-similarity
    all_distances = np.array(1.0 - np.dot(model.wv.syn0norm, model.wv.syn0norm[[model.wv.vocab[word].index for word in words_to_test_clean]].transpose()), dtype=np.double)

    for i, (doc_id, rwmd_distance) in enumerate(distances):
        if words:
            doc_words_clean = [w for w in self.corpus[doc_id].words if w in model]
        else:
            doc_words_clean = [w for w in self.corpus[doc_id] if w in model]
        wmd = self.wmdistance(model, words_to_test_clean, doc_words_clean, all_distances)

    return sorted(similarities, key=lambda v: v[1])[:top]

To have the function read self.corpus[doc_id].words, call it like this:

top_similar_traces(<stack_trace>)

To have the function read self.corpus[doc_id], call it like this:

top_similar_traces(<stack_trace>, words=False)
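To make the two calls above concrete, here is a small runnable sketch of the same boolean-flag idea as a standalone function. It covers only the filtering step, not the full similarity computation. The function name `doc_words_clean`, the `Doc` type, and the explicit `corpus` and `vocab` parameters are assumptions made for the example, with a set standing in for the gensim model.

```python
from collections import namedtuple

# Hypothetical document type exposing a .words attribute.
Doc = namedtuple("Doc", ["words"])


def doc_words_clean(corpus, vocab, doc_id, words=True):
    """Return the words of one document that are present in `vocab`.

    words=True  reads corpus[doc_id].words
    words=False reads corpus[doc_id] directly
    """
    doc = corpus[doc_id]
    source = doc.words if words else doc
    return [w for w in source if w in vocab]


vocab = {"stack", "trace"}
doc_corpus = [Doc(words=["stack", "overflow", "trace"])]
list_corpus = [["trace", "log"]]

with_words = doc_words_clean(doc_corpus, vocab, 0)                  # default flag
without_words = doc_words_clean(list_corpus, vocab, 0, words=False)

print(with_words)
print(without_words)
```

The trade-off of a flag is that every caller has to know which corpus layout it is dealing with and pass the matching value.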