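Duplicated code in slightly different functions

Time: 2015-02-17 13:33:56

Tags: python code-duplication code-design

I have two very similar loops, and both contain an inner loop that closely resembles a third loop (eh... :)). Illustrated with code, it looks roughly like this: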

# First function
def fmeasure_kfold1(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        for build in array[test_index]:  # <- All functions have this loop

            # Retrieved tests is calculated inside the build loop in kfold1
            retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

# Second function
def fmeasure_kfold2(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

# Third function
def fmeasure_all(array):
    ret = []
    for build in array:  # <- All functions have this loop

        relevant = set(build['tests'])
        fval = calc_f2(relevant)  # <- Instead of calc_f, I call calc_f2
        if fval is not None:
            ret.append(fval)

    return ret

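The first two functions differ only in how, and at what point, retrieved_tests is calculated. The third function differs from the inner loop of the first two in that it calls calc_f2 and does not use retrieved_tests at all.

In reality the code is more complex. The duplication bothered me, but I figured I could live with it. Lately, however, I have been making changes to it, and having to make every change in two or three places at once is annoying.

Is there a good way to merge the duplicated code? The only way I can think of is to introduce classes, which brings in a lot of boilerplate, and I would like to keep the functions pure if possible.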


Edit

Here are the contents of calc_f and calc_f2:

def calc_f(relevant, retrieved):
    """Calculate the F-measure given relevant and retrieved tests."""
    recall = len(relevant & retrieved)/len(relevant)
    prec = len(relevant & retrieved)/len(retrieved)
    fmeasure = f_measure(recall, prec)

    return (fmeasure, recall, prec)


def calc_f2(relevant, nbr_tests=1000):
    """Calculate the F-measure given relevant tests."""
    recall = 1
    prec = len(relevant) / nbr_tests
    fmeasure = f_measure(recall, prec)

    return (fmeasure, recall, prec)

f_measure calculates the harmonic mean of precision and recall.

Basically, calc_f2 takes a lot of shortcuts, since no retrieved tests are needed.
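
For reference, the harmonic mean mentioned above can be sketched as follows. The body of f_measure is not shown in the question, so this is only an assumed implementation; in particular, returning 0.0 when both inputs are zero is a choice made here to avoid a division by zero, not something the question specifies.

```python
def f_measure(recall, prec):
    """Harmonic mean of recall and precision (assumed implementation)."""
    if recall + prec == 0:
        # Assumption: treat the undefined 0/0 case as a score of 0.0.
        return 0.0
    return 2 * recall * prec / (recall + prec)


print(f_measure(1, 1))      # 1.0
print(f_measure(0.5, 1.0))  # 0.666...
```

If this code runs on Python 2, note that an expression such as len(relevant & retrieved)/len(relevant) in calc_f performs integer division unless `from __future__ import division` is in effect.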

4 answers:

Answer 0 (score: 0)

Using one generic function with an extra parameter that controls where retrieved_tests is calculated would also work.

e.g.

def fmeasure_kfold_generic(array, nfolds, mode):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        if mode==2:
            retrieved_tests = _sum_tests(correlation)

        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            if mode==1:
                retrieved_tests = get_tests(set(build['modules']), correlation)

            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)

    return ret

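Answer 1 (score: 0)

One approach is to write each inner loop as a function, and then write the outer loop as a separate function that receives the other functions as arguments. This is close to what sorting functions do when they receive the function that should be used to compare two elements.

Of course, the hard part is finding exactly what all the functions have in common, which is not always simple.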
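
A minimal sketch of that idea, under stated assumptions: collect_scores, share_of_tests_run and has_modules are hypothetical names invented for illustration, and the build dictionaries only mimic the 'modules' and 'tests' keys used in the question. The shared loop is written once and the varying step is passed in, much like the key function given to sorted().

```python
def collect_scores(builds, score_build):
    """Shared loop: score every build and keep the results that are not None."""
    scores = []
    for build in builds:
        score = score_build(build)
        if score is not None:
            scores.append(score)
    return scores


# Two hypothetical per-build scoring functions that plug into the same loop.
def share_of_tests_run(build, total_tests=4):
    return len(build['tests']) / total_tests


def has_modules(build):
    return 1.0 if build['modules'] else None


builds = [
    {'modules': ['a'], 'tests': ['t1', 't2']},
    {'modules': [], 'tests': ['t3']},
]
print(collect_scores(builds, share_of_tests_run))  # [0.5, 0.25]
print(collect_scores(builds, has_modules))         # [1.0]
```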

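Answer 2 (score: 0)

The typical solution is to identify the individual stages of the algorithm and use the Template method design pattern, where the different stages are implemented in subclasses. I do not understand your code at all, but I assume there would be methods like makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?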
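
A rough sketch of how the Template method pattern might look here. Everything in it is an assumption made for illustration: the class names are invented, the hook names are snake_case variants of the guesses above, folds are passed in as ready-made (train, test) pairs, and analyze and score are simplified stand-ins for the question's analyze and calc_f.

```python
class FoldEvaluation:
    """run() fixes the loop structure; subclasses fill in the retrieval hooks."""

    def run(self, folds):
        scores = []
        for train_builds, test_builds in folds:
            correlation = self.analyze(train_builds)
            shared = self.make_global_retrieved_tests(correlation)
            for build in test_builds:
                retrieved = self.make_individual_retrieved_tests(
                    build, correlation, shared)
                scores.append(self.score(set(build['tests']), retrieved))
        return scores

    def analyze(self, builds):
        # Stand-in: map each module to the tests seen alongside it.
        correlation = {}
        for build in builds:
            for module in build['modules']:
                correlation.setdefault(module, set()).update(build['tests'])
        return correlation

    def score(self, relevant, retrieved):
        # Stand-in for calc_f: recall only.
        return len(relevant & retrieved) / len(relevant)

    def make_global_retrieved_tests(self, correlation):
        return None  # default: nothing is computed once per fold

    def make_individual_retrieved_tests(self, build, correlation, shared):
        raise NotImplementedError


class PerBuildRetrieval(FoldEvaluation):
    """Like fmeasure_kfold1: retrieval happens inside the build loop."""

    def make_individual_retrieved_tests(self, build, correlation, shared):
        retrieved = set()
        for module in build['modules']:
            retrieved |= correlation.get(module, set())
        return retrieved


class PerFoldRetrieval(FoldEvaluation):
    """Like fmeasure_kfold2: retrieval happens once per fold."""

    def make_global_retrieved_tests(self, correlation):
        return set().union(*correlation.values())

    def make_individual_retrieved_tests(self, build, correlation, shared):
        return shared


train = [{'modules': ['a'], 'tests': ['t1']},
         {'modules': ['b'], 'tests': ['t2']}]
test = [{'modules': ['a'], 'tests': ['t1', 't2']}]
folds = [(train, test)]
print(PerBuildRetrieval().run(folds))  # [0.5]
print(PerFoldRetrieval().run(folds))   # [1.0]
```

This does bring in the class boilerplate the question hoped to avoid, which is the trade-off of this approach.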

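Answer 3 (score: 0)

I would tackle the problem from the inside out, by factoring out the innermost loop. This works well with a 'functional' style (as in 'functional programming'). It seems to me that if you generalize fmeasure_all a little, all three functions can be implemented in terms of it. Something like this:
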
def fmeasure(builds, calcFn, retrieveFn):
    ret = []
    for build in builds:
        relevant = set(build['tests'])
        fval = calcFn(relevant, retrieveFn(build))
        if fval is not None:
            ret.append(fval)

    return ret

This allows you to define:

def fmeasure_kfold1(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        ret += fmeasure(array[test_index], calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))

    return ret


def fmeasure_kfold2(array, nfolds):
    ret = []

    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])

        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)

        ret += fmeasure(array[test_index], calc_f, lambda _: retrieved_tests)

    return ret


def fmeasure_all(array):
    return fmeasure(array,
                    lambda relevant, _: calc_f2(relevant),
                    lambda x: x)

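By now, fmeasure_kfold1 and fmeasure_kfold2 look very similar. They differ mainly in how fmeasure is called, so we can implement a generic fmeasure_kfoldn function that centralizes the iteration and the collecting of results: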

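def fmeasure_kfoldn(array, nfolds, measure):
    ret = []
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # measure(builds, correlation) returns the list of F-values for one fold;
        # the parameter is named measure to avoid shadowing the built-in callable
        ret += measure(array[test_index], correlation)
    return ret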

This makes it very easy to define fmeasure_kfold1 and fmeasure_kfold2:

def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f, lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)


def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)
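
To try this structure without the real data, here is a self-contained sketch. The helpers are hypothetical stand-ins, not the question's real code: simple_kfold replaces KFold with a round-robin split, analyze and get_tests use a plain module-to-tests mapping, and calc_f returns recall only. Plain lists are indexed explicitly instead of relying on array[train_index].

```python
def simple_kfold(n, nfolds):
    """Hypothetical stand-in for KFold: yield (train_index, test_index) lists."""
    for fold in range(nfolds):
        test_index = [i for i in range(n) if i % nfolds == fold]
        train_index = [i for i in range(n) if i % nfolds != fold]
        yield train_index, test_index


def analyze(builds):
    """Stand-in: map each module to the tests seen alongside it."""
    correlation = {}
    for build in builds:
        for module in build['modules']:
            correlation.setdefault(module, set()).update(build['tests'])
    return correlation


def get_tests(modules, correlation):
    """Stand-in: all tests correlated with any of the given modules."""
    retrieved = set()
    for module in modules:
        retrieved |= correlation.get(module, set())
    return retrieved


def calc_f(relevant, retrieved):
    """Stand-in: recall only; None when there is nothing to compare."""
    if not relevant or not retrieved:
        return None
    return len(relevant & retrieved) / len(relevant)


def fmeasure(builds, calc_fn, retrieve_fn):
    ret = []
    for build in builds:
        fval = calc_fn(set(build['tests']), retrieve_fn(build))
        if fval is not None:
            ret.append(fval)
    return ret


def fmeasure_kfoldn(builds, nfolds, measure):
    ret = []
    for train_index, test_index in simple_kfold(len(builds), nfolds):
        correlation = analyze([builds[i] for i in train_index])
        ret += measure([builds[i] for i in test_index], correlation)
    return ret


def fmeasure_kfold1(builds, nfolds):
    def measure(test_builds, correlation):
        return fmeasure(test_builds, calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(builds, nfolds, measure)


builds = [
    {'modules': ['a'], 'tests': ['t1']},
    {'modules': ['a'], 'tests': ['t1', 't2']},
    {'modules': ['b'], 'tests': ['t3']},
    {'modules': ['b'], 'tests': ['t3']},
]
print(fmeasure_kfold1(builds, 2))  # [1.0, 1.0, 0.5, 1.0]
```

Every function here stays pure: each one depends only on its arguments and returns a new list, which matches the preference stated in the question.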