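I have two very similar loops, and both contain an inner loop that is very similar to a third loop (eh... :)). Illustrated with code, it looks roughly like this: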
# First function
def fmeasure_kfold1(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            retrieved_tests = get_tests(set(build['modules']), correlation)
            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)
    return ret

# Second function
def fmeasure_kfold2(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)
        for build in array[test_index]:  # <- All functions have this loop
            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)
    return ret

# Third function
def fmeasure_all(array):
    ret = []
    for build in array:  # <- All functions have this loop
        relevant = set(build['tests'])
        fval = calc_f2(relevant)  # <- Instead of calc_f, I call calc_f2
        if fval is not None:
            ret.append(fval)
    return ret
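The first two functions differ only in when retrieved_tests is calculated. The third function differs from the inner loop of the first two in that it calls calc_f2 and does not use retrieved_tests at all.
In reality the code is more complex, and although the duplication bothered me, I figured I could live with it. Lately, however, I have been making changes to it, and having to make each change in two or three places at once is annoying.
Is there a good way to merge the duplicated code? The only way I can think of involves introducing classes, which brings a lot of boilerplate, and I would like to keep the functions as pure functions if possible.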
Edit
Here are calc_f and calc_f2:
def calc_f(relevant, retrieved):
    """Calculate the F-measure given relevant and retrieved tests."""
    recall = len(relevant & retrieved)/len(relevant)
    prec = len(relevant & retrieved)/len(retrieved)
    fmeasure = f_measure(recall, prec)
    return (fmeasure, recall, prec)

def calc_f2(relevant, nbr_tests=1000):
    """Calculate the F-measure given relevant tests."""
    recall = 1
    prec = len(relevant) / nbr_tests
    fmeasure = f_measure(recall, prec)
    return (fmeasure, recall, prec)
f_measure calculates the harmonic mean of precision and recall.
Basically, calc_f2 takes a lot of shortcuts, since no retrieved tests are needed.
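The question does not show f_measure itself. For readers following along, here is a minimal sketch of what it might look like, assuming the usual harmonic-mean (F1) definition. Returning 0.0 when both inputs are zero is an assumption made here, not something stated in the question.

```python
def f_measure(recall, prec):
    """Harmonic mean of recall and precision (the F1 score)."""
    if recall + prec == 0:
        # Assumption: treat the undefined 0/0 case as a score of 0.0.
        return 0.0
    return 2 * recall * prec / (recall + prec)
```

With this definition, equal precision and recall give that same value back, and a low value on either side pulls the score down more than an arithmetic mean would.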
Answer 0 (score: 0)
A single generic function with an extra parameter that controls where retrieved_tests is calculated would also work.
e.g.
def fmeasure_kfold_generic(array, nfolds, mode):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # Retrieved tests is calculated outside the build loop in kfold2
        if mode == 2:
            retrieved_tests = _sum_tests(correlation)
        for build in array[test_index]:  # <- All functions have this loop
            # Retrieved tests is calculated inside the build loop in kfold1
            if mode == 1:
                retrieved_tests = get_tests(set(build['modules']), correlation)
            relevant_tests = set(build['tests'])
            fval = calc_f(relevant_tests, retrieved_tests)
            if fval is not None:
                ret.append(fval)
    # Return the collected values, as the original functions do
    return ret
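Answer 1 (score: 0)
One approach is to write each inner loop as a function, and then write the outer loop as a separate function that receives the other functions as arguments. This is close to what sorting functions do (they receive the function that should be used to compare two elements).
Of course, the hard part is working out exactly what all the functions have in common, which is not always simple.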
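A minimal sketch of that idea on toy data follows. The names collect_scores, length_score and even_length_only are invented for illustration and are not part of the question's code. The shared loop lives in one function and the part that varies is passed in.

```python
def collect_scores(items, score_fn):
    """Shared loop: apply score_fn to each item and keep non-None results."""
    results = []
    for item in items:
        score = score_fn(item)
        if score is not None:
            results.append(score)
    return results


# Two different "inner" behaviours, passed in as plain functions.
def length_score(item):
    return len(item)


def even_length_only(item):
    return len(item) if len(item) % 2 == 0 else None


words = ["ab", "abc", "abcd"]
lengths = collect_scores(words, length_score)
even_lengths = collect_scores(words, even_length_only)

# The built-in sorted() follows the same pattern: the ordering rule is a function argument.
by_length_desc = sorted(words, key=len, reverse=True)
```

The loop and the None filtering are written once, and each variant only supplies the piece that differs.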
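Answer 2 (score: 0)
The typical solution is to identify the parts of the algorithm and use the Template method design pattern, where the different stages are implemented in subclasses. I do not understand your code at all, but I assume there would be methods like makeGlobalRetrievedTests() and makeIndividualRetrievedTests()?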
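For comparison, here is a rough sketch of how the Template method shape could look in Python. The class names, the snake_case hook names and the dictionary-based correlation are all assumptions made for illustration. The hook bodies are simple stand-ins, not the question's get_tests or _sum_tests.

```python
class RetrievalTemplate:
    """Fixed loop skeleton; subclasses decide how retrieved tests are built."""

    def run(self, builds, correlation):
        shared = self.make_global_retrieved_tests(correlation)
        results = []
        for build in builds:
            retrieved = self.make_individual_retrieved_tests(build, correlation, shared)
            results.append((set(build["tests"]), retrieved))
        return results

    def make_global_retrieved_tests(self, correlation):
        # Default hook: nothing is computed once per fold.
        return None

    def make_individual_retrieved_tests(self, build, correlation, shared):
        raise NotImplementedError


class PerBuildRetrieval(RetrievalTemplate):
    def make_individual_retrieved_tests(self, build, correlation, shared):
        # Stand-in for a per-build lookup: tests linked to this build's modules.
        found = set()
        for module in build["modules"]:
            found |= correlation.get(module, set())
        return found


class PerFoldRetrieval(RetrievalTemplate):
    def make_global_retrieved_tests(self, correlation):
        # Stand-in for a per-fold summary: every test in the correlation.
        return set().union(*correlation.values())

    def make_individual_retrieved_tests(self, build, correlation, shared):
        return shared


correlation = {"a": {"t1"}, "b": {"t2"}}
builds = [{"modules": ["a"], "tests": ["t1", "t3"]}]
per_build = PerBuildRetrieval().run(builds, correlation)
per_fold = PerFoldRetrieval().run(builds, correlation)
```

This is the class-based route the question hoped to avoid. The extra boilerplate compared with passing functions is visible even in a small example.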
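Answer 3 (score: 0)
I would approach the problem from the inside out, by factoring out the innermost loop. This works well with a 'functional' style (as well as with 'functional programming'). It seems to me that if you generalize fmeasure_all a little, all three functions could be implemented in terms of it. Something like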
def fmeasure(builds, calcFn, retrieveFn):
    ret = []
    for build in builds:  # iterate over the parameter, not an undefined 'array'
        relevant = set(build['tests'])
        fval = calcFn(relevant, retrieveFn(build))
        if fval is not None:
            ret.append(fval)
    return ret
This allows you to define:
def fmeasure_kfold1(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        ret += fmeasure(array[test_index], calc_f,
                        lambda build: get_tests(set(build['modules']), correlation))
    return ret

def fmeasure_kfold2(array, nfolds):
    ret = []
    # Kfold1 and kfold2 both have this outer loop
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        # Retrieved tests is calculated outside the build loop in kfold2
        retrieved_tests = _sum_tests(correlation)
        ret += fmeasure(array[test_index], calc_f, lambda _: retrieved_tests)
    return ret

def fmeasure_all(array):
    return fmeasure(array,
                    lambda relevant, _: calc_f2(relevant),
                    lambda x: x)
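By now, fmeasure_kfold1 and fmeasure_kfold2 look very similar. They differ mainly in how fmeasure is called, so we can implement a generic fmeasure_kfoldn function that centralizes the iteration and collects the results: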
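def fmeasure_kfoldn(array, nfolds, measure_fn):
    # measure_fn(builds, correlation) returns a list of F-measure values;
    # the parameter is not named 'callable' to avoid shadowing the built-in.
    ret = []
    for train_index, test_index in KFold(len(array), nfolds):
        correlation = analyze(array[train_index])
        ret += measure_fn(array[test_index], correlation)
    return ret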
This makes it very easy to define fmeasure_kfold1 and fmeasure_kfold2:
def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f, lambda build: get_tests(set(build['modules']), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)

def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)
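To try the final design end to end without the original helpers, here is a self-contained sketch. Everything the question leaves undefined is replaced by a hypothetical stub: kfold stands in for KFold and works on plain lists, while analyze, get_tests, _sum_tests and calc_f are simplified guesses at their behaviour. Treat it as a way to exercise the structure, not as the real implementation.

```python
def kfold(n, nfolds):
    """Hypothetical stand-in for KFold: yields (train, test) index lists."""
    indices = list(range(n))
    for fold in range(nfolds):
        test = indices[fold::nfolds]
        train = [i for i in indices if i not in test]
        yield train, test


def analyze(builds):
    """Stub: map each module to the tests seen with it in the training builds."""
    correlation = {}
    for build in builds:
        for module in build["modules"]:
            correlation.setdefault(module, set()).update(build["tests"])
    return correlation


def get_tests(modules, correlation):
    """Stub: tests correlated with any of the given modules."""
    found = set()
    for module in modules:
        found |= correlation.get(module, set())
    return found


def _sum_tests(correlation):
    """Stub: every test that appears anywhere in the correlation."""
    return set().union(*correlation.values())


def calc_f(relevant, retrieved):
    """F-measure tuple, or None when undefined (an assumption made here)."""
    hits = len(relevant & retrieved)
    if hits == 0:
        return None
    recall = hits / len(relevant)
    prec = hits / len(retrieved)
    return (2 * recall * prec / (recall + prec), recall, prec)


def fmeasure(builds, calc_fn, retrieve_fn):
    results = []
    for build in builds:
        fval = calc_fn(set(build["tests"]), retrieve_fn(build))
        if fval is not None:
            results.append(fval)
    return results


def fmeasure_kfoldn(array, nfolds, measure_fn):
    results = []
    for train, test in kfold(len(array), nfolds):
        correlation = analyze([array[i] for i in train])
        results += measure_fn([array[i] for i in test], correlation)
    return results


def fmeasure_kfold1(array, nfolds):
    def measure(builds, correlation):
        return fmeasure(builds, calc_f,
                        lambda build: get_tests(set(build["modules"]), correlation))
    return fmeasure_kfoldn(array, nfolds, measure)


def fmeasure_kfold2(array, nfolds):
    def measure(builds, correlation):
        retrieved_tests = _sum_tests(correlation)
        return fmeasure(builds, calc_f, lambda _: retrieved_tests)
    return fmeasure_kfoldn(array, nfolds, measure)


builds = [
    {"modules": ["a"], "tests": ["t1"]},
    {"modules": ["a"], "tests": ["t1"]},
    {"modules": ["b"], "tests": ["t2"]},
    {"modules": ["b"], "tests": ["t2"]},
]
per_build = fmeasure_kfold1(builds, 2)
per_fold = fmeasure_kfold2(builds, 2)
```

On this toy data the per-build variant retrieves exactly the relevant test for each build, while the per-fold variant retrieves both tests every time and so has lower precision.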