I need help reducing the cyclomatic complexity of the following code:
import numpy

def avg_title_vec(record, lookup):
    avg_vec = []
    word_vectors = []
    # Collect the vector of every word that has an entry in lookup.value.
    for tag in record['all_titles']:
        titles = clean_token(tag).split()
        for word in titles:
            if word in lookup.value:
                word_vectors.append(lookup.value[word])
    # Average component-wise; avg_vec stays empty if no word matched.
    if len(word_vectors):
        avg_vec = [
            float(val) for val in numpy.mean(
                numpy.array(word_vectors),
                axis=0)]
    output = (record['id'],
              ','.join([str(a) for a in avg_vec]))
    return output
Sample input:
record = {'all_titles': ['hello world', 'hi world', 'bye world']}
lookup.value = {'hello': [0.1, 0.2], 'world': [0.2, 0.3], 'bye': [0.9, -0.1]}
def clean_token(input_string):
    return input_string.replace("-", " ").replace("/", " ").replace(
        ":", " ").replace(",", " ").replace(";", " ").replace(
        ".", " ").replace("(", " ").replace(")", " ").lower()
So for every word that appears in lookup.value, I take its vector, and the result is the average of all those vectors.
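To make the averaging concrete, here is a minimal sketch of the same computation on the sample input. It assumes a plain dict (the hypothetical name vectors_by_word) in place of lookup.value, skips clean_token because the sample titles contain no punctuation, and uses statistics.fmean instead of numpy.mean so that it runs with the standard library only.

```python
from statistics import fmean

# Hypothetical stand-in for lookup.value: a plain dict of word -> vector.
vectors_by_word = {
    'hello': [0.1, 0.2],
    'world': [0.2, 0.3],
    'bye': [0.9, -0.1],
}
titles = ['hello world', 'hi world', 'bye world']

# One vector per known word occurrence; 'hi' is skipped because it
# has no entry, while 'world' contributes three times.
word_vectors = [vectors_by_word[word]
                for title in titles
                for word in title.split()
                if word in vectors_by_word]

# Component-wise mean, playing the role of numpy.mean(..., axis=0).
avg_vec = [fmean(column) for column in zip(*word_vectors)]
print(len(word_vectors), avg_vec)
```

Five vectors end up in the average (hello, world, world, bye, world), so the components come out at roughly 0.32 and 0.2.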
Answer 0 (score: 0)
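This probably doesn't count as a proper answer, because in the end the cyclomatic complexity is not reduced.
This variant is a bit shorter, but I don't see any way to generalize it. And it seems you do need the ifs you already have.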
def avg_title_vec(record, lookup):
    word_vectors = [lookup.value[word] for tag in record['all_titles']
                    for word in clean_token(tag).split() if word in lookup.value]
    if not word_vectors:
        return (record['id'], None)
    avg_vec = [float(val) for val in numpy.mean(
        numpy.array(word_vectors),
        axis=0)]
    output = (record['id'],
              ','.join([str(a) for a in avg_vec]))
    return output
According to this, your CC is 6, which is already good. You can lower the CC of the function itself by using helper functions, for example:
import re

def get_tags(record):
    return [tag for tag in record['all_titles']]

def sanitize_and_split_tags(tags):
    # Same punctuation as clean_token, replaced in a single regex pass.
    return [word for tag in tags for word in
            re.sub(r'[\-/:,;\.()]', ' ', tag).lower().split()]

def get_vectors_words(words, lookup):
    # lookup is passed in explicitly, as in avg_title_vec.
    return [lookup.value[word] for word in words if word in lookup.value]
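This lowers the average CC, but the overall CC stays the same or increases. I don't see how to get rid of the ifs that check whether a word is in lookup.value or whether we have any vectors to work with.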
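For what it's worth, the helpers might be wired back into the main function roughly like this. It is only a sketch under a few assumptions: lookup is passed to the helper explicitly, types.SimpleNamespace stands in for whatever object really carries .value, the record contents and id values are made up, and statistics.fmean replaces numpy.mean so the snippet runs with the standard library alone.

```python
import re
from statistics import fmean
from types import SimpleNamespace


def sanitize_and_split_tags(tags):
    # Replace the punctuation handled by clean_token with spaces,
    # then lowercase and split into words.
    return [word for tag in tags
            for word in re.sub(r'[-/:,;.()]', ' ', tag).lower().split()]


def get_vectors_words(words, lookup):
    # Assumption: lookup is passed in and exposes a .value dict.
    return [lookup.value[word] for word in words if word in lookup.value]


def avg_title_vec(record, lookup):
    words = sanitize_and_split_tags(record['all_titles'])
    word_vectors = get_vectors_words(words, lookup)
    if not word_vectors:
        return (record['id'], None)
    # Pure-Python component-wise mean, standing in for numpy.mean(axis=0).
    avg_vec = [fmean(column) for column in zip(*word_vectors)]
    return (record['id'], ','.join(str(a) for a in avg_vec))


# Hypothetical sample data; SimpleNamespace mimics an object with .value.
lookup = SimpleNamespace(value={'hello': [1.0, 2.0], 'world': [3.0, 4.0]})
record = {'id': 7, 'all_titles': ['Hello-World', 'unknown']}
result = avg_title_vec(record, lookup)
empty = avg_title_vec({'id': 8, 'all_titles': ['unknown']}, lookup)
print(result, empty)
```

The only explicit branch left in avg_title_vec is the empty check; the membership test still exists, it has just moved into get_vectors_words, which is why the overall count does not go down.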