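I have a function definition:
Now I have to create a function like this:
The problem is that (t, c), where t is the feature and c is the class, can occur in 4 combinations: (t, c), (t', c), (t, c'), (t', c'). So the function definition changes depending on the values of t and c. Is there any way to do this other than computing a, b, c, d four times and then summing the function values?
The dataset looks like this: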
feature  file_frequency_M  file_frequency_B
abc      2                 5
My attempt:
dataset = pd.read_csv('.csv')
score = []
###list =[(t,c) ,(t,c0),(t0,c),(t0,c0)] ##representation of the combination of (t,c)
l=152+1394
for index, row in dataset.iterrows():
    a = row['file_frequency_M']
    b = row['file_frequency_B']
    c = 152 - a
    d = 1394 - b
    temp_score = 0
    tmp1 = 0
    tmp2 = 0
    tmp3 = 0
    tmp4 = 0
    for i in range(4):
        if i == 0:
            if a == 0:
                tmp1 = 0
            else:
                tmp1 = log10(((a * l) / (a + c) * (a + b)))
            temp_score += tmp1
        if i == 1:
            if b == 0:
                tmp2 = 0
            else:
                tmp2 = log10(((b * l) / (b + d) * (b + a)))
            temp_score += tmp2
        if i == 2:
            if c == 0:
                tmp3 = 0
            else:
                tmp3 = log10(((c * l) / (c + a) * (c + d)))
            temp_score += tmp3
        if i == 3:
            if d == 0:
                tmp4 = 0
            else:
                tmp4 = log10(((d * l) / (d + b) * (d + c)))
            temp_score += tmp4
    score.append(temp_score)
np.savetxt("m.csv", score, delimiter=",")
Answer 0 (score: 2)
You can save a lot of code duplication by writing a function that represents I(t, c):
import numpy as np
import pandas as pd
from math import log10

dataset = pd.read_csv('.csv')
score = []
###list =[(t,c) ,(t,c0),(t0,c),(t0,c0)] ##representation of the combination of (t,c)
l = 152 + 1394  # N: total number of files across both classes

def I(a, b, c, n):
    """Returns I(t,c) = log10(A*N / ((A+C)*(A+B))), or 0 when A is 0."""
    if a == 0:
        return 0
    return log10((a * n) / ((a + c) * (a + b)))

for index, row in dataset.iterrows():
    a = row['file_frequency_M']
    b = row['file_frequency_B']
    c = 152 - a
    d = 1394 - b
    # One call per (t, c) combination; only the argument order changes.
    tmp1 = I(a, b, c, l)
    tmp2 = I(b, a, d, l)
    tmp3 = I(c, d, a, l)
    tmp4 = I(d, c, b, l)
    # sum() takes a single iterable, so the terms are passed as a list.
    temp_score = sum([tmp1, tmp2, tmp3, tmp4])
    score.append(temp_score)
np.savetxt("m.csv", score, delimiter=",")
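If the dataset is large, the row-by-row loop can probably be replaced by array operations. The following is only a sketch under assumptions: the helper names `mi_term` and `score_frame` are made up for illustration, the class totals 152 and 1394 are taken from the code above, and the zero-count rule (a term contributes 0 when its first count is 0) mirrors the `if a == 0` check in `I`.

```python
import numpy as np
import pandas as pd

def mi_term(a, b, c, n):
    """Elementwise log10(a*n / ((a+c)*(a+b))), with 0 wherever a == 0."""
    a, b, c = np.broadcast_arrays(
        np.asarray(a, dtype=float),
        np.asarray(b, dtype=float),
        np.asarray(c, dtype=float),
    )
    out = np.zeros(a.shape)
    mask = a > 0  # skip zero counts, like the `if a == 0` branch in I()
    out[mask] = np.log10(
        a[mask] * n / ((a[mask] + c[mask]) * (a[mask] + b[mask]))
    )
    return out

def score_frame(df, n_m=152, n_b=1394):
    """Sum the four (t, c) terms for every row of df (hypothetical helper)."""
    a = df['file_frequency_M'].to_numpy(dtype=float)
    b = df['file_frequency_B'].to_numpy(dtype=float)
    c = n_m - a
    d = n_b - b
    n = n_m + n_b
    # Same argument order as the four I(...) calls in the loop version.
    return (mi_term(a, b, c, n) + mi_term(b, a, d, n)
            + mi_term(c, d, a, n) + mi_term(d, c, b, n))

if __name__ == "__main__":
    sample = pd.DataFrame({
        'feature': ['abc'],
        'file_frequency_M': [2],
        'file_frequency_B': [5],
    })
    print(score_frame(sample))
```

The result is a NumPy array with one score per row, so it can be passed to `np.savetxt` in the same way as the `score` list.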
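Note: judging from the image of your function definition, there appears to be a bug in your code. It should be:
log10((a * n) / ((a + c) * (a + b)))
not
log10(((a * l) / (a + c) * (a + b)))
(note the placement of the parentheses). Without the extra pair, the expression divides by (a + c) and then multiplies by (a + b) instead of dividing by their product.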
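As a quick illustration of how much the parenthesis placement matters, here is a small check using the sample row from the question (2 and 5) and the class totals 152 and 1394 from the code. The names `buggy` and `fixed` are only illustrative.

```python
from math import log10

a, b = 2, 5                      # file_frequency_M, file_frequency_B for 'abc'
c, d = 152 - a, 1394 - b
n = 152 + 1394

# Original expression: divides by (a + c), then multiplies by (a + b).
buggy = (a * n) / (a + c) * (a + b)
# Corrected expression: divides by the whole product (a + c) * (a + b).
fixed = (a * n) / ((a + c) * (a + b))

print(log10(buggy), log10(fixed))
```

The two values differ by a factor of (a + b) squared, so the mistake distorts the score more strongly for frequent features.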