I have extended this SO question and am comparing two LaTeX equations. Here is an example with two quadratic equations:
```python
eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"
```
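I need the comparison to treat these as matching, because I have used `*` in place of `x` and `b`. What I did was convert each equation into a list of words: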
```python
eqn1_word = ['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a']
eqn2_word = ['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a']
```
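The post does not show `get_word`. A minimal sketch of one possible tokenizer, assuming (hypothetically) that a "word" is any run of letters/digits or a lone `*`, so LaTeX commands lose their backslash and braces, `=`, `-` and `^` are dropped. With that rule it reproduces the two word lists above:

```python
import re

# Hypothetical tokenizer (not the author's get_word): keep runs of
# letters/digits and lone "*" characters; everything else is discarded.
TOKEN_RE = re.compile(r"[A-Za-z0-9]+|\*")


def get_word(equation):
    """Return the list of 'words' found in a LaTeX equation string."""
    return TOKEN_RE.findall(equation)


eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"

print(get_word(eqn1))  # ['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a']
print(get_word(eqn2))  # ['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a']
```

Note the raw-string prefix `r"..."`: without it, `\f` in `\frac` would be read as a form-feed escape.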
So the vectors are:
```python
eqn1_vec = Counter({'*': 3, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'pm': 1})
eqn2_vec = Counter({'b': 2, 'frac': 1, 'sqrt': 1, '2': 1, '2a': 1, '4ac': 1, 'x': 1, 'pm': 1})
```
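For reference, the ordinary cosine similarity between two `Counter` vectors can be sketched as below (`cosine_similarity` is an illustrative name; the word lists are copied from above). Only the words both equations share reach the numerator:

```python
import math
from collections import Counter


def cosine_similarity(vec1, vec2):
    """Cosine similarity between two bag-of-words Counters."""
    common = set(vec1) & set(vec2)
    numerator = sum(vec1[w] * vec2[w] for w in common)
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    if not norm1 or not norm2:
        return 0.0
    return numerator / (norm1 * norm2)


eqn1_vec = Counter(['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a'])
eqn2_vec = Counter(['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a'])

# The six shared words give a numerator of 6; '*', 'b' and 'x' only
# enlarge the norms, so the result is 6 / sqrt(15 * 11), about 0.467.
print(cosine_similarity(eqn1_vec, eqn2_vec))
```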
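My extension is this: I compute the percentage of `*` in eqn1_word, then compute the ordinary cosine similarity given by that answer. Finally, I add the two values, which comes out almost equal to 1.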
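The extension just described (fraction of `*` in eqn1, plus the ordinary cosine similarity, capped at 1) can be sketched on the `Counter` vectors like this. `combined_score` is a hypothetical name, and one assumption differs from the question's code: it counts `*` tokens in the vector rather than `*` characters in the raw string, which is the same thing here because every `*` is its own token.

```python
import math
from collections import Counter


def cosine_similarity(vec1, vec2):
    common = set(vec1) & set(vec2)
    numerator = sum(vec1[w] * vec2[w] for w in common)
    norm1 = math.sqrt(sum(c * c for c in vec1.values()))
    norm2 = math.sqrt(sum(c * c for c in vec2.values()))
    return numerator / (norm1 * norm2) if norm1 and norm2 else 0.0


def combined_score(vec1, vec2):
    """Fraction of '*' tokens in vec1 plus cosine similarity, capped at 1."""
    total_words = sum(vec1.values())
    # '*' in eqn1 minus any literal '*' in eqn2 (Counter returns 0 if absent)
    stars = vec1['*'] - vec2['*']
    star_fraction = stars / total_words if total_words else 0.0
    return min(star_fraction + cosine_similarity(vec1, vec2), 1.0)


eqn1_vec = Counter(['*', 'frac', '*', 'pm', 'sqrt', '*', '2', '4ac', '2a'])
eqn2_vec = Counter(['x', 'frac', 'b', 'pm', 'sqrt', 'b', '2', '4ac', '2a'])

print(combined_score(eqn1_vec, eqn2_vec))  # 3/9 + 6/sqrt(165), about 0.80
```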
This works in most cases (when a single variable is replaced by `*`). Here, however, `*` has a count of 3 in eqn1_vec, while in eqn2_vec `b` = 2 and `x` = 1.
For more explanation and a better understanding, please check this. Based on that reference, my code looks like this:
```python
import math
import traceback
from collections import Counter


def get_cosine(self, c_eqn1_eqn, c_eqn2_eqn):
    print('c_eqn1_eqn =', c_eqn1_eqn)
    print('c_eqn2_eqn =', c_eqn2_eqn)
    # Number of '*' characters in the first equation
    _special_symbol = float(c_eqn1_eqn.count("*"))
    cos_result = 0
    symbol_percentage = 0
    try:
        eqn1_vector = Counter(self.get_word(c_eqn1_eqn))  # get_word() returns the word list
        eqn2_vector = Counter(self.get_word(c_eqn2_eqn))
        _words = sum(eqn1_vector.values())
        if "*" in eqn2_vector:
            _special_symbol -= eqn2_vector["*"]
        print('_special_symbol =', _special_symbol)
        print('_words @ last =', _words)
        try:
            symbol_percentage = _special_symbol / _words
        except ZeroDivisionError:
            symbol_percentage = 0.0
    except Exception as exp:
        print("Exception at converting equation to vector", exp)
        traceback.print_exc()
    else:
        # Standard cosine similarity of the two word-count vectors
        intersection = set(eqn1_vector) & set(eqn2_vector)
        numerator = sum(eqn1_vector[x] * eqn2_vector[x] for x in intersection)
        _sum1 = sum(v ** 2 for v in eqn1_vector.values())
        _sum2 = sum(v ** 2 for v in eqn2_vector.values())
        denominator = math.sqrt(_sum1) * math.sqrt(_sum2)
        print('numerator =', numerator)
        print('denominator =', denominator)
        if not denominator:
            cos_result = 0
        else:
            cos_result = float(numerator) / denominator
        print(cos_result)
    final_result = float(symbol_percentage) + cos_result
    return final_result if final_result <= 1.0 else 1
```
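The problem is that when the intersection is small, the numerator becomes small. I copied this from a class, so please ignore `self`.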
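Plugging the example vectors into the cosine formula shows the effect: only the six shared words (`frac`, `pm`, `sqrt`, `2`, `4ac`, `2a`) contribute to the numerator, while `*`, `b` and `x` only enlarge the norms in the denominator:

```latex
\cos(\mathbf{v}_1,\mathbf{v}_2)
  = \frac{\sum_{w \in V_1 \cap V_2} v_1(w)\,v_2(w)}
         {\sqrt{\sum_{w} v_1(w)^2}\;\sqrt{\sum_{w} v_2(w)^2}}
  = \frac{6}{\sqrt{3^2 + 6}\;\sqrt{2^2 + 1^2 + 6}}
  = \frac{6}{\sqrt{15}\,\sqrt{11}} \approx 0.467
```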
How can I fix this? Thanks in advance. If anything is wrong, or if my concept is mistaken, please let me know.
Answer 0 (score: 1)
I found a way to solve this problem.
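Since we cannot (and should not) increase the numerator, I decided to change the denominator instead. My logic is: if the number of `*` equals the number of non-intersecting values in eqn2, reduce the denominator by leaving those values out of it. If not, leave the denominator as it is. Now I no longer need to compute the percentage of "*" or add it to the cosine result.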
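Applied to the example from the question: eqn1 contains three `*`, and eqn2 has three non-intersecting tokens (`b` twice, `x` once). The counts match, so `*`, `b` and `x` are left out of both norms while the numerator stays at 6:

```latex
\cos' = \frac{6}
             {\sqrt{\sum_{w \notin \{*,\,b,\,x\}} v_1(w)^2}\;
              \sqrt{\sum_{w \notin \{*,\,b,\,x\}} v_2(w)^2}}
      = \frac{6}{\sqrt{6}\,\sqrt{6}} = 1
```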
```python
import math
import traceback
from collections import Counter


def get_cosine(c_eqn1, c_eqn2):
    cos_result = 0
    try:
        # get_word() (not shown in the post) returns the equation's word list
        eqn1_vector = Counter(get_word(c_eqn1))
        eqn2_vector = Counter(get_word(c_eqn2))
        _special_symbol = 0
        spe_list = list()
        # Store the number of * and the values that contain *
        for _val in eqn1_vector:
            if "*" in _val:
                _special_symbol += eqn1_vector[_val]
                spe_list.append(_val)
        if "*" in eqn2_vector:
            _special_symbol -= eqn2_vector["*"]
    except Exception as exp:
        print("Exception at converting equation to vector", exp)
        traceback.print_exc()
    else:
        intersection = set(eqn1_vector) & set(eqn2_vector)
        numerator = sum(eqn1_vector[x] * eqn2_vector[x]
                        for x in intersection)
        non_intersection_sum = 0
        non_intersection_value = list()
        # Store the count and the values of eqn2 words that did not match
        for _val in eqn2_vector:
            if _val not in intersection:
                non_intersection_sum += eqn2_vector[_val]
                non_intersection_value.append(_val)
        # Join both non-intersecting lists
        if non_intersection_value:
            non_intersection_value.extend(spe_list)
            # If the two non-intersecting counts differ,
            # empty the list (no denominator adjustment)
            if _special_symbol != non_intersection_sum:
                non_intersection_value = list()
        # Cosine similarity formula, skipping the excluded values in the norms
        _sum1 = sum(eqn1_vector[x] ** 2 for x in eqn1_vector
                    if x not in non_intersection_value)
        _sum2 = sum(eqn2_vector[x] ** 2 for x in eqn2_vector
                    if x not in non_intersection_value)
        denominator = math.sqrt(_sum1) * math.sqrt(_sum2)
        if not denominator:
            cos_result = 0
        else:
            cos_result = float(numerator) / denominator
    return cos_result if cos_result <= 1.0 else 1
```
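To experiment with this approach outside the class, here is a condensed, self-contained sketch of the same denominator adjustment. `get_word` is not shown in the post, so the regex tokenizer below is a hypothetical stand-in, and `adjusted_cosine` is an illustrative name; the cap at 1 uses `min` instead of the conditional expression.

```python
import math
import re
from collections import Counter


def get_word(equation):
    # Hypothetical tokenizer: runs of letters/digits, or a lone "*".
    return re.findall(r"[A-Za-z0-9]+|\*", equation)


def adjusted_cosine(eqn1, eqn2):
    vec1, vec2 = Counter(get_word(eqn1)), Counter(get_word(eqn2))
    # Keys of eqn1 containing '*', and how many '*' eqn2 does not account for
    star_keys = [k for k in vec1 if "*" in k]
    stars = sum(vec1[k] for k in star_keys) - vec2["*"]
    common = set(vec1) & set(vec2)
    numerator = sum(vec1[k] * vec2[k] for k in common)
    unmatched = [k for k in vec2 if k not in common]
    # Drop unmatched and star keys from the norms only when the counts balance
    excluded = set()
    if unmatched and stars == sum(vec2[k] for k in unmatched):
        excluded = set(unmatched) | set(star_keys)
    norm1 = math.sqrt(sum(c * c for k, c in vec1.items() if k not in excluded))
    norm2 = math.sqrt(sum(c * c for k, c in vec2.items() if k not in excluded))
    if not norm1 or not norm2:
        return 0.0
    return min(numerator / (norm1 * norm2), 1.0)


eqn1 = r"*=\frac{-*\pm\sqrt{*^2-4ac}}{2a}"
eqn2 = r"x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}"
eqn3 = r"*=\frac{-*\pm\sqrt{b^2-4ac}}{2a}"

print(adjusted_cosine(eqn1, eqn2))  # about 1.0: counts balance
print(adjusted_cosine(eqn3, eqn2))  # 8/11, about 0.727: counts differ
```

The second call shows a limitation: in `eqn3` the `b` under the square root is kept, so eqn1 has two `*` but eqn2 has only one unmatched token (`x`). The counts differ, nothing is excluded, and the plain cosine is returned.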