I need to compute perplexity, and I tried this:
def get_perplexity(test_set, model):
    perplexity = 1
    n = 0
    for word in test_set:
        n += 1
        perplexity = perplexity * 1 / get_prob(model, word)
    perplexity = pow(perplexity, 1/float(n))
    return perplexity
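After some number of steps, my perplexity becomes infinity (inf).
I need the actual number, and only as the very last step do I want to compute pow(perplexity, 1/float(n)).
Is there a way to multiply numbers like these and still get a result?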
3.887311155784627e+243
8.311806360146177e+250
1.7707049372801292e+263
1.690802669602979e+271
3.843294667766984e+278
5.954424789834101e+290
8.859529887856071e+295
7.649470766862909e+306
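For context, a minimal sketch of why the loop ends in inf: Python floats are IEEE 754 doubles, and a float multiplication whose result exceeds the largest finite double quietly returns inf instead of raising an error. The factor 1000.0 below is an arbitrary illustrative value, standing in for 1/p with a hypothetical p = 0.001.

```python
import sys

# Largest finite value a Python float (IEEE 754 double) can hold
print(sys.float_info.max)  # 1.7976931348623157e+308

running = 7.649470766862909e+306  # last value printed in the question
running = running * 1000.0        # one more factor 1/p, with hypothetical p = 0.001
print(running)  # inf
```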
Answer 0 (score: 1)
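Repeated multiplication runs into nasty numerical problems: the running product keeps growing until it exceeds the largest value a 64-bit float can hold (about 1.8e308) and overflows to inf. I suggest moving to log space and summing logarithms instead of multiplying probabilities: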
import math

def get_perplexity(test_set, model):
    log_perplexity = 0
    n = 0
    for word in test_set:
        n += 1
        log_perplexity -= math.log(get_prob(model, word))
    log_perplexity /= float(n)
    return math.exp(log_perplexity)
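A usage sketch of the log-space idea, written as a compact variant of the function above. It relies on the identity (prod 1/p(w)) ** (1/N) == exp(-(1/N) * sum(log p(w))) and uses math.fsum to reduce rounding error in the sum. The dict-based model and the get_prob stand-in are hypothetical; the question does not show how its own model or get_prob work.

```python
import math

# Hypothetical toy model: a dict mapping word -> probability.
# get_prob below is an illustrative stand-in for the question's own function.
def get_prob(model, word):
    return model[word]

def get_perplexity(test_set, model):
    # exp(-(1/N) * sum(log p(w))) is the log-space form of
    # (prod 1/p(w)) ** (1/N); fsum keeps the summation accurate
    logs = [math.log(get_prob(model, word)) for word in test_set]
    return math.exp(-math.fsum(logs) / len(logs))

model = {"a": 0.5, "b": 0.25, "c": 0.001}

# Short input: (1/0.5 * 1/0.25) ** (1/2) = 8 ** 0.5
print(get_perplexity(["a", "b"], model))  # ~2.828

# Long input: the naive running product would be 1000**1000 and overflow to inf
print(get_perplexity(["c"] * 1000, model))  # ~1000.0
```

Each term is a log of magnitude around 7 here, so the sum stays small no matter how long the test set is.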
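This way every term is a logarithm that fits comfortably in an ordinary float, so nothing blows up and you don't lose precision. Alternatively, you can use the decimal module for arbitrary precision: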
import decimal

def get_perplexity(test_set, model):
    with decimal.localcontext() as ctx:
        ctx.prec = 100  # set as appropriate
        log_perplexity = decimal.Decimal(0)
        n = 0
        for word in test_set:
            n += 1
            # ln() is computed with the local context's precision
            log_perplexity -= decimal.Decimal(get_prob(model, word)).ln()
        # Decimal cannot be divided by a float, so divide by the int n
        log_perplexity /= n
        return log_perplexity.exp()
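Decimal's exponent range (Emax is 999999 in the default context) is far larger than a float's, so even the question's original direct product stays finite when it is done in Decimal. A minimal sketch of that variant, assuming a hypothetical dict-based model in place of get_prob:

```python
import decimal

# Hypothetical toy model (word -> probability) standing in for get_prob
model = {"c": 0.001}
test_set = ["c"] * 1000

with decimal.localcontext() as ctx:
    ctx.prec = 50  # significant digits; set as appropriate
    product = decimal.Decimal(1)
    n = 0
    for word in test_set:
        n += 1
        # Same running product as the question's loop, but in Decimal
        product *= 1 / decimal.Decimal(model[word])
    # Final step from the question: take the n-th root
    perplexity = product ** (decimal.Decimal(1) / n)

# product is about 1e3000, far above a float's ~1.8e308, yet still finite
print(product > decimal.Decimal("1e308"))  # True
print(perplexity)  # very close to 1000
```

The result is not exactly 1000 because Decimal(0.001) converts the float exactly, and the float 0.001 is slightly larger than one thousandth.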
Answer 1 (score: 0)
Since e+306 just means × 10^306, you can write a class that keeps the number in two parts, a value and a power of ten:
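class BigPowerFloat:
    POWER_EXP = 100              # exponent moved out of the float per step
    POWER_STEP = 10**POWER_EXP   # i.e. 10**100

    def __init__(self, input_value):
        self.value = float(input_value)
        self.power = 0  # the number represented is value * 10**power

    def _move_to_power(self):
        while self.value > self.POWER_STEP:
            self.value = self.value / self.POWER_STEP
            # add the exponent (100), not the step itself (10**100)
            self.power += self.POWER_EXP
        # similar handling could be added for very small values

    def __mul__(self, other):
        # return a new object so that `p = p * x` works
        result = BigPowerFloat(self.value * other)
        result.power = self.power
        result._move_to_power()
        return result

    # TODO: other operators for /, +, - ...

    def __str__(self):
        # custom string conversion: value * 10**power
        return "{!r} * 10**{}".format(self.value, self.power)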
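The class above still leaves the question's last step, pow(perplexity, 1/float(n)), open. Because log10(value * 10**power) equals log10(value) + power, the n-th root can be taken in log10 space without ever rebuilding the huge float. A minimal sketch; the helper name nth_root_of_split is hypothetical:

```python
import math

def nth_root_of_split(value, power, n):
    """Hypothetical helper: (value * 10**power) ** (1/n), computed in log10 space.

    log10(value * 10**power) == log10(value) + power, so the huge
    number itself never has to fit in a float.
    """
    log10_total = math.log10(value) + power
    return 10 ** (log10_total / n)

# Example: 1000**1000 == 1e3000, stored as value=1.0, power=3000
print(nth_root_of_split(1.0, 3000, 1000))  # 1000.0

# Agrees with the ordinary formula when the number does fit in a float
print(nth_root_of_split(8.0, 0, 2), 8.0 ** 0.5)
```

With a BigPowerFloat instance p, this would be called as nth_root_of_split(p.value, p.power, n).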