Pandas Series.apply raises OverflowError

Asked: 2015-10-26 09:43:11

Tags: python pandas integer-division

I have a function that works on a single value, but when I use it with pandas Series.apply() it raises an OverflowError.

from __future__ import division
import pandas as pd
import numpy as np

birthdays = pd.DataFrame(np.empty([365,2]), columns = ['k','probability'], index = range(1,366))
birthdays['k'] = birthdays.index

I wrote a function:

def probability_of_shared_bday(k):
    end_point = 366 - k
    numerator = 1
    for i in range(end_point, 366):
        numerator = numerator*i
    denominator = 365**k
    probability_of_no_match = (1 - numerator/denominator)
    return probability_of_no_match

When I call it with a single integer, it works fine:

 probability_of_shared_bday(1)

0.0

 probability_of_shared_bday(100)

0.9999996927510721

But when I try to use this function with apply:

birthdays['probability'] = birthdays['k'].apply(probability_of_shared_bday, convert_dtype=False)

OverflowError: integer division result too large for a float

This happens whether convert_dtype is True or False.

Checking birthdays['k'].dtypes gives dtype('int64').
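The int64 dtype may matter here. One possible explanation, offered as an assumption rather than something confirmed in this thread: depending on the pandas version, apply can pass each element to the function as a fixed-width numpy.int64 instead of a Python int, so 365**k would be computed in 64 bits and wrap around, while numerator remains an exact, very large Python int. The sketch below simulates that wraparound in pure Python 3 (the helper wrap_int64 is hypothetical, written only for this illustration) and shows that dividing the huge numerator by a wrapped denominator raises OverflowError.

```python
def wrap_int64(value):
    """Hypothetical helper: reduce a Python int to the value a signed
    64-bit integer would hold after overflow (two's complement wrap)."""
    return (value + 2**63) % 2**64 - 2**63


k = 150

# Numerator as in the question: 365 * 364 * ... * (366 - k), exact Python int.
numerator = 1
for i in range(366 - k, 366):
    numerator *= i

exact_denominator = 365 ** k                  # arbitrary precision (plain int k)
wrapped_denominator = wrap_int64(365 ** k)    # simulated 64-bit result (assumption)

# With exact operands the quotient is below 1, so true division succeeds.
exact_ratio = numerator / exact_denominator

# With the wrapped denominator the quotient is far beyond the float range.
try:
    numerator / wrapped_denominator
    overflowed = False
except OverflowError:
    overflowed = True

print(exact_ratio, wrapped_denominator, overflowed)
```

If this is what is happening, converting the argument first, for example computing 365 ** int(k) inside the function, should keep the arithmetic in Python ints.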

1 Answer:

Answer 0 (score: 1)

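I'm not sure why you run into this problem with apply, but you shouldn't write the function that way in the first place. Here is a suggestion that avoids dividing one huge number by another: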

def probability_of_shared_bday(k):
    end_point = 366 - k
    ratio = 1
    for i in range(end_point, 366):
        ratio *= i / 365
    probability_of_no_match = (1 - ratio)
    return probability_of_no_match

The problem goes away!
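As a quick sanity check, the running-ratio version can be evaluated for every k from 1 to 365 without any large intermediate values. This is a minimal sketch, assuming Python 3 (where / is true division) and using a plain list in place of the DataFrame column; the variable name probabilities is only illustrative.

```python
def probability_of_shared_bday(k):
    """Probability that at least two of k people share a birthday
    (365 equally likely days), computed as a running product."""
    ratio = 1.0
    for i in range(366 - k, 366):
        ratio *= i / 365  # every factor is at most 1, so nothing can overflow
    return 1 - ratio


# Evaluate the whole range that the 'k' column covers.
probabilities = [probability_of_shared_bday(k) for k in range(1, 366)]

print(probabilities[0])    # k = 1
print(probabilities[22])   # k = 23, the classic "just over one half" case
print(probabilities[99])   # k = 100
```

The same function can then be passed to birthdays['k'].apply(...) as in the question, since each step only multiplies floats between 0 and 1.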