我正在使用pandas进行高性能计算,下面的函数为50,000行提供 1循环,最佳5:7.24 s每循环。
我必须将它扩展到100万行。
如何向量化该函数并应用于所有行。那么总体性能可以提高吗?
The conversion of the nvarchar value '17191925814' overflowed an int column.
答案 0 :(得分:3)
我认为你可以删除apply
并使用矢量化函数:
mutatedCashFlow['startDate'] = pd.to_datetime(mutatedCashFlow['startDate'])
mutatedCashFlow['EndDate'] = pd.to_datetime(mutatedCashFlow['EndDate'])
mutatedCashFlow['tradeDate'] = pd.to_datetime(mutatedCashFlow['tradeDate'])
diffTradeAndEnd=((mutatedCashFlow['EndDate']-mutatedCashFlow['tradeDate']).dt.days).abs()
diffStartAndEnd=((mutatedCashFlow['EndDate']-mutatedCashFlow['startDate']).dt.days).abs()
mutatedCashFlow['flow'] = (mutatedCashFlow['tradeAmount']*diffTradeAndEnd)/diffStartAndEnd
替代:
mutatedCashFlow['startDate'] = pd.to_datetime(mutatedCashFlow['startDate'])
mutatedCashFlow['EndDate'] = pd.to_datetime(mutatedCashFlow['EndDate'])
mutatedCashFlow['tradeDate'] = pd.to_datetime(mutatedCashFlow['tradeDate'])
diffTradeAndEnd=mutatedCashFlow['EndDate'].sub(mutatedCashFlow['tradeDate']).dt.days.abs()
diffStartAndEnd=mutatedCashFlow['EndDate'].sub(mutatedCashFlow['startDate']).dt.days.abs()
mutatedCashFlow['flow'] = mutatedCashFlow['tradeAmount'].mul(diffTradeAndEnd)
.div(diffStartAndEnd)
print (mutatedCashFlow)