Pandas DataFrame conditional change gets slow with iteration

Time: 2020-06-25 20:01:13

Tags: python pandas performance dataframe numpy-ndarray

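I have a DataFrame that shows the weights (wts) of different foods in a diet. Some foods can drop in and out of the diet (when a food is dropped, its weight becomes NaN).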

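At some points the sum of the wts is > 1, so I need to bring it back down to 1. To do this, I check which foods have had the largest increase in wt since the last check. I start with the food showing the largest increase and reduce its weight toward its previous weight. Once it is back at its previous weight, I move on to the item with the second-largest increase. I keep going until the sum of the weights reaches 1, or until I run out of items whose weight has increased since the last check.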
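
To make the rule concrete, here is a minimal sketch of it for a single period, written with plain NumPy. The helper name `reduce_row` and the sample numbers are made up for illustration, and I am assuming that a dropped food (NaN) counts as weight 0 when measuring increases.

```python
import numpy as np

def reduce_row(prev, curr):
    """Greedy reduction for one period (hypothetical helper).

    prev, curr: 1-D float arrays of weights; NaN means the food is dropped.
    Returns a copy of curr in which increases over prev are undone,
    largest increase first, until the sum is at most 1 or no increases remain.
    """
    out = np.array(curr, dtype=float)
    # Assumption: a dropped food counts as weight 0 when measuring the change.
    increase = np.nan_to_num(out) - np.nan_to_num(prev)
    overage = np.nansum(out) - 1.0
    # Visit foods from the largest increase to the smallest.
    for i in np.argsort(-increase):
        if overage <= 0 or increase[i] <= 0:
            break
        cut = min(increase[i], overage)  # never go below the previous weight
        out[i] -= cut
        overage -= cut
    return out

prev = np.array([0.30, 0.30, 0.40, np.nan])
curr = np.array([0.50, 0.35, 0.40, 0.10])  # sums to 1.35, so 0.35 must go
print(reduce_row(prev, curr))  # approximately [0.3, 0.3, 0.4, 0.0]
```

In this example the first food gives back 0.20, the newly added fourth food gives back 0.10, and the second food gives back the remaining 0.05.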

I used the approach below, which should work, but it takes a very long time. Is there a faster way?

overage = wts.sum(axis=1) - 1  # amount that needs to be subtracted in each period
dwts = wts.fillna(0).diff()  # change in wts since the previous period
dwts_rank = dwts.rank(axis=1, method='min', ascending=False)
dwts_rank_up = dwts_rank[dwts > 0]  # only look at items whose weight increased

for index, row in dwts.iloc[1:].iterrows():
    curr_ranks = dwts_rank_up.loc[index]
    rank = 1
    curr_overage = overage[index]
    while curr_overage > 0:
        try:
            possible_reduction = row[curr_ranks == rank].values[0]
        except IndexError:  # not enough items with an increase in weight to offset the overage
            break
        row[curr_ranks == rank] = max(0, possible_reduction - curr_overage)
        curr_overage -= min(possible_reduction, curr_overage)
        rank += 1
    dwts.loc[index] = row  # iterrows may yield a copy, so write the row back explicitly
wts = wts.iloc[0] + dwts.cumsum()
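
One direction that might avoid the Python-level loop is to sort each row's increases once and use a cumulative sum to work out how much each item has to give back. The sketch below is only that, a sketch: the function name `cap_increases` is hypothetical, and it assumes every period is compared with the original previous row, so adjustments are not carried forward into later periods the way the `cumsum` rebuild above carries them.

```python
import numpy as np
import pandas as pd

def cap_increases(wts: pd.DataFrame) -> pd.DataFrame:
    """Vectorized sketch (hypothetical helper): undo each period's weight
    increases, largest first, until the row sum is at most 1.

    Assumption: each row is compared with the ORIGINAL previous row.
    """
    w = wts.fillna(0).to_numpy(dtype=float)
    inc = np.zeros_like(w)
    inc[1:] = np.clip(w[1:] - w[:-1], 0, None)       # positive changes only
    overage = np.clip(w.sum(axis=1) - 1, 0, None)    # amount to remove per row
    order = np.argsort(-inc, axis=1, kind="stable")  # largest increase first
    inc_sorted = np.take_along_axis(inc, order, axis=1)
    # Amount that the larger increases could already have absorbed.
    removed_before = np.cumsum(inc_sorted, axis=1) - inc_sorted
    # Each item gives back what is still needed, capped at its own increase.
    cut_sorted = np.clip(overage[:, None] - removed_before, 0, inc_sorted)
    cut = np.zeros_like(w)
    np.put_along_axis(cut, order, cut_sorted, axis=1)  # undo the sort
    return wts - cut  # NaN entries stay NaN

example = pd.DataFrame({
    "a": [0.30, 0.50],
    "b": [0.30, 0.35],
    "c": [0.40, 0.40],
    "d": [np.nan, 0.10],
})
print(cap_increases(example))
```

The first row is left untouched because it has no previous period to compare against, which mirrors the `iloc[1:]` in the loop version.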

Here is what wts, dwts, dwts_rank_up and overage look like: [four images in the original post, not reproduced here]

0 answers:

No answers