How to round percentages in Pandas so that they still sum to 1?

Date: 2017-11-20 15:26:54

Tags: python pandas

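I have to print percentages, but the catch is that I have to round the values to 4 decimal places. The values are in a DataFrame where each column holds the percentages of one allocation.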

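Sometimes the percentages do not sum to 1 but to 0.9999 or 1.0001 (which makes sense). But how do you make sure they sum to exactly 1? You have to pick a row arbitrarily and put the delta into it. I have come up with the solution below, but it has to loop over every column and modify each Series one at a time.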
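To see where the 0.9999 comes from, here is a minimal sketch of the "rounding 1/3 three times" case, assuming a single allocation split into three equal shares. It also shows the idea of putting the delta into one arbitrarily chosen row (the first one here, purely for illustration).

```python
import pandas as pd

# One allocation split into three equal shares, rounded to 4 decimals
shares = pd.Series([1 / 3, 1 / 3, 1 / 3]).round(4)
print(shares.tolist())          # [0.3333, 0.3333, 0.3333]
print(round(shares.sum(), 4))   # 0.9999, not 1

# Put the missing delta into one arbitrarily chosen row (here the first)
shares.iloc[0] = round(shares.iloc[0] + round(1 - shares.sum(), 4), 4)
print(shares.tolist())          # [0.3334, 0.3333, 0.3333]
print(round(shares.sum(), 4))   # 1.0
```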

Code

import numpy as np
import pandas as pd

df = abs(pd.DataFrame(np.random.randn(4, 4), columns=range(0, 4)))
# Making sure the sum of each allocation (column) is 1.
df = df / df.sum()
# Rounding the allocation to 4 decimals
df = df.round(4)
print("-- before --")
print(df)
print(df.sum())

# After rounding, a column sum may not be exactly 1 (imagine rounding 1/3 three times...).
# So check the sum of each column and put the delta into the fund with the lowest value.
for p in df:
    # Round the delta too, so float noise such as 1e-16 is not mistaken for an error
    delta = round(1 - df[p].sum(), 4)
    if delta != 0:
        # get the id of the fund with the lowest percentage (but not 0)
        low_id = df[p][df[p] != 0].idxmin()
        # write back with .loc; chained indexing like df[p][low_id] += ... may modify a copy
        df.loc[low_id, p] = round(df.loc[low_id, p] + delta, 4)
print("-- after --")
print(df)
print(df.sum())

Output

-- before --
        0       1       2       3
0  0.0116  0.1256  0.4980  0.3738
1  0.2562  0.5458  0.3086  0.1221
2  0.4853  0.0009  0.0588  0.0078
3  0.2470  0.3277  0.1346  0.4962
0    1.0001
1    1.0000
2    1.0000
3    0.9999
dtype: float64
-- after --
        0       1       2       3
0  0.0115  0.1256  0.4980  0.3738
1  0.2562  0.5458  0.3086  0.1221
2  0.4853  0.0009  0.0588  0.0079
3  0.2470  0.3277  0.1346  0.4962
0    1.0
1    1.0
2    1.0
3    1.0
dtype: float64

Is there a faster solution?

Thanks a lot,

Regards, Julien

1 answer:

Answer 0 (score: 0)

It is generally better to avoid explicit loops in pandas and use vectorized operations instead.

import numpy as np
import pandas as pd

df = abs(pd.DataFrame(np.random.randn(4, 4)))

df = df / df.sum()
df = df.round(4)

dftemp = pd.DataFrame({'Sum': df.sum(axis=0)})        # sum of each column
dftemp['Delta'] = (1 - dftemp['Sum']).round(4)         # rounding error (0 if the column already sums to 1)
dftemp['LowId'] = df[df != 0].idxmin(axis=0)           # row label of the non-zero minimum of each column

print('\n\nBefore \n\n ', df, '\n\n ', df.sum())

# True at exactly one cell per column: its non-zero minimum (idxmin takes the first one on ties)
is_low = df.index.values[:, None] == dftemp['LowId'].values
# Add each column's delta to that cell only; df + Series aligns the Series on the columns
df = df.mask(is_low, df + dftemp['Delta']).round(4)

print('After \n\n ', df, '\n\n ', df.sum())

Output

Before 

          0       1       2       3
0  0.1686  0.0029  0.1055  0.1739
1  0.5721  0.5576  0.2904  0.2205
2  0.0715  0.2749  0.4404  0.5014
3  0.1878  0.1647  0.1637  0.1042 

  0    1.0000
1    1.0001
2    1.0000
3    1.0000
dtype: float64
After 

          0       1       2       3
0  0.1686  0.0028  0.1055  0.1739
1  0.5721  0.5576  0.2904  0.2205
2  0.0715  0.2749  0.4404  0.5014
3  0.1878  0.1647  0.1637  0.1042 

  0    1.0
1    1.0
2    1.0
3    1.0
dtype: float64
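The vectorized steps above can be wrapped in a reusable function. This is a minimal sketch, not the answerer's code: `round_allocations` is a hypothetical name, and it assumes every column has at least one non-zero value. Like the answer, it puts each column's whole rounding error into that column's smallest non-zero entry.

```python
import numpy as np
import pandas as pd


def round_allocations(df, decimals=4):
    """Hypothetical helper: round every column to `decimals` places and push the
    rounding error into the smallest non-zero entry of that column.
    Assumes each column sums to 1 before rounding and has a non-zero value."""
    rounded = df.round(decimals)
    # Rounding error of each column (0 when the column already sums to 1)
    delta = (1 - rounded.sum(axis=0)).round(decimals)
    # Row label of the smallest non-zero value in each column (first one on ties)
    low_id = rounded.where(rounded != 0).idxmin(axis=0)
    # Boolean array that is True at exactly one cell per column
    is_low = rounded.index.values[:, None] == low_id.values
    # Add the column's delta at that cell only, then round away float noise
    return rounded.mask(is_low, rounded + delta).round(decimals)


weights = pd.DataFrame({"a": [1, 1, 1], "b": [1, 2, 5]}, dtype=float)
weights = weights / weights.sum()
out = round_allocations(weights)
print(out)
print(out.sum().round(4))
```

Column `a` (three thirds) needs a +0.0001 correction on its first row, while column `b` (1/8, 2/8, 5/8) is exact after rounding and stays unchanged.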