Proportion/percentage values

Time: 2019-07-01 07:38:41

Tags: python python-3.x pandas-groupby

I have this DataFrame:

o   d   r   kz  p
1   3   1   5   NaN
1   3   2   0   NaN
1   10  1   7   NaN
1   10  3   1   NaN
1   10  2   2   NaN

I want to fill the "p" column with each row's share of the "kz" total within its ("o", "d") pair. The result should look like this:

o   d   r   kz  p
1   3   1   5   100%
1   3   2   0   0%
1   10  1   7   70%
1   10  3   1   10%
1   10  2   2   20%

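I was thinking of iterating over the DataFrame, collecting the kz values into a list of lists, and then filling the p column recursively.

Is there a more elegant way to do this, for example with groupby or a pivot table?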

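2 answers:

Answer 0: (score: 1)

You can do it in a few steps:

  • Use groupby and agg to compute the sum for each group.
  • Use merge to join these sums back onto the original DataFrame.
  • Compute the ratio of each kz value to its group sum.

Here is the code: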

# Import modules
import pandas as pd
import numpy as np

# Data
df = pd.DataFrame(
    [[1,  3, 1, 5, np.nan],
     [1,  3, 2, 0, np.nan],
     [1, 10, 1, 7, np.nan],
     [1, 10, 3, 1, np.nan],
     [1, 10, 2, 2, np.nan]],
    columns=["o", "d", "r", "kz", "p"])
print(df)
#    o   d  r  kz   p
# 0  1   3  1   5 NaN
# 1  1   3  2   0 NaN
# 2  1  10  1   7 NaN
# 3  1  10  3   1 NaN
# 4  1  10  2   2 NaN

# Compute the sum per group
sum_ = df.groupby(['o', 'd']).agg({'kz': 'sum'})
sum_.reset_index(inplace=True)
print(sum_)
#    o   d  kz
# 0  1   3   5
# 1  1  10  10

# Merge these values with the current dataframe
# how="left" keeps the original row order; an outer merge sorts rows by the join keys
df = df.merge(sum_, on=['o', 'd'], how="left", suffixes=('', '_sum'))
print(df)
#    o   d  r  kz   p  kz_sum
# 0  1   3  1   5 NaN       5
# 1  1   3  2   0 NaN       5
# 2  1  10  1   7 NaN      10
# 3  1  10  3   1 NaN      10
# 4  1  10  2   2 NaN      10

# Compute the ratio as a percentage
df["p"] = df["kz"] / df["kz_sum"] * 100
print(df)
#    o   d  r  kz      p  kz_sum
# 0  1   3  1   5  100.0       5
# 1  1   3  2   0    0.0       5
# 2  1  10  1   7   70.0      10
# 3  1  10  3   1   10.0      10
# 4  1  10  2   2   20.0      10
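
As a shorter alternative to the merge step, the group sums can be broadcast back onto the rows with groupby followed by transform. This is only a minimal sketch on the question's data, not part of the answer above; the name group_total is chosen here for illustration.

```python
import pandas as pd

# Same data as in the question, without the empty "p" column
df = pd.DataFrame(
    {"o": [1, 1, 1, 1, 1],
     "d": [3, 3, 10, 10, 10],
     "r": [1, 2, 1, 3, 2],
     "kz": [5, 0, 7, 1, 2]}
)

# transform("sum") returns one value per original row (the sum of its group),
# so no merge and no helper column are needed
group_total = df.groupby(["o", "d"])["kz"].transform("sum")

# Share of each row's kz within its (o, d) group, as a percentage
df["p"] = df["kz"] / group_total * 100
print(df)
```

Because transform keeps the original index, the row order of df is unchanged.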

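Answer 1: (score: 1)

First, sum() the 'kz' column grouped by 'o' and 'd', and store the result in 'tmp'. Merge the two objects. Then compute the percentage 'p' from the original 'kz' value and the summed 'kz' value. Finally, drop the summed 'kz' column and rename the original column back to 'kz'.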

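import pandas as pd

d = {'o': pd.Series([1, 1, 1, 1, 1]),
     'd': pd.Series([3, 3, 10, 10, 10]),
     'r': pd.Series([1, 2, 1, 3, 2]),
     'kz': pd.Series([5, 0, 7, 1, 2]),
     # Empty float Series; it is filled with NaN when aligned with the other columns
     'p': pd.Series(dtype=float)}

# Create the DataFrame
df = pd.DataFrame(d)

# Sum of kz per (o, d) group; the result is a Series named "kz"
tmp = df.groupby(['o', 'd'])["kz"].sum()

# Both sides have a "kz" column, so the suffixes yield kz_org and kz_tmp
merge_tmp = pd.merge(df, tmp, on=['o', 'd'], how='inner', suffixes=('_org', '_tmp'))
merge_tmp['p'] = (merge_tmp['kz_org'] / merge_tmp['kz_tmp']) * 100

# Drop the helper column and restore the original column name
merge_tmp = merge_tmp.drop('kz_tmp', axis='columns')
merge_tmp = merge_tmp.rename({'kz_org': 'kz'}, axis='columns')
print(merge_tmp)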
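
Both answers leave "p" as a float, while the question shows strings such as "70%". Neither answer covers a group whose kz values sum to zero, where the division is undefined. The sketch below is one possible way to handle both points. The extra row with o=2, d=4 and the "n/a" placeholder are assumptions added here for illustration and are not part of the original data.

```python
import pandas as pd

df = pd.DataFrame(
    {"o": [1, 1, 1, 1, 1, 2],    # last row is hypothetical: a group whose kz sum is 0
     "d": [3, 3, 10, 10, 10, 4],
     "kz": [5, 0, 7, 1, 2, 0]}
)

# Per-row group total
total = df.groupby(["o", "d"])["kz"].transform("sum")

# Mask zero totals with NaN so the share of such rows is NaN instead of 0/0
share = df["kz"] / total.where(total != 0) * 100

# Render as whole-number percent strings; "n/a" is an arbitrary placeholder
df["p"] = share.map(lambda v: "n/a" if pd.isna(v) else f"{v:.0f}%")
print(df)
```

Note that formatting turns "p" into a string column, so it is best done as the last step, after any further arithmetic on the numeric shares.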