I have this dataframe:
o d r kz p
1 3 1 5 NaN
1 3 2 0 NaN
1 10 1 7 NaN
1 10 3 1 NaN
1 10 2 2 NaN
I want to fill column "p" with each row's share of the "kz" total for its ("o", "d") pair. The result should look like:
o d r kz p
1 3 1 5 100%
1 3 2 0 0%
1 10 1 7 70%
1 10 3 1 10%
1 10 2 2 20%
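Assuming "proportion" means each row's kz divided by the kz total of its (o, d) group g(i), which is what the expected output above implies, the formula and one worked group are:

```latex
p_i = 100 \cdot \frac{kz_i}{\sum_{j \in g(i)} kz_j}
\qquad\text{e.g. for } (o, d) = (1, 10):\quad
\frac{7}{7+1+2} = 70\%,\ \frac{1}{10} = 10\%,\ \frac{2}{10} = 20\%
```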
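I was thinking of iterating over the dataframe, building a list of lists of the kz values, and then filling the p column recursively.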
Is there an elegant way to do this, for example with groupby or a pivot table?
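For comparison, here is a groupby-only sketch that neither answer below uses: `groupby(...).transform("sum")` returns each group's total aligned to the original rows, so no merge or helper column is needed. It assumes the column names shown in the question.

```python
import numpy as np
import pandas as pd

# Rebuild the example frame from the question
df = pd.DataFrame(
    {
        "o": [1, 1, 1, 1, 1],
        "d": [3, 3, 10, 10, 10],
        "r": [1, 2, 1, 3, 2],
        "kz": [5, 0, 7, 1, 2],
        "p": np.nan,
    }
)

# transform("sum") broadcasts each (o, d) group's kz total back onto every row
group_total = df.groupby(["o", "d"])["kz"].transform("sum")

# Share of the group total, expressed as a percentage
df["p"] = df["kz"] / group_total * 100
print(df)
```

Because `transform` preserves the original index, the result lines up row for row with `df`, and the row order is unchanged.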
Answer 0 (score: 1)
You can do it in a few steps. Here is the code:
# Import modules
import pandas as pd
import numpy as np
# Data
df = pd.DataFrame(
[[1, 3, 1, 5, np.nan],
[1, 3, 2, 0, np.nan],
[1, 10, 1, 7, np.nan],
[1, 10, 3, 1, np.nan],
[1, 10, 2, 2, np.nan]],
columns=["o", "d", "r", "kz", "p"])
print(df)
# o d r kz p
# 0 1 3 1 5 NaN
# 1 1 3 2 0 NaN
# 2 1 10 1 7 NaN
# 3 1 10 3 1 NaN
# 4 1 10 2 2 NaN
# Compute the sum per group
sum_ = df.groupby(['o', 'd']).agg({'kz': 'sum'})
sum_.reset_index(inplace=True)
print(sum_)
# o d kz
# 0 1 3 5
# 1 1 10 10
# Merge these values with the current dataframe
df = df.merge(sum_, on=['o', 'd'], how="outer", suffixes=('', '_sum'))
print(df)
# o d r kz p kz_sum
# 0 1 3 1 5 NaN 5
# 1 1 3 2 0 NaN 5
# 2 1 10 1 7 NaN 10
# 3 1 10 3 1 NaN 10
# 4 1 10 2 2 NaN 10
# Compute the ratio as a percentage of the group total
df['p'] = df['kz'] / df['kz_sum'] * 100
print(df)
# o d r kz p kz_sum
# 0 1 3 1 5 100.0 5
# 1 1 3 2 0 0.0 5
# 2 1 10 1 7 70.0 10
# 3 1 10 3 1 10.0 10
# 4 1 10 2 2 20.0 10
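The question's expected output shows `p` as percentage strings and has no `kz_sum` column. Below is a possible finishing step, as a sketch. Rounding to whole percentages is an assumption, and whether you want strings rather than numbers depends on what you do with `p` afterwards.

```python
import pandas as pd

# Frame in the state reached at the end of the code above (p already computed)
df = pd.DataFrame(
    {
        "o": [1, 1, 1, 1, 1],
        "d": [3, 3, 10, 10, 10],
        "r": [1, 2, 1, 3, 2],
        "kz": [5, 0, 7, 1, 2],
        "p": [100.0, 0.0, 70.0, 10.0, 20.0],
        "kz_sum": [5, 5, 10, 10, 10],
    }
)

# Drop the helper column and render p as whole-percent strings, e.g. "70%"
df = df.drop(columns="kz_sum")
df["p"] = df["p"].map("{:.0f}%".format)
print(df)
```

Keep the numeric column if you still need to compute with it. The string form is only for display.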
Answer 1 (score: 1)
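First sum() the 'kz' column grouped by 'o' and 'd', and store the result in 'tmp'. Merge the two dataframes. Then compute the percentage 'p' from the original 'kz' values and the summed 'kz' values. Finally, drop the summed 'kz' column and rename the original column back to 'kz'.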
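import pandas as pd
d = {'o' : pd.Series([1,1,1,1,1]),
'd' : pd.Series([3,3,10,10,10]),
'r' : pd.Series([1,2,1,3,2]),
'kz' : pd.Series([5,0,7,1,2]),
'p' : pd.Series(dtype=float)}  # empty Series; aligns to NaN in every row
# Create the DataFrame
df = pd.DataFrame(d)
# Sum kz per (o, d) group; reset_index turns o and d back into columns
tmp = df.groupby(['o','d'])["kz"].sum().reset_index()
# Merge on o and d: the row-level kz becomes kz_org, the group total becomes kz_tmp
merge_tmp = pd.merge(df, tmp, on=['o','d'], how='inner', suffixes=('_org','_tmp'))
# Percentage of the group total
merge_tmp['p'] = (merge_tmp['kz_org'] / merge_tmp['kz_tmp']) * 100
# Drop the helper column and restore the original column name
merge_tmp = merge_tmp.drop('kz_tmp', axis='columns')
merge_tmp = merge_tmp.rename({'kz_org': 'kz'}, axis='columns')
print(merge_tmp)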
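Whichever approach you use, a quick sanity check is that `p` sums to 100 within each (o, d) group. One caveat neither answer mentions: if every `kz` in a group is 0, the division is 0/0 and pandas yields NaN for that group. The sketch below illustrates both. The all-zero group (o=2, d=5) is hypothetical and added only for illustration.

```python
import pandas as pd

# Example data plus a hypothetical all-zero group (o=2, d=5) for illustration
df = pd.DataFrame(
    {
        "o": [1, 1, 1, 1, 1, 2],
        "d": [3, 3, 10, 10, 10, 5],
        "r": [1, 2, 1, 3, 2, 1],
        "kz": [5, 0, 7, 1, 2, 0],
    }
)

total = df.groupby(["o", "d"])["kz"].transform("sum")
df["p"] = df["kz"] / total * 100  # 0 / 0 gives NaN for the all-zero group

# Each group with a non-zero total should sum to (approximately) 100;
# min_count=1 keeps an all-NaN group as NaN instead of summing it to 0
check = df.groupby(["o", "d"])["p"].sum(min_count=1)
print(check)
```

Decide explicitly what an all-zero group should show (NaN, 0, or an equal split) before relying on `p` downstream.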