I want to compute the 25th percentile of a specific column (patient) after groupby('score') and add it as a new column, but I get the error shown below.
import pandas as pd
raw_data = {'patient': [242, 151, 111,122, 342],
'obs': [1, 2, 3, 1, 2],
'treatment': [0, 1, 0, 1, 0],
'score': ['strong', 'weak', 'weak', 'weak', 'strong']}
df = pd.DataFrame(raw_data, columns = ['patient', 'obs', 'treatment', 'score'])
df
patient obs treatment score
0 242 1 0 strong
1 151 2 1 weak
2 111 3 0 weak
3 122 1 1 weak
4 342 2 0 strong
quantile_25 = []
df_g = df.groupby("score")
for col in df.keys():
    if col == 'patient':
        Q1 = df_g.apply(lambda _df: _df.np.percentile(_df[feature], q = 25))
        quantile_25.append(Q1)
    else:
        pass
df['std_dev_patient'] = df.score.map(quantile_25[0])
AttributeError: Cannot access callable attribute 'groupby' of 'DataFrameGroupBy' objects, try using the 'apply' method
I would like to keep the same for loop, because I want to add other statistics as new columns in the same way.
Thanks.
Expected output:
patient obs treatment score quantile_25
0 242 1 0 strong ..
1 151 2 1 weak ..
2 111 3 0 weak ..
3 122 1 1 weak ..
4 342 2 0 strong ..
Answer 0 (score: 1)
Here is a solution that does not use apply:
import numpy as np

df_g = df.groupby("score")
for col in df.columns:
    if col == 'patient':
        # transform broadcasts each group's 25th percentile back onto its rows
        df['std_dev_patient'] = df_g[col].transform(lambda group: np.percentile(group, q=25))
    else:
        pass
Output:
patient obs treatment score std_dev_patient
0 242 1 0 strong 267.0
1 151 2 1 weak 116.5
2 111 3 0 weak 116.5
3 122 1 1 weak 116.5
4 342 2 0 strong 267.0
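Since the question asks to keep the for loop so that other statistics can be added the same way, here is a minimal sketch of how that might look. The `stats` dictionary and the resulting column names (for example `quantile_25_patient`) are hypothetical choices for illustration, not part of the original answer, and `Series.quantile` is swapped in for `np.percentile`.

```python
import pandas as pd

df = pd.DataFrame({
    'patient': [242, 151, 111, 122, 342],
    'obs': [1, 2, 3, 1, 2],
    'treatment': [0, 1, 0, 1, 0],
    'score': ['strong', 'weak', 'weak', 'weak', 'strong'],
})

# Hypothetical mapping: new-column prefix -> function reducing a group to one value.
stats = {
    'quantile_25': lambda s: s.quantile(0.25),
    'median': lambda s: s.median(),
    'std_dev': lambda s: s.std(),
}

df_g = df.groupby('score')
for col in df.columns:
    if col == 'patient':
        for name, func in stats.items():
            # transform returns a Series aligned with df's index,
            # so each row receives the statistic of its own group.
            df[f'{name}_{col}'] = df_g[col].transform(func)

# Alternative close to the question's map idea: compute one value per group,
# then look it up by each row's score.
mapped = df['score'].map(df.groupby('score')['patient'].quantile(0.25))

print(df)
```

The `transform` route avoids the intermediate list from the question; the `map` route is shown only because the question attempted it, and both should give the same per-row values here.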