I have a table of analysis results:
Name Analysis Result Date Type
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before
1 Doe J. Albumine 6.5 25.03.2021 8:08:09 after
2 Pine C. Albumine 13.3 25.03.2021 9:17:54 before
3 Pine C. Albumine 11.0 22.02.2021 9:25:54 after
4 Jackson D. Albumine 14.2 23.02.2021 10:51:38 before
5 Jackson D. Albumine 12.2 23.03.2021 after
6 Schafer L. Albumine 8.4 25.02.2021 10:39:39 before
7 Schafer L. Albumine 9.3 25.03.2021 12:06:15 after
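My goal is to use the Type column to compute, for each patient, the difference between the two analyses (the data are all fictional) and get a table like this: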
Name Before After Difference
0 Doe J. 10.6 6.5 4.1
I tried groupby but without success. Any help would be appreciated.
Answer 0 (score: 2)
Use DataFrame.pivot with subtraction:
df = df.pivot(index='Name', columns='Type', values='Result').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])
print (df)
Name after before diff
0 Doe J. 6.5 10.6 4.1
1 Jackson D. 12.2 14.2 2.0
2 Pine C. 11.0 13.3 2.3
3 Schafer L. 9.3 8.4 -0.9
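For anyone who wants to reproduce the output above, here is a minimal self-contained sketch. It makes a few assumptions that are not in the original post: the Date column is left out, "Jackson D" is normalized to "Jackson D." so both rows share one key, the result is stored in a new variable named `wide`, and the difference is rounded to one decimal to hide floating-point noise.

```python
import pandas as pd

# Sample data copied from the question (Date column omitted for brevity).
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Pine C.", "Pine C.",
             "Jackson D.", "Jackson D.", "Schafer L.", "Schafer L."],
    "Analysis": ["Albumine"] * 8,
    "Result": [10.6, 6.5, 13.3, 11.0, 14.2, 12.2, 8.4, 9.3],
    "Type": ["before", "after"] * 4,
})

# One row per Name, one column per Type value ("after", "before").
# Keyword arguments are used because recent pandas releases no longer
# accept positional arguments in DataFrame.pivot.
wide = (
    df.pivot(index="Name", columns="Type", values="Result")
      .reset_index()
      .rename_axis(columns=None)
)

# before - after, rounded to match the one-decimal precision of the inputs.
wide["diff"] = (wide["before"] - wide["after"]).round(1)
print(wide)
```

The pivot only works when every (Name, Type) pair occurs exactly once, which is why the name spelling has to be consistent across rows.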
If you get the error:
ValueError: Index contains duplicate entries, cannot reshape
it means there are duplicates: the same Name, Type pair has 2 or more values, for example:
print (df)
Name Analysis Result Date Type
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before <- duplicate Doe J., before
0 Doe J. Albumine 10.6 23.02.2021 8:07:22 before <- duplicate Doe J., before
1 Doe J. Albumine 6.5 25.03.2021 8:08:09 after
2 Pine C. Albumine 13.3 25.03.2021 9:17:54 before
3 Pine C. Albumine 11.0 22.02.2021 9:25:54 after
4 Jackson D. Albumine 14.2 23.02.2021 10:51:38 before
5 Jackson D. Albumine 12.2 23.03.2021 after
6 Schafer L. Albumine 8.4 25.02.2021 10:39:39 before
7 Schafer L. Albumine 9.3 25.03.2021 12:06:15 after
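Before choosing how to aggregate, it can help to see which rows actually collide. The following is only a sketch on a shortened copy of the data above; the variable names `dupes` and `deduped` are illustrative and not part of the original answer.

```python
import pandas as pd

# Shortened copy of the table above, including the repeated "Doe J., before" row.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

# keep=False flags every row of a repeated (Name, Type) pair,
# not only the second and later occurrences.
dupes = df[df.duplicated(subset=["Name", "Type"], keep=False)]
print(dupes)

# If the repeats are exact copies, dropping them leaves one row per pair,
# so DataFrame.pivot can be used again.
deduped = df.drop_duplicates(subset=["Name", "Type"])
print(deduped)
```

Dropping rows is only appropriate when the repeats carry no extra information; if they are genuinely different measurements, an aggregation is the safer choice.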
A possible solution is DataFrame.pivot_table with an aggregation function such as mean or sum. If you need the first matching value instead, use aggfunc='first'.
df = df.pivot_table(index='Name',columns='Type',values='Result', aggfunc='sum').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])
print (df)
Name after before diff
0 Doe J. 6.5 21.2 14.7 <- 21.2 because sum
1 Jackson D. 12.2 14.2 2.0
2 Pine C. 11.0 13.3 2.3
3 Schafer L. 9.3 8.4 -0.9
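To see how the choice of aggfunc changes the result, the sketch below runs the same pivot_table call with 'sum', 'first' and 'mean' on the duplicated data. The helper name `before_after` is hypothetical, the Date column is omitted, and the difference is rounded to one decimal; none of that is from the original answer.

```python
import pandas as pd

# Data from the example above, with the "Doe J., before" row repeated.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C.",
             "Jackson D.", "Jackson D.", "Schafer L.", "Schafer L."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0, 14.2, 12.2, 8.4, 9.3],
    "Type": ["before", "before", "after", "before", "after",
             "before", "after", "before", "after"],
})

def before_after(frame, aggfunc):
    """Hypothetical helper: pivot to before/after columns and add their difference."""
    wide = (
        frame.pivot_table(index="Name", columns="Type",
                          values="Result", aggfunc=aggfunc)
             .reset_index()
             .rename_axis(columns=None)
    )
    wide["diff"] = (wide["before"] - wide["after"]).round(1)
    return wide

by_sum = before_after(df, "sum")      # repeated rows are added together
by_first = before_after(df, "first")  # only the first row per pair is kept
by_mean = before_after(df, "mean")    # repeated rows are averaged

print(by_sum)
print(by_first)
print(by_mean)
```

With an exact duplicate like this one, 'first' and 'mean' both give 10.6 for Doe J. before, while 'sum' doubles it to 21.2, so 'sum' is rarely what you want for repeated measurements.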