Subtract DataFrame rows by column value

Asked: 2021-04-22 09:09:57

Tags: python pandas dataframe analytics

I have a table of analysis results:

         Name  Analysis  Result                 Date    Type
0      Doe J.  Albumine    10.6   23.02.2021 8:07:22  before
1      Doe J.  Albumine     6.5   25.03.2021 8:08:09   after
2     Pine C.  Albumine    13.3   25.03.2021 9:17:54  before
3     Pine C.  Albumine    11.0   22.02.2021 9:25:54   after
4  Jackson D.  Albumine    14.2  23.02.2021 10:51:38  before
5  Jackson D.  Albumine    12.2           23.03.2021   after
6  Schafer L.  Albumine     8.4  25.02.2021 10:39:39  before
7  Schafer L.  Albumine     9.3  25.03.2021 12:06:15   after

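My goal is to compute, for each patient (all of them fictional), the difference between the two analyses identified by the Type column, and to get a table like this: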

     Name  Before  After  Difference
0  Doe J.    10.6    6.5         4.1

I tried groupby without success. Any help would be appreciated.

1 answer:

Answer 0: (score: 2)

Use DataFrame.pivot to put before and after into separate columns, then subtract:

# keyword arguments are required by DataFrame.pivot in pandas 2.0 and later
df = df.pivot(index='Name', columns='Type', values='Result').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])

print (df)
         Name  after  before  diff
0      Doe J.    6.5    10.6   4.1
1  Jackson D.   12.2    14.2   2.0
2     Pine C.   11.0    13.3   2.3
3  Schafer L.    9.3     8.4  -0.9
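
For anyone who wants to run this end to end, here is a minimal self-contained sketch. It assumes the frame is rebuilt by hand from the question's table (only the Name, Result and Type columns, with names spelled consistently); the variable name `wide` is just for illustration.

```python
import pandas as pd

# Sample data rebuilt from the question (Analysis and Date columns omitted).
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Pine C.", "Pine C.",
             "Jackson D.", "Jackson D.", "Schafer L.", "Schafer L."],
    "Result": [10.6, 6.5, 13.3, 11.0, 14.2, 12.2, 8.4, 9.3],
    "Type": ["before", "after"] * 4,
})

# One row per Name, one column per Type value ("after", "before").
wide = (
    df.pivot(index="Name", columns="Type", values="Result")
      .reset_index()
      .rename_axis(columns=None)  # drop the leftover "Type" label on the columns
)

# Rounding only hides floating-point noise in the subtraction.
wide["diff"] = wide["before"].sub(wide["after"]).round(1)
print(wide)
```

Note that pivot matches rows on the exact Name string, so a name spelled two ways would end up as two rows with missing values.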

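If you get this error:

ValueError: Index contains duplicate entries, cannot reshape

it means there are duplicates: for the same Name and Type there are 2 or more values, for example: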

print (df)
         Name  Analysis  Result                 Date    Type
0      Doe J.  Albumine    10.6   23.02.2021 8:07:22  before <- duplicate Doe J., before
0      Doe J.  Albumine    10.6   23.02.2021 8:07:22  before <- duplicate Doe J., before
1      Doe J.  Albumine     6.5   25.03.2021 8:08:09   after
2     Pine C.  Albumine    13.3   25.03.2021 9:17:54  before
3     Pine C.  Albumine    11.0   22.02.2021 9:25:54   after
4  Jackson D.  Albumine    14.2  23.02.2021 10:51:38  before
5  Jackson D.  Albumine    12.2           23.03.2021   after
6  Schafer L.  Albumine     8.4  25.02.2021 10:39:39  before
7  Schafer L.  Albumine     9.3  25.03.2021 12:06:15   after
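
Before picking a fix, it can help to see which rows collide. The following is a sketch on a cut-down, made-up version of the frame above; `dupes` and `deduped` are illustrative names, and dropping duplicates is only reasonable if the repeated rows really are copies of each other.

```python
import pandas as pd

# Small frame with a repeated (Name, Type) pair, mirroring the example above.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

# pivot refuses to reshape when a (Name, Type) pair occurs more than once.
raised = False
try:
    df.pivot(index="Name", columns="Type", values="Result")
except ValueError as exc:
    raised = True
    print(exc)

# keep=False marks every row of a duplicated pair, not only the repeats.
dupes = df[df.duplicated(subset=["Name", "Type"], keep=False)]
print(dupes)

# If the repeats are exact copies, dropping them lets pivot work again.
deduped = df.drop_duplicates(subset=["Name", "Type"], keep="first")
wide = deduped.pivot(index="Name", columns="Type", values="Result")
print(wide)
```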

A possible solution is DataFrame.pivot_table with an aggregate function such as mean or sum. If you need the first matching value, use aggfunc='first':

df = df.pivot_table(index='Name',columns='Type',values='Result', aggfunc='sum').reset_index().rename_axis(columns=None)
df['diff'] = df['before'].sub(df['after'])

print (df)
         Name  after  before  diff
0      Doe J.    6.5    21.2  14.7 <- 21.2 because sum
1  Jackson D.   12.2    14.2   2.0
2     Pine C.   11.0    13.3   2.3
3  Schafer L.    9.3     8.4  -0.9
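
Which aggregate to use depends on what the duplicates mean. As a rough comparison, this sketch runs sum, mean and first on a small made-up frame with one repeated row; the `results` dictionary is only there to collect the values side by side.

```python
import pandas as pd

# "Doe J." has two identical "before" rows.
df = pd.DataFrame({
    "Name": ["Doe J.", "Doe J.", "Doe J.", "Pine C.", "Pine C."],
    "Result": [10.6, 10.6, 6.5, 13.3, 11.0],
    "Type": ["before", "before", "after", "before", "after"],
})

results = {}
for func in ("sum", "mean", "first"):
    wide = df.pivot_table(index="Name", columns="Type",
                          values="Result", aggfunc=func)
    # Value that ends up in the "before" cell for the duplicated patient.
    results[func] = wide.loc["Doe J.", "before"]

print(results)
```

With identical repeats, mean and first both return the single measured value, while sum doubles it, which is rarely what you want for a lab result.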