Group by and subtract columns in pandas

Time: 2017-09-26 06:51:13

Tags: python pandas dataframe group-by pandas-groupby

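I have time series data with 4 columns. I want to group by the columns FisherID, DateFishing and Total_Catch and sum the Weight column. I also want to subtract the values in the Weight column from the values in Total_Catch and keep the result in a new column named DIFF. Finally, I want to show the values in DIFF that are above 0.1.

Here is my code.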

df["DIFF"]=df.groupby(["FisherID", "DateFishing", "Total_Catch"]) ["Weight"].sum()-["Total_Catch"]>=0.1

My data:

FisherID    DateFishing Total_Catch Weight
1            24-Oct-11      0.9      0.2
1            24-Oct-11      0.9      0.264
1            24-Oct-11      0.9      0.37
2            25-Oct-11      0.7      0.144
2            27-Oct-11      8.2      0.084
2            27-Oct-11      8.2      0.45
3            27-Oct-11      8.2      0.61
3            27-Oct-11      8.2      7
3            29-Oct-11      0.64    0.184

1 Answer:

Answer 0 (score: 5)

I think you are looking for groupby + transform:

df['Sum'] = df.groupby(
    ["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')

Then subtract the Weight column from Total_Catch to get Diff:

df['Diff'] = (df['Total_Catch'] - df['Weight'])

df

   FisherID DateFishing  Total_Catch  Weight    Sum   Diff
0         1   24-Oct-11         0.90   0.200  0.834  0.700
1         1   24-Oct-11         0.90   0.264  0.834  0.636
2         1   24-Oct-11         0.90   0.370  0.834  0.530
3         2   25-Oct-11         0.70   0.144  0.144  0.556
4         2   27-Oct-11         8.20   0.084  0.534  8.116
5         2   27-Oct-11         8.20   0.450  0.534  7.750
6         3   27-Oct-11         8.20   0.610  7.610  7.590
7         3   27-Oct-11         8.20   7.000  7.610  1.200
8         3   29-Oct-11         0.64   0.184  0.184  0.456

Or, if you are trying to subtract the grouped sum of Weight from Total_Catch, use:

df['Diff'] = df["Total_Catch"] - df.groupby(
    ["FisherID", "DateFishing", "Total_Catch"]
)["Weight"].transform('sum')

df

   FisherID DateFishing  Total_Catch  Weight   Diff
0         1   24-Oct-11         0.90   0.200  0.066
1         1   24-Oct-11         0.90   0.264  0.066
2         1   24-Oct-11         0.90   0.370  0.066
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456
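
To see why transform is needed here rather than a plain sum, the following is a minimal, self-contained sketch that rebuilds the sample data from the question and compares the two. The variable names keys, per_group and per_row are only illustrative and are not part of the original answer.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "DateFishing": ["24-Oct-11"] * 3 + ["25-Oct-11"] + ["27-Oct-11"] * 4 + ["29-Oct-11"],
    "Total_Catch": [0.9, 0.9, 0.9, 0.7, 8.2, 8.2, 8.2, 8.2, 0.64],
    "Weight": [0.2, 0.264, 0.37, 0.144, 0.084, 0.45, 0.61, 7.0, 0.184],
})

keys = ["FisherID", "DateFishing", "Total_Catch"]

# .sum() aggregates: one value per group, indexed by the group keys,
# so it does not line up with the rows of df.
per_group = df.groupby(keys)["Weight"].sum()

# .transform('sum') broadcasts each group's sum back onto its rows,
# so the result shares df's index and can be subtracted directly.
per_row = df.groupby(keys)["Weight"].transform("sum")

df["Diff"] = df["Total_Catch"] - per_row

print(len(per_group), len(per_row))  # number of groups vs. number of rows
print(df["Diff"].round(3).tolist())
```

Because per_row has the same index as df, the subtraction aligns row by row, which is what the attempt in the question was missing.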

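Querying rows

This section builds on the result of the second option. Note that all of the options below apply a boolean mask to the dataframe. If the mask itself is all you want, don't apply it to the dataframe. Just evaluate the condition and print it: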

df.Diff > 0.1

0    False
1    False
2    False
3     True
4     True
5     True
6     True
7     True
8     True
Name: Diff, dtype: bool
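
The attempt in the question assigned the comparison result to the new column. If a True/False flag per row is really what you are after, a sketch along these lines keeps the mask as its own column. The column name Above_Threshold is made up for this example, and the Diff values are copied from the second option's output.

```python
import pandas as pd

# Diff values taken from the second option's output.
df = pd.DataFrame({
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.59, 0.59, 0.456],
})

# Keep the mask as a column instead of using it to filter.
# "Above_Threshold" is a hypothetical column name for this sketch.
df["Above_Threshold"] = df["Diff"] > 0.1

# A boolean Series sums as 0/1, so this counts the rows that pass.
n_pass = int(df["Above_Threshold"].sum())
print(n_pass)
```

Use >= instead of > if a value of exactly 0.1 should also count, as in the question's code.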

If you want to extract all the rows that satisfy the condition, there are a few options.

df.query

df.query('Diff > 0.1')

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

boolean indexing

df[df.Diff > 0.1]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

df.eval

df[df.eval('Diff > 0.1')]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456

df.where + dropna

df.where(df.Diff > 0.1).dropna(how='all')

   FisherID DateFishing  Total_Catch  Weight   Diff
3       2.0   25-Oct-11         0.70   0.144  0.556
4       2.0   27-Oct-11         8.20   0.084  7.666
5       2.0   27-Oct-11         8.20   0.450  7.666
6       3.0   27-Oct-11         8.20   0.610  0.590
7       3.0   27-Oct-11         8.20   7.000  0.590
8       3.0   29-Oct-11         0.64   0.184  0.456

np.where + df.iloc

df.iloc[np.where(df.Diff > 0.1)[0]]

   FisherID DateFishing  Total_Catch  Weight   Diff
3         2   25-Oct-11         0.70   0.144  0.556
4         2   27-Oct-11         8.20   0.084  7.666
5         2   27-Oct-11         8.20   0.450  7.666
6         3   27-Oct-11         8.20   0.610  0.590
7         3   27-Oct-11         8.20   7.000  0.590
8         3   29-Oct-11         0.64   0.184  0.456
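
As a quick cross-check, the sketch below runs all five options on a cut-down frame (only FisherID and Diff, with values copied from the output above) and compares the results. It also shows the one visible difference: where fills the rejected rows with NaN before dropna removes them, so integer columns come back as floats. The variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "FisherID": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "Diff": [0.066, 0.066, 0.066, 0.556, 7.666, 7.666, 0.59, 0.59, 0.456],
})

mask = df["Diff"] > 0.1

by_query = df.query("Diff > 0.1")
by_mask = df[mask]
by_eval = df[df.eval("Diff > 0.1")]
by_where = df.where(mask).dropna(how="all")
by_iloc = df.iloc[np.where(mask)[0]]

# query, boolean indexing, eval and iloc should return identical frames.
same = all(by_mask.equals(other) for other in (by_query, by_eval, by_iloc))

# where() introduces NaN for rejected rows, which upcasts FisherID to float.
print(same, by_mask["FisherID"].dtype, by_where["FisherID"].dtype)
```

If the integer dtype matters, plain boolean indexing or query is probably the simpler choice.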

Note that these results keep the index of the original df. If you want to reset the index, use reset_index:

df[df.Diff > 0.1].reset_index(drop=True)

   FisherID DateFishing  Total_Catch  Weight   Diff
0         2   25-Oct-11         0.70   0.144  0.556
1         2   27-Oct-11         8.20   0.084  7.666
2         2   27-Oct-11         8.20   0.450  7.666
3         3   27-Oct-11         8.20   0.610  0.590
4         3   27-Oct-11         8.20   7.000  0.590
5         3   29-Oct-11         0.64   0.184  0.456
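
The drop=True argument matters here. The small sketch below, on a three-row toy frame that is not the question's data, contrasts it with the default behaviour, which keeps the old labels in a new column named index.

```python
import pandas as pd

# Toy frame for illustration only.
df = pd.DataFrame({"Diff": [0.066, 0.556, 7.666]})
filtered = df[df["Diff"] > 0.1]

# drop=True discards the old labels and renumbers from 0.
dropped = filtered.reset_index(drop=True)

# The default keeps the old labels in an extra "index" column.
kept = filtered.reset_index()

print(dropped.columns.tolist(), kept.columns.tolist())
print(kept["index"].tolist())
```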