Averaging column values over several intervals in Python

Time: 2019-03-28 22:10:00

Tags: python python-3.x pandas dataframe

I have a DataFrame with a Depth column and several value columns:

import pandas as pd

data = {'Depth': [1.0, 1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.0, 5.0, 5.5, 6.0],
        'Value1': [44, 46, 221, 12, 47, 44, 67, 90, 100, 111, 112, 120, 122],
        'Value2': [55, 65, 76, 45, 55, 58, 23, 12, 32, 20, 22, 26, 36]}

df = pd.DataFrame(data)

Note that Depth sometimes contains duplicate values.

I would like to group the rows by depth interval and average the values within each group. For example, with:

intervals = [1.0, 2.0]

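Take the list of intervals, split the dataset into bins of each interval width, and average each value column (Value1, Value2) within each bin, to get: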

    Depth  Value1  Value2   Avg1_1  Avg2_1  Avg1_2   Avg2_2   
0     1.0      44      55   80.75   60.25   78.2     .
1     1.0      46      65   80.75   60.25   78.2     .
2     1.5     221      76   80.75   60.25   78.2     .
3     2.0      12      45   80.75   60.25   78.2
4     2.5      47      55   52.67   .       78.2
5     2.5      44      58   52.67   .       78.2
6     3.0      67      23   52.67   .       78.2 
7     3.5      90      12   100.33          78.2
8     4.0     100      32   100.33          78.2
9     4.0     111      20   100.33          78.2
10    5.0     112      22   112             .
11    5.5     120      26   121             .
12    6.0     122      36   121             .

where Avg1_1 is the mean of Value1 within each interval of width 1.0 (endpoints inclusive: 1.0-2.0, 2.5-3.0, and so on).

Is there a simple way to do this with groupby inside a loop?

1 Answer:

Answer 0: (score: 0)

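You can do this with the DataFrame's apply method, using a boolean mask to select the rows (and their values) that satisfy a condition such as Depth <= depth + 1.0 or Depth <= depth + 2.0.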
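To make the boolean selection concrete, here is a minimal sketch for a single row, assuming the `df` defined in the question. The names `row`, `mask`, `selected`, and `mean_value` are illustrative only. Note that the mask keeps every row whose Depth is at or below the threshold, so the mean runs from the top of the frame rather than over a fixed-width bin.

```python
import pandas as pd

data = {'Depth': [1.0, 1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.0, 5.0, 5.5, 6.0],
        'Value1': [44, 46, 221, 12, 47, 44, 67, 90, 100, 111, 112, 120, 122],
        'Value2': [55, 65, 76, 45, 55, 58, 23, 12, 32, 20, 22, 26, 36]}
df = pd.DataFrame(data)

row = df.iloc[2]                           # the row with Depth 1.5
mask = df['Depth'] <= row['Depth'] + 1.0   # True wherever Depth <= 2.5
selected = df.loc[mask, 'Value1']          # Value1 of every row passing the mask
mean_value = selected.mean()               # the value stored for this row
```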

# For each row, average the column over all rows whose Depth is at most
# this row's Depth plus the interval width (1.0 or 2.0).
df['avg1_1'] = df.apply(lambda x: df.loc[df['Depth'] <= x['Depth'] + 1.0, 'Value1'].mean(), axis=1)

df['avg2_1'] = df.apply(lambda x: df.loc[df['Depth'] <= x['Depth'] + 1.0, 'Value2'].mean(), axis=1)

df['avg1_2'] = df.apply(lambda x: df.loc[df['Depth'] <= x['Depth'] + 2.0, 'Value1'].mean(), axis=1)

df['avg2_2'] = df.apply(lambda x: df.loc[df['Depth'] <= x['Depth'] + 2.0, 'Value2'].mean(), axis=1)
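Since the four assignments follow one pattern, they could also be generated in a loop over the interval widths and value columns. The following is only a sketch of that idea: the helper `mean_up_to` and the list `value_columns` are hypothetical names introduced here, and the column naming simply mirrors the pattern used above.

```python
import pandas as pd

data = {'Depth': [1.0, 1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.0, 5.0, 5.5, 6.0],
        'Value1': [44, 46, 221, 12, 47, 44, 67, 90, 100, 111, 112, 120, 122],
        'Value2': [55, 65, 76, 45, 55, 58, 23, 12, 32, 20, 22, 26, 36]}
df = pd.DataFrame(data)

intervals = [1.0, 2.0]
value_columns = ['Value1', 'Value2']  # hypothetical name, not in the original post


def mean_up_to(frame, column, depth, width):
    """Mean of `column` over the rows whose Depth is <= depth + width."""
    return frame.loc[frame['Depth'] <= depth + width, column].mean()


for i, width in enumerate(intervals, start=1):
    for j, column in enumerate(value_columns, start=1):
        # Column names follow avg<value index>_<interval index>.
        # Default arguments bind the current column and width to the lambda.
        df[f'avg{j}_{i}'] = df['Depth'].apply(
            lambda depth, c=column, w=width: mean_up_to(df, c, depth, w)
        )
```

Adding another width to `intervals` then produces the extra columns without further code.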

This returns:

    Depth  Value1  Value2  newval     avg1_1     avg2_1     avg1_2     avg2_2
0     1.0      44      55    66.0  80.750000  60.250000  68.714286  53.857143
1     1.0      46      65   241.0  80.750000  60.250000  68.714286  53.857143
2     1.5     221      76    32.0  69.000000  59.000000  71.375000  48.625000
3     2.0      12      45    67.0  68.714286  53.857143  78.200000  44.100000
4     2.5      47      55    64.0  71.375000  48.625000  78.200000  44.100000
5     2.5      44      58    87.0  71.375000  48.625000  78.200000  44.100000
6     3.0      67      23   110.0  78.200000  44.100000  81.272727  42.090909
7     3.5      90      12   120.0  78.200000  44.100000  84.500000  40.750000
8     4.0     100      32   131.0  81.272727  42.090909  87.384615  40.384615
9     4.0     111      20   132.0  81.272727  42.090909  87.384615  40.384615
10    5.0     112      22   140.0  87.384615  40.384615  87.384615  40.384615
11    5.5     120      26   142.0  87.384615  40.384615  87.384615  40.384615
12    6.0     122      36     NaN  87.384615  40.384615  87.384615  40.384615
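
The averages above run over every row up to a depth threshold, so they differ from the fixed-bin table in the question. If fixed bins are what is wanted, one possible sketch uses pd.cut together with groupby(...).transform('mean'). It rests on several assumptions that are not in the original post: bins start at the smallest Depth, are closed on the right, and the first bin also includes its lower edge. With width 1.0 this reproduces the Avg1_1 and Avg2_1 values listed in the question; the binning convention for wider intervals is a guess.

```python
import numpy as np
import pandas as pd

data = {'Depth': [1.0, 1.0, 1.5, 2.0, 2.5, 2.5, 3.0, 3.5, 4.0, 4.0, 5.0, 5.5, 6.0],
        'Value1': [44, 46, 221, 12, 47, 44, 67, 90, 100, 111, 112, 120, 122],
        'Value2': [55, 65, 76, 45, 55, 58, 23, 12, 32, 20, 22, 26, 36]}
df = pd.DataFrame(data)

intervals = [1.0, 2.0]
value_columns = ['Value1', 'Value2']  # hypothetical name, not in the original post
start, stop = df['Depth'].min(), df['Depth'].max()

for i, width in enumerate(intervals, start=1):
    # Bin edges: start, start + width, ... with the last edge at or beyond `stop`.
    n_bins = int(np.ceil((stop - start) / width))
    edges = start + width * np.arange(n_bins + 1)
    # Right-closed bins; include_lowest puts the first edge into the first bin.
    bins = pd.cut(df['Depth'], bins=edges, include_lowest=True)
    for j, column in enumerate(value_columns, start=1):
        # transform('mean') broadcasts each bin's mean back onto its rows.
        df[f'Avg{j}_{i}'] = df.groupby(bins, observed=True)[column].transform('mean')
```

Because transform returns a result aligned with the original index, each row receives the mean of its own bin and no merge step is needed.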