Pandas: sum of values between column markers

Time: 2020-08-22 16:18:43

Tags: python pandas dataframe

I have the following DataFrame, dfgeo:

              x            y         z  zt  n  k  pv                span                         geometry
0   6574878.210  4757530.610  1152.588   1  8  4  90   57.63876043929083  POINT (6574878.210 4757530.610)
1   6574919.993  4757570.314  1174.724   0             138.6733617172676  POINT (6574919.993 4757570.314)
2   6575020.518  4757665.839  1177.339   0            302.14812028088545  POINT (6575020.518 4757665.839)
3   6575239.548  4757873.972  1160.156   1  8  4  90   154.5778555448033  POINT (6575239.548 4757873.972)
4   6575351.603  4757980.452  1202.418   0            125.77721657819234  POINT (6575351.603 4757980.452)
5   6575442.780  4758067.093  1199.297   0            131.65377203050443  POINT (6575442.780 4758067.093)
6   6575538.217  4758157.782  1192.914   1  8  4  90   99.73509645559476  POINT (6575538.217 4758157.782)
7   6575594.625  4758240.033  1217.442   0            254.95055120769572  POINT (6575594.625 4758240.033)
8   6575738.820  4758450.289  1174.477   0            198.23448987983204  POINT (6575738.820 4758450.289)

I want to sum the values of the span column between the rows where zt == 1:

def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(
        D=('span', 'sum')
    )
dfzp=summarize(dfgeo)
print(dfzp)

Printed output:

zt
1   57.63876043929083138.6733617172676302.14812028...
2   154.5778555448033125.77721657819234131.6537720...
3   99.73509645559476254.95055120769572198.2344898...
4   137.49102047762113226.75941023488875102.731299...
5                  223.552487532538871.61932167407961
6   217.28304840632796141.34049561326185237.708809...

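For the sample, the desired output is the sum of span over each sub-DataFrame that starts at a row with zt == 1 and runs up to the next such row: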

zt
1 498.44
2 412.007
3 (sum between zt==1 )
...

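2 answers:

Answer 0 (score: 1)

The span column holds strings, so sum concatenates them instead of adding them. First convert the dtype of span to a numeric type with pd.to_numeric, then use Series.groupby on span and aggregate with sum: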

df['span'] = pd.to_numeric(df['span'], errors='coerce')
s = df['span'].groupby(df['zt'].eq(1).cumsum()).sum()

Result:

print(s)
zt
1    498.460242
2    412.008844
3    552.920138
Name: span, dtype: float64
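To see why the conversion matters, here is a minimal, self-contained sketch. The small frame is a hypothetical stand-in built from the first six rows of the question's table, with span stored as text. It assumes the original column had object dtype, which is what the concatenated output in the question suggests.

```python
import pandas as pd

# Hypothetical sample: 'span' held as strings (object dtype), as the
# concatenated output in the question suggests.
df = pd.DataFrame({"zt": [1, 0, 0, 1, 0, 0]})
df["span"] = pd.Series(
    ["57.63876043929083", "138.6733617172676", "302.14812028088545",
     "154.5778555448033", "125.77721657819234", "131.65377203050443"],
    dtype=object,
)

# Each row with zt == 1 starts a new group, so the running count of such
# rows gives the labels 1, 1, 1, 2, 2, 2.
labels = df["zt"].eq(1).cumsum()

# Summing strings joins them end to end instead of adding them.
concatenated = df["span"].groupby(labels).sum()

# After conversion the same groupby adds the numbers.
df["span"] = pd.to_numeric(df["span"], errors="coerce")
totals = df["span"].groupby(labels).sum()
print(totals.round(6))
```

The label series is built once and reused, which makes it easy to inspect which rows land in which group before aggregating.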

Edit (for multiple columns):

cols = ['x', 'y', 'span']
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
s = df[cols].groupby(df['zt'].eq(1).cumsum()).sum()

Result:

               x             y        span
zt                                        
1   1.972482e+07  1.427277e+07  498.460242
2   1.972603e+07  1.427392e+07  412.008844
3   1.972687e+07  1.427485e+07  552.920138
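The question's table also has blank cells in some columns (n, k, pv). The sketch below shows how errors='coerce' handles such cells. It assumes the blanks are stored as empty strings, which the source does not confirm, and the values are shortened, hypothetical stand-ins.

```python
import pandas as pd

# Hypothetical sample: numbers stored as text, with blank cells assumed
# to be empty strings (like the n/k/pv columns in the question's table).
df = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0],
    "pv": ["90", "", "", "90", ""],
    "span": ["57.6", "138.7", "302.1", "154.6", "125.8"],
})

cols = ["pv", "span"]
# errors='coerce' yields NaN for anything that cannot be parsed as a number.
df[cols] = df[cols].apply(pd.to_numeric, errors="coerce")

# sum() skips NaN by default, so the blanks do not affect the group totals.
out = df[cols].groupby(df["zt"].eq(1).cumsum()).sum()
print(out)
```

If silently dropping unparseable values is not acceptable, checking df[cols].isna().sum() after the conversion is one way to spot them.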

Answer 1 (score: 0)

If the desired result is the sum of 'span' over the subset of 'dfgeo' where zt == 1, I would try:

a = dfgeo[dfgeo['zt']==1]
x = a['span'].sum()
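This filter returns a single number built only from the rows where zt == 1, which differs from a total per stretch between those rows. A short sketch of the difference, using hypothetical rounded values and assuming span is already numeric:

```python
import pandas as pd

# Hypothetical sample modeled on the first rows of the question's table,
# with 'span' already numeric and rounded for brevity.
dfgeo = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0, 0],
    "span": [57.6, 138.7, 302.1, 154.6, 125.8, 131.7],
})

# Boolean filter: only the rows where zt == 1 contribute (one scalar).
flagged_total = dfgeo.loc[dfgeo["zt"] == 1, "span"].sum()

# Running group label: every row up to the next zt == 1 contributes,
# giving one total per stretch.
per_stretch = dfgeo.groupby(dfgeo["zt"].eq(1).cumsum())["span"].sum()

print(flagged_total)
print(per_stretch)
```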