I have the following DataFrame dfgeo:
   x            y            z         zt  n  k  pv  span                geometry
0  6574878.210  4757530.610  1152.588  1   8  4  90  57.63876043929083   POINT (6574878.210 4757530.610)
1  6574919.993  4757570.314  1174.724  0             138.6733617172676   POINT (6574919.993 4757570.314)
2  6575020.518  4757665.839  1177.339  0             302.14812028088545  POINT (6575020.518 4757665.839)
3  6575239.548  4757873.972  1160.156  1   8  4  90  154.5778555448033   POINT (6575239.548 4757873.972)
4  6575351.603  4757980.452  1202.418  0             125.77721657819234  POINT (6575351.603 4757980.452)
5  6575442.780  4758067.093  1199.297  0             131.65377203050443  POINT (6575442.780 4758067.093)
6  6575538.217  4758157.782  1192.914  1   8  4  90  99.73509645559476   POINT (6575538.217 4758157.782)
7  6575594.625  4758240.033  1217.442  0             254.95055120769572  POINT (6575594.625 4758240.033)
8  6575738.820  4758450.289  1174.477  0             198.23448987983204  POINT (6575738.820 4758450.289)
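I want to sum the span values within each block of rows delimited by zt==1 (a zt==1 row plus the rows after it, up to the next zt==1 row):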
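def summarize(group):
    # cumsum of the zt == 1 flags gives every row in a block the same label
    s = group['zt'].eq(1).cumsum()
    # Sum span within each block
    return group.groupby(s).agg(
        D=('span', 'sum')
    )

dfzp = summarize(dfgeo)
print(dfzp)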
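The printout below looks like string concatenation rather than addition. A minimal sketch of that effect, assuming span was loaded as text (object dtype), which is what the concatenated values suggest; only the zt and span columns are kept, and the names blocks and concatenated are illustrative:

```python
import pandas as pd

# Hypothetical reproduction: zt and span from the question, with span stored
# as strings (object dtype), which is assumed to match how the data was loaded.
dfgeo = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0, 0, 1, 0, 0],
    "span": pd.Series(
        ["57.63876043929083", "138.6733617172676", "302.14812028088545",
         "154.5778555448033", "125.77721657819234", "131.65377203050443",
         "99.73509645559476", "254.95055120769572", "198.23448987983204"],
        dtype=object,
    ),
})

print(dfgeo.dtypes)  # span is object, not float64

# Every zt == 1 row starts a new block: labels 1, 1, 1, 2, 2, 2, 3, 3, 3
blocks = dfgeo["zt"].eq(1).cumsum()

# "sum" on Python strings uses +, so the values are concatenated, not added
concatenated = dfgeo.groupby(blocks)["span"].sum()
print(concatenated.iloc[0])
```

Checking dfgeo.dtypes on the real data is a quick way to confirm whether span is numeric before grouping.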
Printed output:
zt
1 57.63876043929083138.6733617172676302.14812028...
2 154.5778555448033125.77721657819234131.6537720...
3 99.73509645559476254.95055120769572198.2344898...
4 137.49102047762113226.75941023488875102.731299...
5 223.552487532538871.61932167407961
6 217.28304840632796141.34049561326185237.708809...
The desired output is the sum of span over each sub-DataFrame that starts at a row where zt is 1, for example:
zt
1 498.44
2 412.007
3 (sum between zt==1)
...
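Adding up the span column from the table by hand, block by block (rows 0 to 2, 3 to 5 and 6 to 8, each starting at a zt = 1 row), gives:

```latex
\[
\begin{aligned}
\text{block 1: } & 57.63876043929083 + 138.6733617172676 + 302.14812028088545 \approx 498.4602\\
\text{block 2: } & 154.5778555448033 + 125.77721657819234 + 131.65377203050443 \approx 412.0088\\
\text{block 3: } & 99.73509645559476 + 254.95055120769572 + 198.23448987983204 \approx 552.9201
\end{aligned}
\]
```

So the expected figures above are rounded approximations of these totals.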
Answer 0 (score: 1)
First use pd.to_numeric to convert the span column to a numeric dtype, then call Series.groupby on span and aggregate with sum:
import pandas as pd
# Parse span as numbers (unparseable -> NaN), then sum span per zt == 1 block
df['span'] = pd.to_numeric(df['span'], errors='coerce')
s = df['span'].groupby(df['zt'].eq(1).cumsum()).sum()
Result:
print(s)
zt
1 498.460242
2 412.008844
3 552.920138
Name: span, dtype: float64
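A self-contained run of this approach on the nine sample rows might look like the following. It assumes span arrives as text (as the question's output suggests) and keeps only the zt and span columns; the block totals should agree with the result shown above.

```python
import pandas as pd

# Hypothetical sample: zt and span from the question, span stored as strings
df = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0, 0, 1, 0, 0],
    "span": pd.Series(
        ["57.63876043929083", "138.6733617172676", "302.14812028088545",
         "154.5778555448033", "125.77721657819234", "131.65377203050443",
         "99.73509645559476", "254.95055120769572", "198.23448987983204"],
        dtype=object,
    ),
})

# Step 1: parse span as float64; anything unparseable would become NaN
df["span"] = pd.to_numeric(df["span"], errors="coerce")

# Step 2: label blocks (a new block starts at each zt == 1 row) and sum span.
# The key Series is named "zt", so the result's index is named "zt" as well.
s = df["span"].groupby(df["zt"].eq(1).cumsum()).sum()
print(s)
```

errors="coerce" is convenient but silent; comparing df["span"].isna().sum() before and after the conversion shows whether any values were turned into NaN.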
Edit (for multiple columns):
cols = ['x', 'y', 'span']
# Convert every listed column to numbers, then sum each one per zt == 1 block
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
s = df[cols].groupby(df['zt'].eq(1).cumsum()).sum()
Result:
x y span
zt
1 1.972482e+07 1.427277e+07 57.638760
2 1.972603e+07 1.427392e+07 412.008844
3 1.972687e+07 1.427485e+07 552.920138
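The same conversion also lets the question's original named-aggregation function produce numeric totals. A sketch, assuming a string-typed span column as before and keeping only zt and span; summarize is the question's function, run after pd.to_numeric:

```python
import pandas as pd

def summarize(group):
    # cumsum of the zt == 1 flags labels each block: 1, 1, 1, 2, 2, 2, ...
    s = group["zt"].eq(1).cumsum()
    # Named aggregation: output column D = sum of span per block
    return group.groupby(s).agg(
        D=("span", "sum")
    )

# Hypothetical sample with span stored as strings
dfgeo = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0, 0, 1, 0, 0],
    "span": pd.Series(
        ["57.63876043929083", "138.6733617172676", "302.14812028088545",
         "154.5778555448033", "125.77721657819234", "131.65377203050443",
         "99.73509645559476", "254.95055120769572", "198.23448987983204"],
        dtype=object,
    ),
})

# The missing step: make span numeric before grouping
dfgeo["span"] = pd.to_numeric(dfgeo["span"], errors="coerce")

dfzp = summarize(dfgeo)
print(dfzp)
```

More columns can be added to the same .agg(...) call as further name=(column, "sum") pairs once they are numeric.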
Answer 1 (score: 0)
If the desired result is the sum of 'span' over the subset of 'dfgeo' where zt == 1, I would try:
a = dfgeo[dfgeo['zt']==1]
x = a['span'].sum()
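For comparison on the sample rows (assuming span is already numeric, and keeping only zt and span), this filter keeps only rows 0, 3 and 6, so it returns a single number instead of one total per block. The per-block grouping is shown alongside; per_group is an illustrative name.

```python
import pandas as pd

# Hypothetical sample with span already numeric (float64)
dfgeo = pd.DataFrame({
    "zt": [1, 0, 0, 1, 0, 0, 1, 0, 0],
    "span": [57.63876043929083, 138.6733617172676, 302.14812028088545,
             154.5778555448033, 125.77721657819234, 131.65377203050443,
             99.73509645559476, 254.95055120769572, 198.23448987983204],
})

# Filter approach: only the rows where zt == 1 contribute, giving one scalar
a = dfgeo[dfgeo["zt"] == 1]
x = a["span"].sum()

# Block approach: each zt == 1 row plus the rows after it form one group
per_group = dfgeo.groupby(dfgeo["zt"].eq(1).cumsum())["span"].sum()

print(x)
print(per_group)
```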