Pandas groupby based on column values

Time: 2020-08-22 12:35:58

Tags: python pandas

I have the following DataFrame, dfgeo:

              x            y         z  zt  n  k  pv                         geometry        dist
0   6574878.210  4757530.610  1152.588   1  8  4  90  POINT (6574878.210 4757530.610)    0.000000
1   6574919.993  4757570.314  1174.724   0            POINT (6574919.993 4757570.314)   57.638760
2   6575020.518  4757665.839  1177.339   0            POINT (6575020.518 4757665.839)  138.673362
3   6575239.548  4757873.972  1160.156   1  8  4  90  POINT (6575239.548 4757873.972)  302.148120
4   6575351.603  4757980.452  1202.418   0            POINT (6575351.603 4757980.452)  154.577856
5   6575442.780  4758067.093  1199.297   0            POINT (6575442.780 4758067.093)  125.777217
6   6575538.217  4758157.782  1192.914   1  8  4  90  POINT (6575538.217 4758157.782)  131.653772
7   6575594.625  4758240.033  1217.442   0            POINT (6575594.625 4758240.033)   99.735096
8   6575738.820  4758450.289  1174.477   0            POINT (6575738.820 4758450.289)  254.950551
9   6575850.937  4758613.772  1123.852   1  8  4  90  POINT (6575850.937 4758613.772)  198.234490
10  6575984.323  4758647.118  1131.761   0            POINT (6575984.323 4758647.118)  137.491020
11  6576204.312  4758702.115  1119.407   0            POINT (6576204.312 4758702.115)  226.759410
12  6576303.976  4758727.031  1103.064   0            POINT (6576303.976 4758727.031)  102.731300
13  6576591.496  4758798.910   1114.06   0            POINT (6576591.496 4758798.910)  296.368590
14  6576736.965  4758835.277  1120.285   1  8  4  90  POINT (6576736.965 4758835.277)  149.945952

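I am trying to sum the dist column, grouped by the zt value (each row with zt equal to 1 starts a new group). This is what I have tried: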

def summarize(group):
    s = group['zt'].eq(1).cumsum()
    return group.groupby(s).agg(
        D=('dist', 'sum')
    )
dfzp=dfgeo.apply(summarize)

However, the last line of the code raises the following error:

    s = group['zt'].eq(1).cumsum()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\series.py", line 871, in __getitem__
    result = self.index.get_value(self, key)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\base.py", line 4405, in get_value
    return self._engine.get_value(s, k, tz=getattr(series.dtype, "tz", None))
  File "pandas\_libs\index.pyx", line 80, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 90, in pandas._libs.index.IndexEngine.get_value
  File "pandas\_libs\index.pyx", line 135, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index_class_helper.pxi", line 109, in pandas._libs.index.Int64Engine._check_type
KeyError: 'zt'

Any help resolving this would be appreciated.

1 answer:

Answer 0 (score: 2)

If the function needs to receive the whole DataFrame, call it directly:

dfzp=summarize(dfgeo)

Or use DataFrame.pipe:

dfzp=dfgeo.pipe(summarize)
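
As a minimal sketch of the two calls above, the following uses a hypothetical cut-down version of dfgeo (only the zt and dist columns, with made-up round values rather than the data from the question). Under that assumption, the direct call and pipe should return the same result:

```python
import pandas as pd

# Hypothetical miniature of dfgeo: only the two columns the function touches,
# with made-up round distances so the sums are easy to check by hand.
dfgeo = pd.DataFrame({
    "zt":   [1, 0, 0, 1, 0, 0, 1],
    "dist": [0.0, 50.0, 100.0, 300.0, 150.0, 125.0, 130.0],
})

def summarize(group):
    # Every row with zt == 1 starts a new segment; cumsum turns the
    # boolean flags into running segment ids (1, 1, 1, 2, 2, 2, 3).
    s = group["zt"].eq(1).cumsum()
    # Sum dist within each segment using named aggregation.
    return group.groupby(s).agg(D=("dist", "sum"))

direct = summarize(dfgeo)      # plain function call
piped = dfgeo.pipe(summarize)  # same call, written in method-chaining style

print(piped)
```

pipe mainly helps readability when several such steps are chained; for a single step the direct call is just as good.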

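DataFrame.apply, by contrast, calls the function once per column, or once per row if axis=1, and passes it a Series each time. That is why group['zt'] raises KeyError: 'zt': the Series for a single column is indexed by the row labels, not by column names.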
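
To illustrate the point about apply, here is a small sketch on hypothetical data (three made-up rows, not the question's data). It records what the function receives for each axis and reproduces the KeyError from the question:

```python
import pandas as pd

# Hypothetical three-row frame with the same column names as in the question.
dfgeo = pd.DataFrame({"zt": [1, 0, 1], "dist": [0.0, 50.0, 300.0]})

# axis=0 (the default): the function receives each column as a Series
# whose index holds the row labels 0, 1, 2.
per_column = dfgeo.apply(lambda col: type(col).__name__)

# axis=1: the function receives each row as a Series whose index holds
# the column names, so row["dist"] is a valid lookup.
per_row = dfgeo.apply(lambda row: row["dist"] * 2, axis=1)

# Looking up "zt" on a column Series fails, because "zt" is not a row label.
try:
    dfgeo.apply(lambda col: col["zt"])
    raised = False
except KeyError:
    raised = True

print(per_column)
print(per_row)
print("KeyError raised:", raised)
```

Neither axis hands the function a DataFrame, so a function that needs several columns plus a nested groupby is better called directly or through pipe.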