如何在Pandas groupby()上使用外部函数?

时间:2019-02-27 10:55:18

标签: python pandas dataframe pandas-groupby data-science

背景

因此,我有一个基于整体的模型,我想对从N个检测器获得的标签进行后处理。为此,我想使用groupby功能按照我构建的IOU函数对每个帧中的拘留进行分组。这将有助于我为每组检测做出决定,并避免在帧的同一区域进行多次检测。

问::groupby函数的正确返回值是多少?

熊猫groupby文档说:

by : mapping, function, label, or list of labels
            Used to determine the groups for the groupby.
            If ``by`` is a function, it's called on each value of the object's
            index. If a dict or Series is passed, the Series or dict VALUES
            will be used to determine the groups (the Series' values are first
            aligned; see ``.align()`` method). If an ndarray is passed, the
            values are used as-is determine the groups. A label or list of
            labels may be passed to group by the columns in ``self``. Notice
            that a tuple is interpreted a (single) key.

我的数据示例:

           xmin      ymin      xmax    ...      confidence  class  frame_idx
25092  0.740968  0.379832  0.837317    ...        0.931677    2.0      988.0
25093  0.342227  0.272692  0.421934    ...        0.904239    2.0      988.0
25094  0.624577  0.333927  0.739960    ...        0.854978    2.0      988.0
25095  0.229680  0.260676  0.272042    ...        0.708415    2.0      988.0
25096  0.420753  0.291226  0.514133    ...        0.610117    2.0      988.0
25097  0.061343  0.265538  0.110653    ...        0.369827    2.0      988.0
25098  0.486258  0.711575  0.558676    ...        0.173804    2.0      988.0
25099  0.069983  0.596780  0.571130    ...        0.145625    2.0      988.0
25100  0.756423  0.693024  0.818869    ...        0.126882    1.0      988.0
25101  0.448371  0.000000  0.628947    ...        0.124258    1.0      988.0
25102  0.664900  0.808117  0.728619    ...        0.119267    2.0      988.0
25103  0.000000  0.776122  0.407657    ...        0.119260    2.0      988.0
25104  0.796306  0.878808  0.844132    ...        0.115530    1.0      988.0
25105  0.006830  0.000000  0.045303    ...        0.110469    1.0      988.0

编辑

我添加了组检测功能,当前它返回检测结果和索引列表中的索引列表

def group_detections(dets, thresh=0.5):
    # NOTE!!! - We assume all dets are from the same class!!!
    # thresh = IoU threshold for box elimination (lower - eliminate more boxes)
    # initialize the list of picked groups
    groups = []
    # if there are no boxes, return an empty list
    if dets.size == 0:
        return dets, groups

    # sort by descending quality
    dets = dets[(-dets[:, 4]).argsort(), :]

    # grab the coordinates of the bounding boxes
    x1 = dets[:, 0]
    y1 = dets[:, 1]
    x2 = dets[:, 2]
    y2 = dets[:, 3]

    # compute the area of the bounding boxes and sort the bounding
    # boxes by the bottom-right y-coordinate of the bounding box
    area = (x2 - x1) * (y2 - y1)
    # keep looping while some indexes still remain in the indexes
    # list
    all_idxs = np.arange(dets.shape[0])[::-1]
    while len(all_idxs) > 0:
        # grab the last index in the indexes list and add the
        # index value to the list of picked indexes
        i = all_idxs[-1]

        # find the largest (x, y) coordinates for the start of
        # the bounding box and the smallest (x, y) coordinates
        # for the end of the bounding box
        xx1 = np.maximum(x1[i], x1[all_idxs[:-1]])
        yy1 = np.maximum(y1[i], y1[all_idxs[:-1]])
        xx2 = np.minimum(x2[i], x2[all_idxs[:-1]])
        yy2 = np.minimum(y2[i], y2[all_idxs[:-1]])

        # compute the width and height of the bounding box
        w = np.maximum(0, xx2 - xx1)
        h = np.maximum(0, yy2 - yy1)

        # compute the ratio of overlap
        overlap = (w * h) / area[all_idxs[:-1]]

        # delete all indexes from the index list that have
        group_idxs = [i] + list(all_idxs[np.where(overlap > thresh)[0]])
        groups.append(group_idxs)
        remove_idxs = np.concatenate(([len(all_idxs) - 1], np.where(overlap > thresh)[0]))
        all_idxs = np.delete(all_idxs, remove_idxs)
    return dets, groups

输出示例: 目的地:

array([[0.70187318, 0.37271658, 0.72741735, 0.4855186 , 0.37807396,
        1.        , 0.        ],
       [0.15145046, 0.28159913, 0.1774891 , 0.33394483, 0.30386543,
        1.        , 0.        ],
       [0.51533937, 0.27924138, 0.5292989 , 0.33969468, 0.19332546,
        1.        , 0.        ],
       [0.78109378, 0.87562817, 0.82961792, 1.        , 0.13274428,
        1.        , 0.        ],
       [0.14983323, 0.28213197, 0.16887194, 0.33710539, 0.12764682,
        1.        , 0.        ],
       [0.14637044, 0.033903  , 0.26365086, 0.57686102, 0.12654687,
        1.        , 0.        ],
       [0.22914155, 0.26150814, 0.27148777, 0.31108513, 0.10923221,
        1.        , 0.        ],
       [0.18359725, 0.02795739, 0.23144765, 0.4838326 , 0.10619797,
        1.        , 0.        ],
       [0.22894984, 0.28633049, 0.24446395, 0.35711476, 0.10326774,
        1.        , 0.        ],
       [0.15507454, 0.2818917 , 0.18600157, 0.34016567, 0.10280589,
        1.        , 0.        ],
       [0.0071237 , 0.        , 0.04561174, 0.08170507, 0.10274705,
        1.        , 0.        ],
       [0.74405503, 0.38016886, 0.8356874 , 0.4577626 , 0.10082066,
        1.        , 0.        ]])

组:

[[0], [1, 9, 4], [2], [3], [5, 8, 7, 6], [10], [11]]

0 个答案:

没有答案