Question

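As you can see, CallId values are repeated, but the Data (time) values are not. I need to find the minimum time for each CallId.

When I specify a single CallId I get its minimum time, but I have more than 550 distinct CallIds, so doing this one by one would be very laborious. I am new to Python, but I assume there is a simpler way to solve this.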

Answer 1

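You can use pandas.Series.unique on the CallId column. This gives you an array of all the unique values in that column. Then iterate over that result and, for each unique value, call pandas.DataFrame.query on the DataFrame to get a sub-DataFrame containing only the entries for that CallId. Finally, compute the minimum of the Data column in the queried DataFrame: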

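# `entrou` is the DataFrame from the question, with columns `CallId` and `Data`
# all unique CallId values
unique_callids = entrou.CallId.unique()
# minimum `Data` value per CallId, keyed by CallId
min_by_callid = {}
# loop over the unique CallId values
for ucid in unique_callids:
    # query the main DataFrame to get a sub-DataFrame of only CallId == ucid;
    # `@ucid` refers to the local variable, so string CallIds work too
    ucid_entrou = entrou.query("CallId == @ucid")
    # calculate the minimum of `Data` for this sub-DataFrame and store it,
    # so the result is not overwritten on the next iteration
    min_by_callid[ucid] = ucid_entrou.Data.min()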
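
As an alternative to the explicit loop, pandas can group the rows by CallId and take the minimum of Data in a single call. The following is a minimal sketch, assuming a DataFrame named entrou with columns CallId and Data as in the question. The sample values are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's `entrou` DataFrame
entrou = pd.DataFrame({
    "CallId": [101, 101, 102, 102, 102],
    "Data": pd.to_datetime([
        "2020-01-01 10:05:00",
        "2020-01-01 10:01:00",
        "2020-01-01 11:30:00",
        "2020-01-01 11:20:00",
        "2020-01-01 11:45:00",
    ]),
})

# One minimum `Data` value per CallId, returned as a Series indexed by CallId
min_per_callid = entrou.groupby("CallId")["Data"].min()
print(min_per_callid)
```

Calling min_per_callid.reset_index() turns the result back into a DataFrame with CallId and Data columns, which may be more convenient for further processing.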
