Maximum value by ID

Posted: 2016-11-09 16:18:39

Tags: python pandas numpy

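I have a list of lists, and I want to get the row with the maximum value for each id. In the example below, column 2 (index 1) is the id and column 4 (index 3) is the value. For id = 1 the maximum is 308.45; for id = 2 it is 310.508474.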

Input:

[['X', '1', '0', '303.016666'],
['X1',  '1', '1', '305.516666'],
['X2',  '1', '2', '308.45'],
['X3',  '2', '0', '309.409836'],
['X4',  '2', '1', '310.508474'],
['X5',  '2', '2', '308.728813']]

Output:

[['X2',  '1', '2', '308.45'],
['X4',  '2', '1', '310.508474']]

How can I do this?

3 answers:

Answer 0 (score: 4)

Using pandas

import pandas as pd

df = pd.DataFrame([
        ['X',   1, 0, 303.016666],
        ['X1',  1, 1, 305.516666],
        ['X2',  1, 2, 308.45],
        ['X3',  2, 0, 309.409836],
        ['X4',  2, 1, 310.508474],
        ['X5',  2, 2, 308.728813]]
)

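# groupby(1) groups by the id column; [3].idxmax() gives the index label of the largest value in each group.
# With the default RangeIndex those labels equal row positions, so they can index df.values directly.
print(df.values[df.groupby(1)[3].idxmax()])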

[['X2' 1 2 308.45]
 ['X4' 2 1 310.508474]]
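The answer above types the numbers in by hand, while the question's data is all strings. A minimal sketch, assuming the rows arrive exactly as in the question, converts only the value column to float for the comparison and selects rows with `.loc`, which uses index labels and so does not rely on the default 0..n-1 index. The string rows themselves are returned unchanged.

```python
import pandas as pd

rows = [['X',  '1', '0', '303.016666'],
        ['X1', '1', '1', '305.516666'],
        ['X2', '1', '2', '308.45'],
        ['X3', '2', '0', '309.409836'],
        ['X4', '2', '1', '310.508474'],
        ['X5', '2', '2', '308.728813']]

df = pd.DataFrame(rows)

# The value column holds strings; compare it as float so '99' < '100' numerically
values = df[3].astype(float)

# Group the numeric values by the id column and take the index label of each group's max
best = df.loc[values.groupby(df[1]).idxmax()]

print(best.values.tolist())
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

Like `idxmax` in general, this keeps the first row when several rows share the maximum value for an id.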

numpy

import numpy as np

a = np.array([
        ['X',   1, 0, 303.016666],
        ['X1',  1, 1, 305.516666],
        ['X2',  1, 2, 308.45],
        ['X3',  2, 0, 309.409836],
        ['X4',  2, 1, 310.508474],
        ['X5',  2, 2, 308.728813]
    ], dtype=object)

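ids = np.unique(a[:, 1])
# One column per id: 1 where the row belongs to that id, NaN otherwise (shape: rows x ids)
grp = np.where(ids == a[:, [1]], 1, np.nan)
# Broadcast the value column across the id columns; rows of other ids become NaN
expanded_value_column = grp * a[:, [3]].astype(float)
# Row position of the largest value in each id column, ignoring NaN
max_positions = np.nanargmax(expanded_value_column, axis=0)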

print(a[max_positions])

[['X2' 1 2 308.45]
 ['X4' 2 1 310.508474]]
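The mask approach above builds a rows x ids matrix, so its memory grows with the number of distinct ids. As an alternative (a different technique from the answer's, offered only as a sketch), `np.lexsort` can sort the rows by id and then by value, after which the last row of each id block is the maximum. This assumes the ids parse as integers and the values as floats; the variable names are illustrative.

```python
import numpy as np

rows = [['X',  '1', '0', '303.016666'],
        ['X1', '1', '1', '305.516666'],
        ['X2', '1', '2', '308.45'],
        ['X3', '2', '0', '309.409836'],
        ['X4', '2', '1', '310.508474'],
        ['X5', '2', '2', '308.728813']]

a = np.array(rows)              # string array; the rows are returned as strings
ids = a[:, 1].astype(int)       # assumption: ids are integers
vals = a[:, 3].astype(float)    # compare values numerically

# lexsort uses the LAST key as the primary key: sort by id, then by value ascending
order = np.lexsort((vals, ids))
sorted_ids = ids[order]

# True at the last position of each run of equal ids, i.e. that id's largest value
is_last = np.append(sorted_ids[1:] != sorted_ids[:-1], True)

result = a[order[is_last]]
print(result.tolist())
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

Because `lexsort` is stable, ties on the maximum value resolve to the last such row, whereas `idxmax` and `nanargmax` pick the first.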

Timing

[Timing chart image not available]

Answer 1 (score: 2)

The simplest and most intuitive solution I can think of:

>>> l = [['X', '1', '0', '303.016666'],
... ['X1',  '1', '1', '305.516666'],
... ['X2',  '1', '2', '308.45'],
... ['X3',  '2', '0', '309.409836'],
... ['X4',  '2', '1', '310.508474'],
... ['X5',  '2', '2', '308.728813']]
>>> result = {}
>>> for a, b, c, d in l:
...     if b not in result or float(d) > float(result[b][2]):
...         result[b] = (a, c, d)
... 
>>> result
{'1': ('X2', '2', '308.45'), '2': ('X4', '1', '310.508474')}
>>> result = [(a, b, c, d) for b, (a, c, d) in result.items()]
>>> result
[('X2', '1', '2', '308.45'), ('X4', '2', '1', '310.508474')]
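The same single-pass loop can keep the whole row instead of splitting it into a tuple, which returns lists in exactly the shape the question asks for. A minimal sketch; `max_rows_by_id` and its `id_col`/`val_col` parameters are hypothetical names, with defaults matching the question's column layout.

```python
def max_rows_by_id(rows, id_col=1, val_col=3):
    """Return, for each id, the first row holding the largest numeric value."""
    best = {}
    for row in rows:
        key = row[id_col]
        # float() so that '99' and '100' are compared as numbers, not as text
        if key not in best or float(row[val_col]) > float(best[key][val_col]):
            best[key] = row
    # dicts keep insertion order (Python 3.7+), so ids come out in first-seen order
    return list(best.values())


l = [['X',  '1', '0', '303.016666'],
     ['X1', '1', '1', '305.516666'],
     ['X2', '1', '2', '308.45'],
     ['X3', '2', '0', '309.409836'],
     ['X4', '2', '1', '310.508474'],
     ['X5', '2', '2', '308.728813']]

print(max_rows_by_id(l))
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

The strict `>` means that on a tie the earlier row wins, matching `idxmax`.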

Answer 2 (score: 0)

You can use a dict comprehension together with set() to collect the unique ids:

my_data = [
    ['X', '1', '0', '303.016666'],
    ['X1',  '1', '1', '305.516666'],
    ['X2',  '1', '2', '308.45'],
    ['X3',  '2', '0', '309.409836'],
    ['X4',  '2', '1', '310.508474'],
    ['X5',  '2', '2', '308.728813']]

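# Unique ids
my_id = set(data[1] for data in my_data)

# key=float compares the values numerically; max() on the raw strings would compare them
# as text, which gives wrong results once values have different numbers of digits
my_max = {id_: max((val for _, i, _, val in my_data if i == id_), key=float) for id_ in my_id}
# Content of 'my_max': {'1': '308.45', '2': '310.508474'}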
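This answer returns only the maximum value per id, while the question asks for the full rows. A small variation, sketched here with illustrative names, applies `max` with a `key` to the rows themselves so each id maps to its complete row. Sorting the ids is only for a deterministic output order, since set iteration order is not guaranteed.

```python
my_data = [
    ['X',  '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813']]

my_id = set(row[1] for row in my_data)

# For each id, pick the whole row whose value (index 3) is largest when read as a float
my_rows = [max((row for row in my_data if row[1] == id_), key=lambda row: float(row[3]))
           for id_ in sorted(my_id)]

print(my_rows)
# [['X2', '1', '2', '308.45'], ['X4', '2', '1', '310.508474']]
```

Note that this scans the data once per id, so for many distinct ids the single-pass loop in Answer 1 scales better.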