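I have an array of arrays and want to get, for each id, the row with the largest value. In the example below, column 2 is the id and column 4 is the value. For id = 1 the maximum is 308.45; for id = 2 it is 310.508474.
Input: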
[['X', '1', '0', '303.016666'],
['X1', '1', '1', '305.516666'],
['X2', '1', '2', '308.45'],
['X3', '2', '0', '309.409836'],
['X4', '2', '1', '310.508474'],
['X5', '2', '2', '308.728813']]
Output:
[['X2', '1', '2', '308.45'],
['X4', '2', '1', '310.508474']]
How can I do this?
Answer 0 (score: 4)
Using pandas
import pandas as pd
df = pd.DataFrame([
['X', 1, 0, 303.016666],
['X1', 1, 1, 305.516666],
['X2', 1, 2, 308.45],
['X3', 2, 0, 309.409836],
['X4', 2, 1, 310.508474],
['X5', 2, 2, 308.728813]]
)
print(df.values[df.groupby(1)[3].idxmax()])
[['X2' 1 2 308.45]
['X4' 2 1 310.508474]]
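The pandas snippet above builds the frame from numbers, whereas the question's input holds strings. A minimal sketch for the string input, assuming hypothetical column names (`name`, `id`, `pos`, `value`) that are not part of the original data: the value column is cast to float only for the comparison, so the returned rows keep their original strings.

```python
import pandas as pd

rows = [
    ['X', '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813'],
]

# Column names are illustrative assumptions, not taken from the question.
df = pd.DataFrame(rows, columns=['name', 'id', 'pos', 'value'])

# Cast to float for the comparison only, group by id, and take the
# index label of the largest value in each group.
best_labels = df['value'].astype(float).groupby(df['id']).idxmax()

# Select by label with .loc so this also works for a non-default index.
result = df.loc[best_labels].values.tolist()
print(result)
```

Selecting with `df.loc[...]` rather than positional indexing into `df.values` is a design choice here: `idxmax` returns index labels, which only coincide with positions when the frame has the default integer index.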
Pure numpy
import numpy as np
a = np.array([
['X', 1, 0, 303.016666],
['X1', 1, 1, 305.516666],
['X2', 1, 2, 308.45],
['X3', 2, 0, 309.409836],
['X4', 2, 1, 310.508474],
['X5', 2, 2, 308.728813]
], dtype=object)
ids = np.unique(a[:, 1])
grp = np.where(ids == a[:, [1]], 1, np.nan)
expanded_value_column = grp * a[:, [3]].astype(float)
max_positions = np.nanargmax(expanded_value_column, axis=0)
print(a[max_positions])
[['X2' 1 2 308.45]
['X4' 2 1 310.508474]]
Answer 1 (score: 2)
The simplest, most intuitive solution I can think of:
>>> l = [['X', '1', '0', '303.016666'],
... ['X1', '1', '1', '305.516666'],
... ['X2', '1', '2', '308.45'],
... ['X3', '2', '0', '309.409836'],
... ['X4', '2', '1', '310.508474'],
... ['X5', '2', '2', '308.728813']]
>>> result = {}
>>> for a, b, c, d in l:
...     if b not in result or float(d) > float(result[b][2]):
...         result[b] = (a, c, d)
...
>>> result
{'1': ('X2', '2', '308.45'), '2': ('X4', '1', '310.508474')}
>>> result = [(a, b, c, d) for b, (a, c, d) in result.items()]
>>> result
[('X2', '1', '2', '308.45'), ('X4', '2', '1', '310.508474')]
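The same loop can be wrapped in a small function that stores each row whole, so no unpacking and repacking is needed at the end. This is only a sketch: the function name `max_rows_by_id` and its `id_col` / `value_col` parameters are hypothetical, and it relies on dicts preserving insertion order (Python 3.7+).

```python
def max_rows_by_id(rows, id_col=1, value_col=3):
    """Return, for each id, the row whose value column is numerically largest."""
    best = {}
    for row in rows:
        key = row[id_col]
        # Compare as floats; the stored rows keep their original strings.
        if key not in best or float(row[value_col]) > float(best[key][value_col]):
            best[key] = row
    # Ids appear in the order they were first seen.
    return list(best.values())


data = [
    ['X', '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813'],
]
print(max_rows_by_id(data))
```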
Answer 2 (score: 0)
You can use a dict comprehension together with set() to collect the unique ids:
my_data = [
['X', '1', '0', '303.016666'],
['X1', '1', '1', '305.516666'],
['X2', '1', '2', '308.45'],
['X3', '2', '0', '309.409836'],
['X4', '2', '1', '310.508474'],
['X5', '2', '2', '308.728813']]
# Unique ids
my_id = {row[1] for row in my_data}
# Compare numerically: a plain max() on strings is lexicographic
my_max = {uid: max((val for _, i, _, val in my_data if i == uid), key=float) for uid in my_id}
# Content of 'my_max': {'1': '308.45', '2': '310.508474'}
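Two things are worth noting about this approach. The values are strings, so they have to be compared as floats to get a numeric maximum, and the dict holds only the maximum value, not the full row the question asks for. A sketch of both points, where the names `rows_by_id` and `wanted` are hypothetical:

```python
my_data = [
    ['X', '1', '0', '303.016666'],
    ['X1', '1', '1', '305.516666'],
    ['X2', '1', '2', '308.45'],
    ['X3', '2', '0', '309.409836'],
    ['X4', '2', '1', '310.508474'],
    ['X5', '2', '2', '308.728813']]

# String comparison is lexicographic: '9.5' sorts above '10.2'.
lexicographic = max(['9.5', '10.2'])
numeric = max(['9.5', '10.2'], key=float)

# Keep the whole row per id, comparing the value column as a float.
rows_by_id = {
    uid: max((row for row in my_data if row[1] == uid),
             key=lambda row: float(row[3]))
    for uid in sorted({row[1] for row in my_data})
}
wanted = list(rows_by_id.values())
print(wanted)
```

This rescans `my_data` once per id, which is fine for small inputs; a single-pass loop is the better fit when there are many ids.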