Question

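I am trying to use numpy's digitize function to help maintain a gradebook. The idea is to input the total points a student earned in the class and get the corresponding letter grade as output. My attempt is below: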

import numpy as np
from collections import OrderedDict

## letter grades and points at cusps of letter grades
letter_grades = np.array(['F', 'D-', 'D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A'])
point_edges = np.concatenate((np.linspace(101, 153, len(letter_grades)), [10**3]))  # concatenate takes a sequence of arrays
point_edges[0] = 0

## each letter grade corresponds to point values within the two corresponding point edges
edge_pairs = np.array([('{} - {}'.format(point_edges[idx-1], point_edges[idx])) for idx in range(1, len(point_edges))])
criteria = OrderedDict(zip(letter_grades, edge_pairs))
# print(criteria)

## sample data (the top one works, the one below throws an error)
# point_scores = (0, 100, 100.9, 101, 101.1, 136)
point_scores = (0, 100, 100.9, 101, 101.1, 136, 146, 150, 152, 153, 154)

## use numpy to get result
indices = np.digitize(point_scores, point_edges)
final_grades = letter_grades[indices]

for point, grade in zip(point_scores, final_grades):
    print("\n .. {} POINTS :: {}\n".format(point, grade))

Running the code above produces the following error:

IndexError: index 12 is out of bounds for axis 1 with size 12

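I added 1000 as the last element of point_edges so that any input value greater than 153 would output 'A' (as shown by the commented-out print(criteria) statement above). However, the algorithm only works for input values strictly less than 153. Why does this happen, and how can I fix it?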

Answer 1

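np.digitize numbers its bins differently from np.histogram, so that it can flag values that fall outside the bin edges:

From the docs:

If values in x are beyond the bounds of bins, 0 or len(bins) is returned as appropriate.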
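To make the numbering concrete, here is a small sketch with made-up edges and values (bins and x are illustrative, not taken from the question). It shows which index np.digitize returns for values below, inside, and above the edges, using the default right=False:

```python
import numpy as np

# Illustrative edges: two real bins, [0, 10) and [10, 20)
bins = np.array([0, 10, 20])

# One value below the edges, three inside, one above
x = np.array([-5, 0, 5, 10, 25])

idx = np.digitize(x, bins)

for value, i in zip(x, idx):
    print(value, "->", i)
```

The value below the first edge gets index 0, values inside the edges get 1 or 2, and the value above the last edge gets len(bins), which is 3 here.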

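In your case, index 12 means the value is at or above 153, that is, past the last edge that maps onto a letter grade; the last grade you want sits at index 11 of letter_grades. Index 0 is reserved for values below the lower edge, so index 1 is the first valid bin.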
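One possible way to apply this is to shift the result of np.digitize down by one and clip it to the valid index range of letter_grades. This is only a sketch: it assumes point_edges holds 13 values (the 12 from linspace plus the trailing 1000), which is what the reported error implies, and using np.clip for the out-of-range cases is my choice rather than something stated above.

```python
import numpy as np

letter_grades = np.array(['F', 'D-', 'D', 'D+', 'C-', 'C', 'C+', 'B-', 'B', 'B+', 'A-', 'A'])

# Assumed construction: 13 edges, i.e. 12 bins, one per letter grade
point_edges = np.concatenate((np.linspace(101, 153, len(letter_grades)), [10**3]))
point_edges[0] = 0

point_scores = (0, 100, 100.9, 101, 101.1, 136, 146, 150, 152, 153, 154)

# np.digitize returns 1 for the first bin, so subtract 1 to get a 0-based index.
# np.clip folds the out-of-range results (below 0 or above the last edge)
# into the first and last grade instead of raising an IndexError.
raw = np.digitize(point_scores, point_edges)
indices = np.clip(raw - 1, 0, len(letter_grades) - 1)
final_grades = letter_grades[indices]

for point, grade in zip(point_scores, final_grades):
    print("{} POINTS :: {}".format(point, grade))
```

With the shift in place, 153 and 154 both map to 'A' and 0 maps to 'F'. If scores outside the edges should be treated as errors rather than silently folded into the end grades, check raw for 0 or len(point_edges) before clipping.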
