Question

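I am working on a matching problem in which students must be assigned to schools. The complication is that I have to take each student's siblings into account, because siblings are a relevant factor when each school sets its priorities.

My data looks like this: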

Index Student_ID    Brothers
0   92713846    [79732346]
1   69095898    [83462239]
2   67668672    [75788479, 56655021, 75869616]
3   83396441    []
4   65657616    [52821691]
5   62399116    []
6   78570850    [62046889, 63029349]
7   69185379    [70285250, 78819847, 78330994]
8   66874272    []
9   78624173    [73902609, 99802441, 95706649]
10  97134369    []
11  77358607    [52492909, 59830215, 71251829]
12  56314554    [54345813, 71451741]
13  97724180    [64626337]
14  73480196    [84454182, 90435785]
15  70717221    [60965551, 98620966, 70969443]
16  60942420    [54370313, 63581164, 72976764]
17  81882157    [78787923]
18  73387623    [87909970, 57105395]
19  59115621    [62494654]
20  54650043    [69308874, 88206688]
21  53368352    [63191962, 53031183]
22  76024585    [61392497]
23  84337377    [58419239, 96762668]
24  50099636    [80373936, 54314342]
25  62184397    [89185875, 84892080, 53223034]
26  85704767    [85509773, 81710287, 78387716]
27  85585603    [66254198, 87569015, 52455599]
28  82964119    [76360309, 76069982]
29  53776152    [92585971, 74907523]
...
6204 rows × 2 columns

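Student_ID is a unique ID for each student, and Brothers is a list containing the IDs of all of that student's siblings.

To hold the data used for the matching, I created a Student class that stores all the attributes the matching needs. Here is a link to download the entire dataset.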

class Student():
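    def __init__(self, index, id, vbrothers=None):
        self.__index = index
        self.__id = id
        # Use None as the default so instances never share one mutable list
        self.__vbrothers = vbrothers if vbrothers is not None else []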

    @property
    def index(self):
        return self.__index

    @property
    def id(self):
        return self.__id

    @property
    def vbrothers(self):
        return self.__vbrothers

I instantiate my Student objects by looping over all the rows of my DataFrame and appending each object to a list:

students = []
for index, row in students_data.iterrows():
    student = Student(index, row['Student_ID'],  row['Brothers'])
    students.append(student)

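Now, my problem is that for each student I need pointers to the indices of their siblings in the students list. At the moment I am implementing this with a nested loop: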

for student in students:
    student.vbrothers_index = [brother.index for brother in students if (student.id in brother.vbrothers)]

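This is by far the worst-performing part of my whole code. It is four times slower than the second-slowest part.

Any suggestions on how to improve the performance of this nested loop are welcome.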

Answer 1

Since the order of students does not matter, make it a dictionary keyed by student ID:

students = {}
for index, row in students_data.iterrows():
    student = Student(index, row['Student_ID'],  row['Brothers'])
    students[row['Student_ID']] = student

Now you can retrieve each student by ID in constant time:

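for student in students.values():
    # vbrothers holds sibling IDs, so each one is looked up directly;
    # IDs that are not in the dictionary are skipped
    student.vbrothers_index = [students[brother_id].index for brother_id in student.vbrothers if brother_id in students]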
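
To make the idea concrete, here is a minimal, self-contained sketch of the dictionary lookup. The `rows` list is made-up sample data standing in for the DataFrame, and the simplified `Student` class is only a stand-in for the one in the question. It assumes that Brothers holds sibling IDs and that the sibling relation is symmetric (if A lists B, then B lists A); under that assumption it gives the same result as the original nested loop, but with one dictionary lookup per sibling ID instead of a scan over every student.

```python
class Student:
    """Simplified stand-in for the question's Student class."""

    def __init__(self, index, id, vbrothers=None):
        self.index = index
        self.id = id
        # Copy the list so instances never share a mutable default
        self.vbrothers = list(vbrothers) if vbrothers is not None else []
        self.vbrothers_index = []


# Hypothetical sample rows: (Student_ID, Brothers)
rows = [
    (101, [102, 103]),
    (102, [101, 103]),
    (103, [101, 102]),
    (104, []),
    (105, [999]),  # 999 is not a student in this sample
]

# One pass to build the ID -> Student mapping
students = {sid: Student(i, sid, bros) for i, (sid, bros) in enumerate(rows)}

# One pass over each sibling list, with a constant-time lookup per ID.
# IDs that are missing from the mapping are skipped.
for student in students.values():
    student.vbrothers_index = [
        students[bid].index for bid in student.vbrothers if bid in students
    ]

result = {s.id: s.vbrothers_index for s in students.values()}
print(result)
```

The total work is proportional to the number of students plus the total number of sibling IDs, rather than to the square of the number of students. If the relation in the real data is not symmetric, the two approaches can differ, so it is worth comparing them on a small sample first.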
