I am trying to implement a two-pass multiway merge sort in Python.
So far I have implemented pass 1, but I am stuck on pass 2.
The input file has the following structure (tab-separated data):
name ssn gender job company address
Alicia Best 201-30-5041 F Tree surgeon Griffin, Vasquez and Hunt 932 Ryan Turnpike Suite 686East Roger, MD 56628
Cameron Scott 700-12-6740 M Financial adviser Griffith-Sosa 1733 Scott PineEast Matthew, ID 20677
William Novak 054-78-8142 M Music therapist Williams LLC 648 Ballard Courts Suite 214Taylorhaven, CA 45271
Paul Hodges 875-49-7490 M Advertising art director Ward, Salas and Malone 15382 Roger VillageSouth Cody, IA 41827
VincTannerent Ramirez 569-99-2727 M Theatre stage manager Price, White and Black 76567 Phillips LoafEast William, DE 32235
Kellie Pacheco 300-77-5182 F Doctor, hospital Miller Inc 9789 Sullivan CornersSouth Deniseshire, MD 16612
...
...
...
I have stored the result of pass 1 in a single separate file, which is structured as follows:
name ssn gender job company address
Adam Duke 740-34-2566 M Diplomatic Services operational officer King, Jones and Castillo 178 Robert LoopWhitechester, MT 97087
Adam Morris 840-49-4963 M Neurosurgeon Knight Group 1104 Laura StationKarenshire, OR 86801
Alan Sanchez 688-73-4197 M Environmental education officer Thomas Inc PSC 2324, Box 9034APO AE 08323
...
... after N lines
...
Yolanda Ramirez 859-41-4401 F Biomedical engineer Lynn-Brock 498 Gutierrez Oval Suite 867South Clifford, OH 10700
Albert Reyes 025-28-1361 M Engineer, energy Nolan, Vazquez and Jordan 4096 Elizabeth PlazaLake Timothy, RI 12215
Alex Escobar 018-96-9641 M Glass blower/designer Flowers, Li and Smith 251 Mcpherson MotorwaySchmidtfort, FM 43628
Alexander Flores 648-01-7451 M Engineer, maintenance Munoz, Tucker and Freeman 0743 Vanessa FortJonesview, GA 57871
Alexander Kramer 061-08-3051 M Designer, ceramics/pottery Johnson-Peterson 800 Kristen VillageStewartton, MA 57143
For testing, I split the data into chunks of 100 lines and sort each chunk internally on one attribute (name, gender, or any other valid attribute).
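For reference, the chunking step described above can be sketched roughly like this. It is only an illustration of the idea, not the code behind the link: the names `sort_runs` and `pass_one`, the `key_index` parameter, and the assumption that the first line is a header are all hypothetical.

```python
import itertools


def sort_runs(lines, key_index=0, chunk_size=100):
    """Yield lines run by run; each run of chunk_size lines is sorted on its own."""
    it = iter(lines)
    while True:
        chunk = list(itertools.islice(it, chunk_size))
        if not chunk:
            break
        # Make sure a final line without a newline cannot end up mid-file.
        chunk = [ln if ln.endswith("\n") else ln + "\n" for ln in chunk]
        # Sort only this run, using one tab-separated column as the key.
        chunk.sort(key=lambda ln: ln.rstrip("\r\n").split("\t")[key_index])
        yield from chunk


def pass_one(src_path, dst_path, key_index=0, chunk_size=100):
    """Copy the header, then write the sorted runs back to back into one file."""
    with open(src_path, encoding="utf-8", newline="") as src, \
         open(dst_path, "w", encoding="utf-8", newline="") as dst:
        dst.write(next(src, ""))  # assumed header line
        dst.writelines(sort_runs(src, key_index, chunk_size))
```

The output is sorted within each run but not globally, which is exactly the layout shown in the pass 1 sample.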
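For pass 2, I need to read lines 1, 101, 201, ... and pick the appropriate record among them.
If I pick line 101, say, then in the next iteration I need to read lines 1, 102, 201, ...
So I am having trouble figuring out how to carry out these two steps while traversing the file efficiently.
I am open to any suggestions, such as storing the chunks in separate files instead of in a single file.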
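One possible shape for pass 2, as a minimal sketch under stated assumptions: the single file holds a header line followed by sorted runs of `chunk_size` lines, and one file handle per run is acceptable. The helper names `run_offsets`, `read_run` and `pass_two` are made up for illustration. The idea is to record the byte offset where each run starts, open one reader per run positioned at that offset, and let the standard-library `heapq.merge` do the "pick the smallest head, advance only that run" bookkeeping.

```python
import heapq
import itertools


def run_offsets(path, chunk_size=100):
    """Return the byte offset of the first line of every run (header skipped)."""
    offsets = []
    with open(path, "rb") as f:
        f.readline()  # assumed header line
        count = 0
        while True:
            pos = f.tell()
            if not f.readline():
                break
            if count % chunk_size == 0:
                offsets.append(pos)
            count += 1
    return offsets


def read_run(path, offset, chunk_size):
    """Lazily yield the lines of one run, starting at its byte offset."""
    with open(path, "rb") as f:
        f.seek(offset)
        for line in itertools.islice(f, chunk_size):
            yield line if line.endswith(b"\n") else line + b"\n"


def pass_two(runs_path, out_path, key_index=0, chunk_size=100):
    """Merge all sorted runs of runs_path into one fully sorted file."""
    def key(line):
        # Same column that pass 1 sorted on, compared as UTF-8 bytes.
        return line.rstrip(b"\r\n").split(b"\t")[key_index]

    with open(runs_path, "rb") as f:
        header = f.readline()
    runs = [read_run(runs_path, off, chunk_size)
            for off in run_offsets(runs_path, chunk_size)]
    with open(out_path, "wb") as out:
        out.write(header)
        # heapq.merge keeps only one pending line per run in memory.
        out.writelines(heapq.merge(*runs, key=key))
```

This keeps one open handle per run, so with a very large number of runs the operating system's open-file limit could become a problem. Writing each run to its own file in pass 1 would work the same way, with `read_run` simply iterating over a whole file instead of seeking.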
Link to the complete code
I have also referred to the following threads: