Python: reading lines from segments/chunks of a file

Time: 2018-11-05 05:12:41

Tags: python database python-3.x

I am trying to implement a two-pass multiway merge sort in Python.
So far I have implemented Pass 1, but I am running into trouble implementing Pass 2.

Here is the structure of the file to be read (tab-separated data):

name    ssn gender  job company address
Alicia Best 201-30-5041 F   Tree surgeon    Griffin, Vasquez and Hunt   932 Ryan Turnpike Suite 686East Roger, MD 56628
Cameron Scott   700-12-6740 M   Financial adviser   Griffith-Sosa   1733 Scott PineEast Matthew, ID 20677
William Novak   054-78-8142 M   Music therapist Williams LLC    648 Ballard Courts Suite 214Taylorhaven, CA 45271
Paul Hodges 875-49-7490 M   Advertising art director    Ward, Salas and Malone  15382 Roger VillageSouth Cody, IA 41827
VincTannerent Ramirez   569-99-2727 M   Theatre stage manager   Price, White and Black  76567 Phillips LoafEast William, DE 32235
Kellie Pacheco  300-77-5182 F   Doctor, hospital    Miller Inc  9789 Sullivan CornersSouth Deniseshire, MD 16612
...
...
...

I have stored the result of Pass 1 in a separate file with the following structure:

name    ssn gender  job company address
Adam Duke   740-34-2566 M   Diplomatic Services operational officer King, Jones and Castillo    178 Robert LoopWhitechester, MT 97087
Adam Morris 840-49-4963 M   Neurosurgeon    Knight Group    1104 Laura StationKarenshire, OR 86801
Alan Sanchez    688-73-4197 M   Environmental education officer Thomas Inc  PSC 2324, Box 9034APO AE 08323
...
... after N lines
...
Yolanda Ramirez 859-41-4401 F   Biomedical engineer Lynn-Brock  498 Gutierrez Oval Suite 867South Clifford, OH 10700
Albert Reyes    025-28-1361 M   Engineer, energy    Nolan, Vazquez and Jordan   4096 Elizabeth PlazaLake Timothy, RI 12215
Alex Escobar    018-96-9641 M   Glass blower/designer   Flowers, Li and Smith   251 Mcpherson MotorwaySchmidtfort, FM 43628
Alexander Flores    648-01-7451 M   Engineer, maintenance   Munoz, Tucker and Freeman   0743 Vanessa FortJonesview, GA 57871
Alexander Kramer    061-08-3051 M   Designer, ceramics/pottery  Johnson-Peterson    800 Kristen VillageStewartton, MA 57143

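For testing, I split the data into chunks of 100 lines each and sorted the rows within each chunk by one attribute (name, gender, or any other valid attribute).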

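For Pass 2, I need to read lines 1, 101, 201, ... (the first line of each chunk) and pick the appropriate row among them.
Suppose I pick line 101; then in the next iteration I need to read lines 1, 102, 201, ...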

So I am struggling to work out how to carry out these two steps while iterating over the file efficiently.
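The iteration described above (compare the current head of every chunk, take the smallest, then advance only within that chunk) can be sketched with a min-heap. This is a minimal sketch under stated assumptions, not the code from the question: it assumes the Pass 1 output is a single UTF-8 file with one header line, one record per line, and sorted chunks of `chunk_size` lines. The names `chunk_offsets`, `read_chunk`, `merge_chunks` and `key_index` are hypothetical helpers introduced only for illustration.

```python
import heapq
from itertools import islice


def chunk_offsets(path, chunk_size):
    """Return the byte offset at which each chunk of `chunk_size` lines starts."""
    offsets = []
    with open(path, "rb") as f:
        f.readline()  # assumption: the first line is the header
        count = 0
        while True:
            pos = f.tell()
            if not f.readline():
                break
            if count % chunk_size == 0:
                offsets.append(pos)
            count += 1
    return offsets


def read_chunk(path, offset, chunk_size):
    """Lazily yield the rows of one chunk, each as a list of fields."""
    with open(path, "rb") as f:
        f.seek(offset)
        for raw in islice(f, chunk_size):
            yield raw.decode("utf-8").rstrip("\r\n").split("\t")


def merge_chunks(path, chunk_size, key_index=0):
    """Yield all rows in sorted order by merging the pre-sorted chunks."""
    readers = [read_chunk(path, off, chunk_size)
               for off in chunk_offsets(path, chunk_size)]
    heap = []
    for i, reader in enumerate(readers):
        row = next(reader, None)
        if row is not None:
            # The chunk index i breaks ties, so rows are never compared directly.
            heap.append((row[key_index], i, row))
    heapq.heapify(heap)
    while heap:
        _, i, row = heapq.heappop(heap)
        yield row
        # Advance only the chunk whose row was just emitted.
        nxt = next(readers[i], None)
        if nxt is not None:
            heapq.heappush(heap, (nxt[key_index], i, nxt))
```

With this layout each chunk keeps its own file handle and position, so picking line 101 simply moves that one reader on to line 102 while the heads of the other chunks stay in the heap and are not read again. For example, `for row in merge_chunks("pass1.tsv", 100): ...` would iterate over the fully sorted rows, where `pass1.tsv` is a placeholder file name.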

  • I am open to any suggestions, such as storing the chunks in separate files instead of in a single file.

  • Link to the complete code
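The separate-files idea mentioned in the first bullet could look roughly like the following. This is only a sketch under assumptions, not the linked code: the input is assumed to be UTF-8 with one header line and one record per line, and `write_sorted_runs`, `merge_runs`, `sort_key` and the `run_NNNNN.tsv` naming are hypothetical names chosen for illustration. The standard library's `heapq.merge` performs the k-way merge lazily, reading one line at a time from each run file.

```python
import heapq
import os
from itertools import islice


def sort_key(line, key_index=0):
    """Extract the sort field from one tab-separated line."""
    return line.rstrip("\n").split("\t")[key_index]


def write_sorted_runs(src_path, run_dir, chunk_size, key_index=0):
    """Pass 1: split the input into sorted runs, one file per run."""
    run_paths = []
    with open(src_path, encoding="utf-8") as src:
        header = src.readline()
        while True:
            lines = list(islice(src, chunk_size))
            if not lines:
                break
            # Make sure the last line of the file also ends with a newline.
            lines = [ln if ln.endswith("\n") else ln + "\n" for ln in lines]
            lines.sort(key=lambda ln: sort_key(ln, key_index))
            run_path = os.path.join(run_dir, f"run_{len(run_paths):05d}.tsv")
            with open(run_path, "w", encoding="utf-8") as run:
                run.writelines(lines)
            run_paths.append(run_path)
    return header, run_paths


def merge_runs(header, run_paths, out_path, key_index=0):
    """Pass 2: k-way merge of the sorted run files into one output file."""
    files = [open(p, encoding="utf-8") for p in run_paths]
    try:
        with open(out_path, "w", encoding="utf-8") as out:
            out.write(header)
            merged = heapq.merge(*files, key=lambda ln: sort_key(ln, key_index))
            out.writelines(merged)
    finally:
        for f in files:
            f.close()
```

One trade-off of this design is that every run needs its own open file handle during Pass 2, so the number of runs is bounded by the operating system's limit on open files.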

I have also looked at the following threads:

0 answers:

No answers yet