Python: sorting a CSV by row with insertion sort

Time: 2021-03-04 03:30:38

Tags: python csv sorting multidimensional-array insertion-sort

My goal is to use insertion sort to sort the contents of a CSV file by the number in the first column. For example, I want to take this:

[[7831703,  Christian,  Schmidt]
[2299817,   Amber,  Cohen]
[1964394,   Gregory,    Hanson]
[1984288,   Aaron,  White]
[9713285,   Alexander,  Kirk]
[7025528,   Janice, Lee]
[6441979,   Sarah,  Browning]
[8815776,   Rick,   Wallace]
[2395480,   Martin, Weinstein]
[1927432,   Stephen,    Morrison]]

and sort it into:

[[1927432,  Stephen,    Morrison]
[1964394,   Gregory,    Hanson]
[1984288,   Aaron,  White]
[2299817,   Amber,  Cohen]
[2395480,   Martin, Weinstein]
[6441979,   Sarah,  Browning]
[7025528,   Janice, Lee]
[7831703,   Christian,  Schmidt]
[8815776,   Rick,   Wallace]
[9713285,   Alexander,  Kirk]]

based on the number in the first column, in Python. My current code looks like this:

import csv
with open('EmployeeList.csv', newline='') as File:  
    reader = csv.reader(File)
    readList = list(reader)
    for row in reader:
        print(row)

def insertionSort(readList): 
  #Traverse through 1 to the len of the list
    for row in range(len(readList)):
# Traverse through 1 to len(arr) 
        for i in range(1, len(readList[row])): 
    
            key = readList[row][i] 
    


    # Move elements of arr[0..i-1], that are 
    # greater than key, to one position ahead 
    # of their current position
            j = i-1
            while j >=0 and key < readList[row][j] : 
                    readList[row] = readList[row] 
                    j -= 1
            readList[row] = key 

insertionSort(readList)
print ("Sorted array is:") 
for i in range(len(readList)): 
    print ( readList[i])

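The code can already sort the contents of a 2D array, but it tries to sort everything. I thought it would work if I got rid of the [], but in testing it did not give me what I need. To clarify again, I want to reorder the rows based on the numeric value in the first column.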

1 Answer:

Answer 0: (score: 0)

Sorry if I have misunderstood what you need. But you have a list and you need to sort it? Why not just use the sort method of the list object?

>>> data = [[7831703,  "Christian",  "Schmidt"],
... [2299817,   "Amber",  "Cohen"],
... [1964394,   "Gregory",    "Hanson"],
... [1984288,   "Aaron",  "White"],
... [9713285,   "Alexander",  "Kirk"],
... [7025528,   "Janice", "Lee"],
... [6441979,   "Sarah",  "Browning"],
... [8815776,   "Rick",   "Wallace"],
... [2395480,   "Martin", "Weinstein"],
... [1927432,   "Stephen",    "Morrison"]]
>>> data.sort()
>>> from pprint import pprint
>>> pprint(data)
[[1927432, 'Stephen', 'Morrison'],
 [1964394, 'Gregory', 'Hanson'],
 [1984288, 'Aaron', 'White'],
 [2299817, 'Amber', 'Cohen'],
 [2395480, 'Martin', 'Weinstein'],
 [6441979, 'Sarah', 'Browning'],
 [7025528, 'Janice', 'Lee'],
 [7831703, 'Christian', 'Schmidt'],
 [8815776, 'Rick', 'Wallace'],
 [9713285, 'Alexander', 'Kirk']]
>>> 

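Note that the first element of each row is an integer here, not a string. That matters if you want to sort by numeric value (99 before 100): compared as strings, "100" sorts before "99". Fields read with csv.reader are strings, so convert them with int() first.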
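
As a minimal sketch of that conversion, the snippet below reads rows with csv.reader and turns the first column into an int before sorting. The in-memory sample is made up for illustration and stands in for EmployeeList.csv; it assumes the file has no header row and that every first field is a valid integer.

```python
import csv
import io

# Hypothetical sample data standing in for EmployeeList.csv (no header row assumed).
sample = io.StringIO("99,Amber,Cohen\n100,Aaron,White\n7,Sarah,Browning\n")

rows = list(csv.reader(sample))

# csv.reader yields every field as a string, so this sort is lexicographic.
as_text = sorted(rows)

# Convert the first column to int so rows compare by numeric value.
parsed = [[int(row[0])] + row[1:] for row in rows]
as_numbers = sorted(parsed)

print([row[0] for row in as_text])     # ['100', '7', '99']
print([row[0] for row in as_numbers])  # [7, 99, 100]
```

For a real file, the same list comprehension can be applied to the rows read inside the with open('EmployeeList.csv', newline='') block from the question.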

Don't be confused by the pprint import. You don't need it for sorting. I only used it to get nicer output in the console.

Also note that list.sort() is an in-place method. It does not return a sorted list (it returns None); it sorts the list itself.
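
The difference between the in-place method and the built-in sorted() can be seen in a short sketch. The key=itemgetter(0) argument is an addition not used in the session above; it restricts the comparison to the first column instead of comparing whole rows.

```python
from operator import itemgetter

data = [[7831703, "Christian", "Schmidt"],
        [2299817, "Amber", "Cohen"],
        [1964394, "Gregory", "Hanson"]]

# sorted() builds a new list and leaves the original untouched.
by_id = sorted(data, key=itemgetter(0))

# list.sort() reorders the list in place and returns None.
result = data.sort(key=itemgetter(0))

print(result)         # None
print(data == by_id)  # True
```

Assigning the return value of data.sort() to a variable is a common mistake, because that variable ends up holding None rather than the sorted list.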

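*** Edit ***

Here are two different approaches to writing the sort function. Both could be optimized a lot, but I hope they give you an idea of how to do it. Both should work, and you can add some print calls inside the loops to see what happens there.

First, a recursive version. On each run it sorts the list a little more, until the list is fully sorted.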

def recursiveSort(readList):
    # Work on a copy so the original data is not modified.
    data = readList.copy()
    changed = False
    res = []
    while len(data):  # "while 1" would also work for a non-empty list, because we eventually break out of the loop
        if len(data) == 1:
            # Only one element is left, so add it to the end of the result.
            res.append(data[0])
            break
        if data[0][0] > data[1][0]:
            # Compare the first two elements of the list.
            # If the first one is bigger, remove the second element and append it to the result.
            # Then raise the changed flag to record that the order was changed.
            res.append(data.pop(1))
            changed = True
        else:
            # Otherwise remove the first element and append it to the result.
            res.append(data.pop(0))

    if not changed:
        # No changes were made, so the list is in order.
        return res
    else:
        # Changes were made, so sort the list one more time.
        return recursiveSort(res)

Here is an iterative version, closer to your original function.

def iterativeSort(readList):
    res = []
    for i in range(len(readList)):
        # Loop through the original list.
        print(res)  # debug output: result list before handling this item
        if len(res) == 0:
            # The result list is still empty, so add the first element to it.
            res.append(readList[i])
        else:
            done = False
            for j in range(len(res)):
                # Loop through the result list built so far.
                if res[j][0] > readList[i][0]:
                    # The current item is smaller than this element of res, so insert it here.
                    res.insert(j, readList[i])
                    done = True
                    break
            if not done:
                # The current item is not smaller than any item in res, so put it last.
                res.append(readList[i])
        print(res)  # debug output: result list after handling this item
    return res
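
For comparison, here is a sketch of a classic in-place insertion sort that moves whole rows and compares only the first column, which is closer to the shifting loop in the question. The name insertion_sort_rows is made up for this example, and it assumes the first element of every row is already an int.

```python
def insertion_sort_rows(rows):
    """Sort rows in place by their first element using insertion sort."""
    for i in range(1, len(rows)):
        key_row = rows[i]  # the whole row being inserted, not a single cell
        j = i - 1
        # Shift rows with a larger first element one position to the right.
        while j >= 0 and rows[j][0] > key_row[0]:
            rows[j + 1] = rows[j]
            j -= 1
        # Drop the saved row into the gap that opened up.
        rows[j + 1] = key_row
    return rows


employees = [
    [7831703, "Christian", "Schmidt"],
    [2299817, "Amber", "Cohen"],
    [1964394, "Gregory", "Hanson"],
    [1984288, "Aaron", "White"],
]
insertion_sort_rows(employees)
for row in employees:
    print(row)
```

The important difference from the code in the question is that the loop index walks over rows rather than over the cells inside one row, so each row stays intact while its position changes.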