使用数据类型的多列对xls文件内容进行排序

时间:2016-01-28 14:01:52

标签: python excel sorting

我必须按升序将xls文件内容按4列排序。

我将xls文件内容转换为列表列表。以下是输入

输入

data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015"""

字符串格式的相应输出

 data = """ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
"""

首先我将数据拆分为列表格式:

    # Split data to list.
>>> data_list = [i.split(", ") for i in  data.split("\n")]

>>> print "\n".join([", ".join(i) for i in  data_list])
ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015

以下是排序要求

- We have to sort by index0 , 
   if index0 have same values for multiple items then sort by Index2 
       if index0 and index2 are same for multiple items then sort by Index3
           if index0, index2 and index3 are same for multiple items then sort by Index5

我的逻辑是

  1. 创建index0,index2,index5和index5
  2. 的字符串
  3. 使用步骤1中的密钥创建字典
  4. 使用排序函数对键列表进行排序
  5. 再次创建xls文件。
  6. 代码:

    >>> from collections import defaultdict
    >>> data_dict = defaultdict(list)
    >>> for i in data_list:
    ...     key = "%s%s%s%s"%(i[0].strip(), i[2].strip(), i[3].strip(), i[5].strip())
    ...     data_dict[key].append(i)
    ... 
    >>> sorted_keys = sorted(data_dict.keys())
    >>> 
    >>> for i in sorted_keys:
    ...     for j in data_dict[i]:
    ...         print j
    ...         
    ... 
    ['ABC', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '31/12/2015']
    ['ABC', 'Do not Consider1', '101', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
    ['ABC', 'Do not Consider1', '98', 'Title and Subtitle', 'Do not Consider2', '25/12/2015 ']
    ['ABC', 'Do not Consider1', '99', 'BIC Codes', 'Do not Consider2', '31/12/2015']
    ['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    ['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
    

    但是Index2中有数字,即第2列和Index5中的Date,即第5列,所以不能获得排序数据。

    你能帮我解决这个问题吗?

2 个答案:

答案 0 :(得分:1)

您可以使用sorted功能按多个键排序,如下所示: -

sorted_list = sorted(data_list, key=lambda item: (item[0], int(item[2]), item[3]))
print "\n".join([", ".join(i) for i in  sorted_list])

返回

ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015

诀窍是让你的key lambda返回一个包含要排序的所有值的元组,并使用int()函数将第三列的值转换为整数。

答案 1 :(得分:1)

您应该可以通过一次sorted()电话完成所需的操作。 csv模块可用于解析数据:

import csv
import StringIO
from itertools import groupby

data = """ABC, Do not Consider1, 101, Title and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 99, BIC Codes, Do not Consider2, 31/12/2015
ABC, Do not Consider1, 98, Title and Subtitle, Do not Consider2, 25/12/2015 
ABC, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
XYZ, Do not Consider1, 100, ATitle and Subtitle, Do not Consider2, 30/12/2015
ABC, Do not Consider1, 100, Title and Subtitle, Do not Consider2, 30/12/2015"""

csv_input = csv.reader(StringIO.StringIO(data), skipinitialspace=True)
rows = sorted(list(csv_input), key=lambda x: (x[0], int(x[2]), x[3], x[5]))

for row in rows:
    print row

这将为您提供以下内容:

['ABC', 'Do not Consider1', '98', 'Title and Subtitle', 'Do not Consider2', '25/12/2015 ']
['ABC', 'Do not Consider1', '99', 'BIC Codes', 'Do not Consider2', '31/12/2015']
['ABC', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
['ABC', 'Do not Consider1', '100', 'Title and Subtitle', 'Do not Consider2', '31/12/2015']
['ABC', 'Do not Consider1', '101', 'Title and Subtitle', 'Do not Consider2', '30/12/2015']
['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']
['XYZ', 'Do not Consider1', '100', 'ATitle and Subtitle', 'Do not Consider2', '30/12/2015']