Python: sorting alphabetically by name, suffix, and length

Time: 2017-10-07 19:09:13

Tags: python sorting

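I have been trying to sort a list of metals alphabetically by name, suffix, and length, but I can only get it sorted by length. I am not sure where I went wrong.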

metals.csv

list of names with date and suffix
name,date,suffix
copper.abc,2017-10-06,abc
gold.xyz,2017-10-06,xyz
19823.efg,2017-10-06,efg
silver.abc,2017-10-06,abc
iron.efg,2017-10-06,efg
unknown9258.xyz,2017-10-06,xyz
nickel.xyz,2017-10-06,xyz
bronze.abc,2017-10-06,abc
platinum.abc,2017-10-06,abc
unknown--23.efg,2017-10-06,efg

filter_sort.py

#!/usr/bin/python
# -*- coding: utf-8 -*-

import enchant
import re
from operator import itemgetter, attrgetter

pattern = re.compile(u"([^0-9-]+\..*),(.*,.*)", flags=re.UNICODE)

original = open('metals.csv', 'r')
with open('output.txt', 'a') as newfile:
    for line in original.readlines():
        m = pattern.match(line)
        if m:
            repl = m.group(1)
            newfile.write(m.group(1)+"\n")
newfile.close()

d = enchant.Dict("en_US")

output = []

infile = open("output.txt", "r")
with open('filtered.txt', 'a') as filtered:
    for line in infile.readlines():
        word = line.strip('\n').split('.')[0]
        if d.check(word) is True:
            if len(word) <= 8:
                output.append("{0}.{1}".format(word, line.strip('\n').split('.')[1]))
    for name in sorted(output, key=len):
        filtered.write(str(name+"\n"))
filtered.close()

The result is:

gold.xyz
iron.efg
copper.abc
silver.abc
nickel.xyz
bronze.abc
platinum.abc

What I want:

bronze.abc
copper.abc
silver.abc
platinum.abc
iron.efg
gold.xyz
nickel.xyz

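I first take the list, filter out the names that contain digits or dashes, and save the rest to a new file. Next, I try to sort the resulting list and save it to another new file. I am new to Python, so this is clearly inefficient. Any tips would be greatly appreciated. Thanks in advance!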

2 answers:

Answer 0: (score: 1)

You are telling sorted to use only the length as the key:

for name in sorted(output, key=len):
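
To see why this produces the result in the question, here is a minimal sketch. It assumes the names reach the sort in the same order as they appear in metals.csv after filtering. Python's sort is stable, so names of equal length simply keep their input order.

```python
# Names in the order they survive the filter in the question (assumed)
names = ["copper.abc", "gold.xyz", "silver.abc", "iron.efg",
         "nickel.xyz", "bronze.abc", "platinum.abc"]

# key=len compares only string lengths; ties keep their input order
by_length = sorted(names, key=len)

for name in by_length:
    print(len(name), name)
```

The 8-character names come first, then the 10-character ones, then platinum.abc, with no alphabetical ordering inside each group.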

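Instead, sort your list with a lambda that returns a tuple, like this:

for name in sorted(output, key=lambda k: (k.split('.')[1], len(k), k.split('.')[0])):

This sorts by suffix first (e.g. abc), then by length, and finally by prefix (e.g. copper). The length has to be computed with len(k) and placed before the prefix in the tuple; with the prefix first, platinum.abc would sort ahead of silver.abc. Output: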

bronze.abc
copper.abc
silver.abc
platinum.abc
iron.efg
gold.xyz
nickel.xyz
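
A self-contained sketch of a tuple key that reproduces the output listed above. The helper name sort_key is hypothetical, and the sample list is copied from the question's data after filtering.

```python
def sort_key(filename):
    # Split on the last dot: "copper.abc" -> ("copper", "abc")
    stem, _, suffix = filename.rpartition(".")
    # Tuples compare element by element: suffix, then length, then name
    return (suffix, len(stem), stem)

names = ["copper.abc", "gold.xyz", "silver.abc", "iron.efg",
         "nickel.xyz", "bronze.abc", "platinum.abc"]

ordered = sorted(names, key=sort_key)
print("\n".join(ordered))
```

Because tuples are compared left to right, a later element only matters when all earlier ones are equal, which is what lets one key express a multi-level sort.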

Answer 1: (score: 1)

A complete, streamlined solution:

import csv
import re

def multi_sort(s):
    # Sort key: suffix first, then total length, then the name itself
    parts = s.split('.')
    return (parts[1], len(s), parts[0])

with open('metals.csv', 'r', newline='') as inp, open('output.txt', 'w', newline='') as out:
    reader = csv.DictReader(inp)  # the first row (name,date,suffix) is the header
    names = []
    for row in reader:
        # re.match anchors at the start, so a digit or dash before the dot rejects the name
        if re.match(r'[^0-9-]+\..*', row['name']):
            names.append(row['name'])
    names.sort(key=multi_sort)

    writer = csv.writer(out)
    for n in names:
        writer.writerow((n,))

Contents of output.txt:

bronze.abc
copper.abc
silver.abc
platinum.abc
iron.efg
gold.xyz
nickel.xyz
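
The solution above does not apply the 8-character limit from the question's script. Below is a rough sketch of how that limit could be kept while still avoiding intermediate files. It reads the sample rows from an in-memory string instead of metals.csv, leaves out the enchant dictionary check, and uses hypothetical names (CSV_TEXT, NAME_RE, filter_and_sort) that do not appear in the original posts.

```python
import csv
import io
import re

# Sample rows copied from metals.csv in the question
CSV_TEXT = """name,date,suffix
copper.abc,2017-10-06,abc
gold.xyz,2017-10-06,xyz
19823.efg,2017-10-06,efg
silver.abc,2017-10-06,abc
iron.efg,2017-10-06,efg
unknown9258.xyz,2017-10-06,xyz
nickel.xyz,2017-10-06,xyz
bronze.abc,2017-10-06,abc
platinum.abc,2017-10-06,abc
unknown--23.efg,2017-10-06,efg
"""

# A stem with no digits or dashes, one dot, then a suffix without dots
NAME_RE = re.compile(r"[^0-9-]+\.[^.]+")

def filter_and_sort(lines, max_len=8):
    """Keep clean names whose stem has at most max_len characters, then sort."""
    kept = []
    for row in csv.DictReader(lines):
        name = row["name"]
        if not NAME_RE.fullmatch(name):
            continue  # stem contains a digit or a dash
        stem = name.split(".")[0]
        if len(stem) <= max_len:
            kept.append(name)
    # Same ordering as multi_sort: suffix, then length, then name
    return sorted(kept, key=lambda n: (n.split(".")[1], len(n), n.split(".")[0]))

result = filter_and_sort(io.StringIO(CSV_TEXT))
print("\n".join(result))
```

Passing an open file object instead of io.StringIO should work the same way, since csv.DictReader accepts any iterable of lines.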