Sort a list of strings by multiple parts of the string

Time: 2018-04-29 05:10:52

Tags: python string sorting

I have a list of strings in the following format:

['XXX_A-BXXX', 'XXX_A-BXXX', 'XXX_A-BXXX', 'XXX_A-BXXX', ...]

A=['C1','C2','C3','T1','T2', 'T3']
B=['3s','6m','1h','8h','24h']
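XXX = random combination of letters and numbers

...I want to sort the list by the A part first, and then by the B part, following the orders given above. How can I do this?

For some real sample input, this is what I want: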

['Vout_C1-3-6sNP-N',
 'Vout_C1-6mNP-N',
 'Vout_C1-1hNP-N',
 'Vout_C1-8hNP-N',
 'Vout_C1-24hNP-N',
 'Vout_C2-3-6sNP-N',
 'Vout_C2-6mNP-N',
 'Vout_C2-1hNP-N',
 'Vout_C2-8hNP-N',
 'Vout_C2-24hNP-N',
 'Vout_C3-3-6sNP-N',
 'Vout_C3-6mNP-N',
 'Vout_C3-1hNP-N',
 'Vout_C3-8hNP-N',
 'Vout_C3-24hNP-N',
 'Vout_T1-3-6sNP-N',
 'Vout_T1-6mNP-N',
 'Vout_T1-1hNP-N',
 'Vout_T1-8hNP-N',
 'Vout_T1-24hNP-N',
 'Vout_T2-3-6sNP-N',
 'Vout_T2-6mNP-N',
 'Vout_T2-1hNP-N',
 'Vout_T2-8hNP-N',
 'Vout_T2-24hNP-N',
 'Vout_T3-3-6sNP-N',
 'Vout_T3-6mNP-N',
 'Vout_T3-1hNP-N',
 'Vout_T3-8hNP-N',
 'Vout_T3-24hNP-N']

Thanks everyone for the answers. I also came up with an approach, but it seems I can't answer my own question, so I'm just putting it here.

import re
import pandas as pd

lst_strings = [...]
# get the A part and the B part, e.g. 'Vout_C1-6mNP-N' -> ['C1', '6m']
pairs = [re.split(r'[_\-NP]', s)[1:3] for s in lst_strings]
groups = [pair[0] for pair in pairs]  # A part
# B part: the split above breaks '3-6s' into '3' and '6s', so restore it
times = ['3-6s' if t[1] == '3' else t[1] for t in pairs]

sorted_groups = [a + str(x) for a in ['C', 'T'] for x in range(1, 5)]  # ['C1', 'C2', ..., 'T1', 'T2', ...]
sorted_times = ['3-6s', '6m', '1h', '8h', '24h']

df = pd.DataFrame(list(zip(lst_strings, groups, times)), columns=['data', 'group', 'time'])
# order rows by B first, then regroup by A; concat keeps the B order inside each A group
df1 = pd.concat([df.loc[df['time'] == t] for t in sorted_times])
df2 = pd.concat([df1.loc[df1['group'] == g] for g in sorted_groups])
lst_sorted = df2['data'].tolist()
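For comparison, the same parsing can feed a single sorted() call with a tuple key instead of two rounds of filtering and concatenation. This is a minimal sketch, assuming the prefix before '_' contains no 'N', 'P' or '-' characters (the split pattern would cut it otherwise); GROUP_ORDER, TIME_ORDER, parse_parts and sort_key are illustrative names, not part of the original code.

```python
import re

# Orders from the question; '3-6s' is how the B part appears in the real data
GROUP_ORDER = ['C1', 'C2', 'C3', 'T1', 'T2', 'T3']
TIME_ORDER = ['3-6s', '6m', '1h', '8h', '24h']

# Map each label to its position so lookups are O(1)
group_rank = {g: i for i, g in enumerate(GROUP_ORDER)}
time_rank = {t: i for i, t in enumerate(TIME_ORDER)}

def parse_parts(s):
    """Split e.g. 'Vout_C1-3-6sNP-N' into ('C1', '3-6s') with the same split as above."""
    group, time = re.split(r'[_\-NP]', s)[1:3]
    # the split breaks '3-6s' into '3' and '6s'; restore it
    if time == '3':
        time = '3-6s'
    return group, time

def sort_key(s):
    """Sort by A position first, then by B position."""
    group, time = parse_parts(s)
    return group_rank[group], time_rank[time]

lst_strings = ['Vout_T1-1hNP-N', 'Vout_C1-24hNP-N', 'Vout_C1-3-6sNP-N', 'Vout_T1-6mNP-N']
print(sorted(lst_strings, key=sort_key))
# ['Vout_C1-3-6sNP-N', 'Vout_C1-24hNP-N', 'Vout_T1-6mNP-N', 'Vout_T1-1hNP-N']
```

A label missing from either order list raises KeyError here, which at least makes unexpected input visible instead of silently misplacing it.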

3 answers:

Answer 0 (score: 1)

Solution

Creating two sort keys, one for each of your criteria, seems like a good approach:

def multiple_sort(value):
    first, second = value.split('-', 1)
    # first key is `C1`, `C2`, etc.
    key1 = first.split('_')[-1]
    # use this order for second sort key
    names = ['3-6s', '6m', '1h', '8h', '24h']
    key2 = len(names) # last if not found
    for pos, name in enumerate(names):
        if second.startswith(name):
            key2 = pos
            break
    return key1, key2

Test it with your data:

data = ['Vout_C1-3-6sNP-N',
 'Vout_C1-6mNP-N',
 'Vout_C1-1hNP-N',
 'Vout_C1-8hNP-N',
 'Vout_C1-24hNP-N',
 'Vout_C2-3-6sNP-N',
 'Vout_C2-6mNP-N',
 'Vout_C2-1hNP-N',
 'Vout_C2-8hNP-N',
 'Vout_C2-24hNP-N',
 'Vout_C3-3-6sNP-N',
 'Vout_C3-6mNP-N',
 'Vout_C3-1hNP-N',
 'Vout_C3-8hNP-N',
 'Vout_C3-24hNP-N',
 'Vout_T1-3-6sNP-N',
 'Vout_T1-6mNP-N',
 'Vout_T1-1hNP-N',
 'Vout_T1-8hNP-N',
 'Vout_T1-24hNP-N',
 'Vout_T2-3-6sNP-N',
 'Vout_T2-6mNP-N',
 'Vout_T2-1hNP-N',
 'Vout_T2-8hNP-N',
 'Vout_T2-24hNP-N',
 'Vout_T3-3-6sNP-N',
 'Vout_T3-6mNP-N',
 'Vout_T3-1hNP-N',
 'Vout_T3-8hNP-N',
 'Vout_T3-24hNP-N']

Shuffle the order:

import random
random.shuffle(data)

Check the result:

import pprint
pprint.pprint(sorted(data, key=multiple_sort))

Output:

['Vout_C1-3-6sNP-N',
 'Vout_C1-6mNP-N',
 'Vout_C1-1hNP-N',
 'Vout_C1-8hNP-N',
 'Vout_C1-24hNP-N',
 'Vout_C2-3-6sNP-N',
 'Vout_C2-6mNP-N',
 'Vout_C2-1hNP-N',
 'Vout_C2-8hNP-N',
 'Vout_C2-24hNP-N',
 'Vout_C3-3-6sNP-N',
 'Vout_C3-6mNP-N',
 'Vout_C3-1hNP-N',
 'Vout_C3-8hNP-N',
 'Vout_C3-24hNP-N',
 'Vout_T1-3-6sNP-N',
 'Vout_T1-6mNP-N',
 'Vout_T1-1hNP-N',
 'Vout_T1-8hNP-N',
 'Vout_T1-24hNP-N',
 'Vout_T2-3-6sNP-N',
 'Vout_T2-6mNP-N',
 'Vout_T2-1hNP-N',
 'Vout_T2-8hNP-N',
 'Vout_T2-24hNP-N',
 'Vout_T3-3-6sNP-N',
 'Vout_T3-6mNP-N',
 'Vout_T3-1hNP-N',
 'Vout_T3-8hNP-N',
 'Vout_T3-24hNP-N']
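multiple_sort() returns the A part as a plain string, so the result above relies on C1 < C2 < C3 < T1 < T2 < T3 in string order, which happens to match the requested order. If the A order were not alphabetical, both parts could instead be looked up in explicit order lists. A sketch under that assumption, also assuming each string contains the labels as _A-B; A_ORDER, B_ORDER and explicit_order_key are hypothetical names:

```python
import re

A_ORDER = ['C1', 'C2', 'C3', 'T1', 'T2', 'T3']
B_ORDER = ['3-6s', '6m', '1h', '8h', '24h']

# Build one pattern from the allowed labels; longer B labels are tried first,
# so a label is never cut short by a shorter label that is its prefix
_A_ALT = '|'.join(re.escape(a) for a in A_ORDER)
_B_ALT = '|'.join(re.escape(b) for b in sorted(B_ORDER, key=len, reverse=True))
PATTERN = re.compile(rf'_({_A_ALT})-({_B_ALT})')

def explicit_order_key(value):
    """Return (position of A in A_ORDER, position of B in B_ORDER)."""
    m = PATTERN.search(value)
    if m is None:
        # unrecognized string: sort it after everything else
        return len(A_ORDER), len(B_ORDER)
    a, b = m.groups()
    return A_ORDER.index(a), B_ORDER.index(b)
```

Reordering A_ORDER (for example putting the T groups first) then changes the output without touching the key function.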

Explanation

Look at some sample strings:

data[:10]

['Vout_C1-1hNP-N',
 'Vout_C2-1hNP-N',
 'Vout_C2-8hNP-N',
 'Vout_T2-24hNP-N',
 'Vout_C1-3-6sNP-N',
 'Vout_T3-6mNP-N',
 'Vout_C3-24hNP-N',
 'Vout_C3-3-6sNP-N',
 'Vout_C1-8hNP-N',
 'Vout_T2-6mNP-N']

The function multiple_sort() produces these values:

[multiple_sort(x) for x in data[:10]]

[('C1', 2),
 ('C2', 2),
 ('C2', 3),
 ('T2', 4),
 ('C1', 0),
 ('T3', 1),
 ('C3', 4),
 ('C3', 0),
 ('C1', 3),
 ('T2', 1)]

Now:

sorted(data, key=multiple_sort)

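sorts using these keys. That is, it sorts first by the first key (C1, C2, etc.) and, where the first keys are equal, by the second key (2, 2, 3, etc. in the list above), which is the position of the time label in names.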
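The tie-breaking comes from how Python compares tuples: element by element, moving to the next element only when the current ones are equal. A small illustration with made-up key tuples of the same shape as multiple_sort() returns:

```python
# Tuples compare element by element: the second item only matters
# when the first items are equal
keys = [('C2', 0), ('C1', 3), ('C1', 0), ('T1', 1)]
print(sorted(keys))
# [('C1', 0), ('C1', 3), ('C2', 0), ('T1', 1)]

print(('C1', 0) < ('C1', 3))  # True: first items tie, 0 < 3 decides
print(('C1', 3) < ('C2', 0))  # True: 'C1' < 'C2' decides, 3 vs 0 is never checked
```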

Answer 1 (score: 0)

Using the regex from @chrisz's answer

import re
from random import shuffle

def customOrderKey(e):
    matches = re.findall(r'[A-Z]\d-(\d+-\d+[mhs]|\d+[mhs])', e)
    return '' if len(matches) == 0 else matches[0]

x = ['Vout_C1-3-6sNP-N', 'Vout_C1-6mNP-N', 'Vout_C1-1hNP-N', 'Vout_C1-8hNP-N', 'Vout_C1-24NP-N', 'Vout_C2-3-6sNP-N', 'Vout_C2-6mNP-N', 'Vout_C2-1hNP-N', 'Vout_C2-8hNP-N', 'Vout_C2-24NP-N', 'Vout_C3-3-6sNP-N', 'Vout_C3-6mNP-N', 'Vout_C3-1hNP-N', 'Vout_C3-8hNP-N', 'Vout_C3-24NP-N', 'Vout_T1-3-6sNP-N', 'Vout_T1-6mNP-N', 'Vout_T1-1hNP-N', 'Vout_T1-8hNP-N', 'Vout_T1-24NP-N', 'Vout_T2-3-6sNP-N', 'Vout_T2-6mNP-N', 'Vout_T2-1hNP-N', 'Vout_T2-8hNP-N', 'Vout_T2-24NP-N', 'Vout_T3-3-6sNP-N', 'Vout_T3-6mNP-N', 'Vout_T3-1hNP-N', 'Vout_T3-8hNP-N', 'Vout_T3-24NP-N']

shuffle(x) # shuffling just to check the code

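order = ['3-6s', '6m', '1h', '8h', '24h', '']
x.sort(key=lambda s: order.index(customOrderKey(s)))  # sort by B first (secondary key)
x.sort(key=lambda s: s.split('_')[1].split('-')[0])  # then by A, e.g. 'C1' (primary key)

Note: Python's sort is stable, so the second pass (by A) keeps the B order within each A group. The order of the passes does matter, though: the last pass becomes the primary key, so B has to be sorted first and A last.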
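The effect of the pass order can be seen on a few (A, B) tuples; a small sketch using operator.itemgetter with made-up pairs:

```python
from operator import itemgetter

pairs = [('T1', 2), ('C1', 1), ('T1', 0), ('C1', 2), ('C1', 0)]

# Secondary key (B) first, then primary key (A):
# the stable second pass keeps the B order within each A group
by_a_then_b = sorted(sorted(pairs, key=itemgetter(1)), key=itemgetter(0))
print(by_a_then_b)
# [('C1', 0), ('C1', 1), ('C1', 2), ('T1', 0), ('T1', 2)]

# Passes reversed: the last pass (by B) becomes the primary key
by_b_then_a = sorted(sorted(pairs, key=itemgetter(0)), key=itemgetter(1))
print(by_b_then_a)
# [('C1', 0), ('T1', 0), ('C1', 1), ('C1', 2), ('T1', 2)]
```

A single pass with a tuple key, key=lambda p: (p[0], p[1]), gives the same result as the first variant.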

Answer 2 (score: 0)

OK, I just had a go at it....

So I started by generating some data (since you didn't provide enough):

from random import randint

def rnd_3_char():
    return chr(65+randint(0,25))+chr(65+randint(0,25))+chr(65+randint(0,25))

def gen_data():
    A=['C1','C2','C3','T1','T2', 'T3']
    B=['3s','4s','5s','6m','1h','8h','24h']
    return "{}_{}_{}".format(rnd_3_char(),A[randint(0,len(A)-1)],B[randint(0,len(B)-1)])

I put it into a list called data...

data=[gen_data() for a in range(500)]

The first ten records look like this...

['YTI_T1_5s', 'ZHB_T2_8h', 'RRN_C3_6m', 'VLW_C1_4s', 'AUP_T3_6m', 'OFU_T1_4s', 'XTE_C2_5s', 'VQV_T3_8h', 'NIC_C3_4s', 'RLC_T2_8h']

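This seems to match your requirements.

Now for the custom sort... I split each string into 3 parts,

so RLC_T2_8h becomes RLC, T2 and 8h.

Then, using a regex, I convert the 3rd part into a number of seconds internally, and return it together with parts[1] (the A value) to the sort function.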

import re

def my_sort(a):
    sec_cnt=0
    parts=a.split('_')
    match=re.findall('([0-9]+)([shm])',parts[2])
    try:
        if match[0][1]=='s':
            sec_cnt=int(match[0][0])
        elif match[0][1]=='m':
            sec_cnt=60*int(match[0][0])
        elif match[0][1]=='h':
            sec_cnt=3600*int(match[0][0])
    except IndexError:
        # no '<number><unit>' found in the third part; keep sec_cnt = 0
        pass

    return parts[1],sec_cnt

So, using this:

data2=sorted(data,key=my_sort)
data2[:10]

returns

['BBM_C1_3s', 'TSD_C1_3s', 'YZR_C1_3s', 'HJL_C1_3s', 'TNU_C1_3s', 'LYK_C1_3s', 'MYT_C1_3s', 'FFX_C1_3s', 'XDB_C1_3s', 'BVB_C1_3s', 'LYD_C1_3s', 'NIM_C1_3s', 'NBU_C1_3s', ...]

Hope this is close enough.
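my_sort() converts the unit with an if/elif chain. A table-driven variant is a little shorter and easier to extend with new units. This is a sketch assuming, as in the generated data, that strings look like XXX_A_B and that only s, m and h units occur; UNIT_SECONDS, to_seconds and my_sort_table are hypothetical names:

```python
import re

# Seconds per unit; only s, m and h occur in the generated data
UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600}

def to_seconds(label):
    """Seconds for the first '<number><unit>' in label, or 0 if there is none."""
    match = re.search(r'(\d+)([smh])', label)
    if match is None:
        return 0
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]

def my_sort_table(a):
    """Same key shape as my_sort: (A part, duration in seconds)."""
    parts = a.split('_')
    return parts[1], to_seconds(parts[2])

print(sorted(['ABC_T2_8h', 'XYZ_C1_6m', 'QRS_C1_3s'], key=my_sort_table))
# ['QRS_C1_3s', 'XYZ_C1_6m', 'ABC_T2_8h']
```

Like my_sort, this takes the first number-unit pair it finds, so a range label such as 3-6s is keyed by its 6s part.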