I have a list of strings in the following format:
['XXX_A-BXXX', 'XXX_A-BXXX', 'XXX_A-BXXX', 'XXX_A-BXXX', ...]
A=['C1','C2','C3','T1','T2', 'T3']
B=['3s','6m','1h','8h','24h']
XXX = random combination of letters and digits
...I want to sort the list by the A part first, and then by the B part, each following the order given above. How can I do this?
For some real sample input, this is what I want:
['Vout_C1-3-6sNP-N',
'Vout_C1-6mNP-N',
'Vout_C1-1hNP-N',
'Vout_C1-8hNP-N',
'Vout_C1-24hNP-N',
'Vout_C2-3-6sNP-N',
'Vout_C2-6mNP-N',
'Vout_C2-1hNP-N',
'Vout_C2-8hNP-N',
'Vout_C2-24hNP-N',
'Vout_C3-3-6sNP-N',
'Vout_C3-6mNP-N',
'Vout_C3-1hNP-N',
'Vout_C3-8hNP-N',
'Vout_C3-24hNP-N',
'Vout_T1-3-6sNP-N',
'Vout_T1-6mNP-N',
'Vout_T1-1hNP-N',
'Vout_T1-8hNP-N',
'Vout_T1-24hNP-N',
'Vout_T2-3-6sNP-N',
'Vout_T2-6mNP-N',
'Vout_T2-1hNP-N',
'Vout_T2-8hNP-N',
'Vout_T2-24hNP-N',
'Vout_T3-3-6sNP-N',
'Vout_T3-6mNP-N',
'Vout_T3-1hNP-N',
'Vout_T3-8hNP-N',
'Vout_T3-24hNP-N']
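Thanks for all the answers. I also came up with an approach, but it seems I can't answer my own question, so I'm just putting it here.
import re
import pandas as pd
lst_strings = [...]
pairs = [re.split(r'[_\-NP]', file)[1:3] for file in lst_strings] # get A part and B part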
groups = [pair[0] for pair in pairs] # A part
times = [t[1].replace('3', '3-6s') if t[1]=='3' else t[1] for t in pairs] # B part (previous split messed up '3-6s')
sorted_groups = [str(a)+str(x) for a in ['C','T'] for x in range(1,5)] # ['C1','C2',...,'T1','T2',...]
sorted_times = ['3-6s','6m','1h','8h','24h']
df = pd.DataFrame(list(zip(lst_strings, groups, times)), columns=['data', 'group', 'time'])
df1 = pd.concat([df.loc[df['time']==sorted_times[i]] for i in range(len(sorted_times))])
df2 = pd.concat([df1.loc[df1['group']==sorted_groups[i]] for i in range(len(sorted_groups))])
lst_sorted = df2['data'].values
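For comparison, here is a minimal sketch of the same idea without pandas: look up each part's position in the desired order and sort on the resulting pair. The regular expression and the names GROUP_RANK, TIME_RANK and rank_key are assumptions made for illustration; they are not part of the code above, and the pattern assumes the A part is a single C or T followed by one digit.

```python
import re

# Explicit orders from the question; the position in each list is the sort rank.
GROUP_ORDER = ['C1', 'C2', 'C3', 'T1', 'T2', 'T3']
TIME_ORDER = ['3-6s', '6m', '1h', '8h', '24h']
GROUP_RANK = {name: i for i, name in enumerate(GROUP_ORDER)}
TIME_RANK = {name: i for i, name in enumerate(TIME_ORDER)}

# Assumed layout: <prefix>_<group>-<time><suffix>, e.g. 'Vout_C1-3-6sNP-N'.
PATTERN = re.compile(r'_([CT]\d)-(\d+(?:-\d+)?[smh])')

def rank_key(name):
    """Return (group rank, time rank); unknown parts sort last."""
    m = PATTERN.search(name)
    if m is None:
        return (len(GROUP_ORDER), len(TIME_ORDER))
    group, time = m.groups()
    return (GROUP_RANK.get(group, len(GROUP_ORDER)),
            TIME_RANK.get(time, len(TIME_ORDER)))

sample = ['Vout_T1-6mNP-N', 'Vout_C2-24hNP-N', 'Vout_C1-1hNP-N',
          'Vout_C1-3-6sNP-N', 'Vout_C2-6mNP-N']
lst_sorted = sorted(sample, key=rank_key)
print(lst_sorted)
```

Using dictionaries for the ranks keeps both orders explicit, so the A part does not have to sort alphabetically.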
Answer 0 (score: 1)
Creating two sort keys for your two criteria seems like a good approach:
def multiple_sort(value):
    first, second = value.split('-', 1)
    # first key is `C1`, `C2` etc.
    key1 = first.split('_')[-1]
    # use this order for second sort key
    names = ['3-6s', '6m', '1h', '8h', '24h']
    key2 = len(names)  # last if not found
    for pos, name in enumerate(names):
        if second.startswith(name):
            key2 = pos
            break
    return key1, key2
Testing with your data:
data = ['Vout_C1-3-6sNP-N',
'Vout_C1-6mNP-N',
'Vout_C1-1hNP-N',
'Vout_C1-8hNP-N',
'Vout_C1-24hNP-N',
'Vout_C2-3-6sNP-N',
'Vout_C2-6mNP-N',
'Vout_C2-1hNP-N',
'Vout_C2-8hNP-N',
'Vout_C2-24hNP-N',
'Vout_C3-3-6sNP-N',
'Vout_C3-6mNP-N',
'Vout_C3-1hNP-N',
'Vout_C3-8hNP-N',
'Vout_C3-24hNP-N',
'Vout_T1-3-6sNP-N',
'Vout_T1-6mNP-N',
'Vout_T1-1hNP-N',
'Vout_T1-8hNP-N',
'Vout_T1-24hNP-N',
'Vout_T2-3-6sNP-N',
'Vout_T2-6mNP-N',
'Vout_T2-1hNP-N',
'Vout_T2-8hNP-N',
'Vout_T2-24hNP-N',
'Vout_T3-3-6sNP-N',
'Vout_T3-6mNP-N',
'Vout_T3-1hNP-N',
'Vout_T3-8hNP-N',
'Vout_T3-24hNP-N']
Randomize the order:
import random
random.shuffle(data)
Look at the result:
import pprint
pprint.pprint(sorted(data, key=multiple_sort))
Output:
['Vout_C1-3-6sNP-N',
'Vout_C1-6mNP-N',
'Vout_C1-1hNP-N',
'Vout_C1-8hNP-N',
'Vout_C1-24hNP-N',
'Vout_C2-3-6sNP-N',
'Vout_C2-6mNP-N',
'Vout_C2-1hNP-N',
'Vout_C2-8hNP-N',
'Vout_C2-24hNP-N',
'Vout_C3-3-6sNP-N',
'Vout_C3-6mNP-N',
'Vout_C3-1hNP-N',
'Vout_C3-8hNP-N',
'Vout_C3-24hNP-N',
'Vout_T1-3-6sNP-N',
'Vout_T1-6mNP-N',
'Vout_T1-1hNP-N',
'Vout_T1-8hNP-N',
'Vout_T1-24hNP-N',
'Vout_T2-3-6sNP-N',
'Vout_T2-6mNP-N',
'Vout_T2-1hNP-N',
'Vout_T2-8hNP-N',
'Vout_T2-24hNP-N',
'Vout_T3-3-6sNP-N',
'Vout_T3-6mNP-N',
'Vout_T3-1hNP-N',
'Vout_T3-8hNP-N',
'Vout_T3-24hNP-N']
Look at some sample strings:
data[:10]
['Vout_C1-1hNP-N',
'Vout_C2-1hNP-N',
'Vout_C2-8hNP-N',
'Vout_T2-24hNP-N',
'Vout_C1-3-6sNP-N',
'Vout_T3-6mNP-N',
'Vout_C3-24hNP-N',
'Vout_C3-3-6sNP-N',
'Vout_C1-8hNP-N',
'Vout_T2-6mNP-N']
The function multiple_sort() produces these values:
[multiple_sort(x) for x in data[:10]]
[('C1', 2),
('C2', 2),
('C2', 3),
('T2', 4),
('C1', 0),
('T3', 1),
('C3', 4),
('C3', 0),
('C1', 3),
('T2', 1)]
Now:
sorted(data, key=multiple_sort)
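uses these keys for sorting. That is, it sorts first by the first key (C1, C2, and so on) and, where the first keys are equal, by the second key (2, 2, 3, 4, and so on).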
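This works because Python compares tuples element by element, so the second item only matters when the first items tie. A small sketch using a few hypothetical key pairs of the kind multiple_sort() returns:

```python
# Tuples compare element by element: the second item only matters on ties.
keys = [('C2', 3), ('C1', 2), ('T2', 4), ('C1', 0), ('C2', 2)]
ordered = sorted(keys)
print(ordered)

assert ('C1', 4) < ('C2', 0)  # first elements differ, so the second is ignored
assert ('C1', 0) < ('C1', 2)  # first elements tie, so the second decides
```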
Answer 1 (score: 0)
Using the regular expression from @chrisz's answer
import re
from random import shuffle
def customOrderKey(e):
    matches = re.findall(r'[A-Z]\d-(\d+-\d+[mhs]|\d+[mhs])', e)
    return '' if len(matches) == 0 else matches[0]
x = ['Vout_C1-3-6sNP-N', 'Vout_C1-6mNP-N', 'Vout_C1-1hNP-N', 'Vout_C1-8hNP-N', 'Vout_C1-24NP-N', 'Vout_C2-3-6sNP-N', 'Vout_C2-6mNP-N', 'Vout_C2-1hNP-N', 'Vout_C2-8hNP-N', 'Vout_C2-24NP-N', 'Vout_C3-3-6sNP-N', 'Vout_C3-6mNP-N', 'Vout_C3-1hNP-N', 'Vout_C3-8hNP-N', 'Vout_C3-24NP-N', 'Vout_T1-3-6sNP-N', 'Vout_T1-6mNP-N', 'Vout_T1-1hNP-N', 'Vout_T1-8hNP-N', 'Vout_T1-24NP-N', 'Vout_T2-3-6sNP-N', 'Vout_T2-6mNP-N', 'Vout_T2-1hNP-N', 'Vout_T2-8hNP-N', 'Vout_T2-24NP-N', 'Vout_T3-3-6sNP-N', 'Vout_T3-6mNP-N', 'Vout_T3-1hNP-N', 'Vout_T3-8hNP-N', 'Vout_T3-24NP-N']
shuffle(x) # shuffling just to check the code
order = ['3-6s', '6m', '1h', '8h', '24h', '']
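x.sort(key=lambda s: order.index(customOrderKey(s))) # sort by B first (secondary key)
x.sort(key=lambda s: s[5:7]) # then sort by A (primary key)
Note: Python's sort is stable, so the second pass keeps the B order within each A group. The order of the two passes matters: sort by the secondary key (B) first and by the primary key (A) last.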
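As a check on how stable sorting behaves with two passes, here is a minimal sketch on hypothetical (A, B) pairs standing in for the parsed strings. The names records, two_pass, one_pass and wrong are made up for this illustration.

```python
import random

order = ['3-6s', '6m', '1h', '8h', '24h']
groups = ['C1', 'C2', 'C3', 'T1', 'T2', 'T3']
# Hypothetical (group, time) pairs standing in for the parsed strings.
records = [(g, t) for g in groups for t in order]
expected = list(records)  # built in the desired A-then-B order

random.seed(0)
random.shuffle(records)

# Two passes: secondary key (B) first, primary key (A) last.
two_pass = sorted(records, key=lambda r: order.index(r[1]))
two_pass.sort(key=lambda r: r[0])

# One pass with a tuple key gives the same result.
one_pass = sorted(records, key=lambda r: (r[0], order.index(r[1])))

# Passes in the opposite order leave B as the primary key instead.
wrong = sorted(records, key=lambda r: r[0])
wrong.sort(key=lambda r: order.index(r[1]))

assert two_pass == one_pass == expected
assert wrong != expected
```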
Answer 2 (score: 0)
OK, I just had a go....
So I started by generating data (since you didn't provide enough)
from random import randint
def rnd_3_char():
    return chr(65+randint(0,25))+chr(65+randint(0,25))+chr(65+randint(0,25))
def gen_data():
    A=['C1','C2','C3','T1','T2', 'T3']
    B=['3s','4s','5s','6m','1h','8h','24h']
    return "{}_{}_{}".format(rnd_3_char(),A[randint(0,len(A)-1)],B[randint(0,len(B)-1)])
I put this into a list called data...
data=[gen_data() for a in range(500)]
The first ten records look like this...
['YTI_T1_5s', 'ZHB_T2_8h', 'RRN_C3_6m', 'VLW_C1_4s', 'AUP_T3_6m', 'OFU_T1_4s', 'XTE_C2_5s', 'VQV_T3_8h', 'NIC_C3_4s', 'RLC_T2_8h']
This seems to match your requirements.
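Now for the custom sort... I split each string into 3 parts, so RLC_T2_8h becomes RLC, T2, 8h.
Then, using a regular expression, I convert the third part into a number of seconds and return it, together with the second part (parts[1]), to the sort function.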
import re
def my_sort(a):
    sec_cnt=0
    parts=a.split('_')
    match=re.findall('([0-9]+)([shm])',parts[2])
    try:
        if match[0][1]=='s':
            sec_cnt=int(match[0][0])
        elif match[0][1]=='m':
            sec_cnt=60*int(match[0][0])
        elif match[0][1]=='h':
            sec_cnt=3600*int(match[0][0])
    except IndexError:
        # no duration found in the third part; keep sec_cnt at 0
        #print("{}".format(parts[1]))
        pass
    return parts[1],sec_cnt
So using this
data2=sorted(data,key=my_sort)
data2[:10]
returns
['BBM_C1_3s', 'TSD_C1_3s', 'YZR_C1_3s', 'HJL_C1_3s', 'TNU_C1_3s', 'LYK_C1_3s', 'MYT_C1_3s', 'FFX_C1_3s', 'XDB_C1_3s', 'BVB_C1_3s', 'LYD_C1_3s', 'NIM_C1_3s', 'NBU_C1_3s',
Hope this is close enough.
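If the time part can also be a range such as '3-6s', as in the question's real data, the seconds conversion could be sketched as follows. The helper to_seconds and the table UNIT_SECONDS are hypothetical names, and using the lower bound of a range is an assumption made only for this sketch.

```python
import re

# Seconds per unit suffix.
UNIT_SECONDS = {'s': 1, 'm': 60, 'h': 3600}

def to_seconds(label):
    """Convert '6m', '24h' or a range like '3-6s' to seconds.

    For a range, the lower bound is used (an assumption for this sketch).
    Returns None when the label does not match.
    """
    m = re.fullmatch(r'(\d+)(?:-\d+)?([smh])', label)
    if m is None:
        return None
    return int(m.group(1)) * UNIT_SECONDS[m.group(2)]

labels = ['24h', '6m', '8h', '3-6s', '1h']
by_duration = sorted(labels, key=to_seconds)
print(by_duration)
```

Sorting by the computed duration avoids hard-coding the order of the time labels, at the cost of relying on every label matching the pattern.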