Suppose I have this dictionary:
mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
"g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
"g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
"g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
"g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
"g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,
"h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt" : 6,
"g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 7,
"h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt" : 8,
"h18_84pp_3A_MVP3_GoodiesT1-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 9,
"p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 10}
I want to extract the common part before the first "-", such as g18_84pp_2A_MVP_GoodiesT0.
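I would also like to append _MIX to g18_84pp_2A_MVP_GoodiesT0 when the specific word MIX is found in the first group. Assuming I can separate the entries into groups according to whether they contain MIX or FIX in mydict, the final output dictionary would be: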
OutputNameDict= {"g18_84pp_2A_MVP_GoodiesT0_MIX" : 0,
"h18_84pp_3A_MVP_GoodiesT1_FIX" : 1,
"p18_84pp_2B_MVP_FIX": 2}
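Is there a function that can find the common part? And how can I select the word before or after a specific symbol such as "-", and look for specific words such as MIX or FIX?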
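For the literal "common part" question, the standard library does have os.path.commonprefix: it returns the longest string prefix shared by every item in a list, comparing character by character rather than by path component. A minimal sketch, applied here to a few of the g18 keys only:

```python
import os.path

# A few keys from the question's mydict (g18 group only).
g18_keys = [
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt",
    "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt",
    "g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt",
]

# commonprefix compares the strings character by character, so it stops
# at the first position where the MVP number differs.
prefix = os.path.commonprefix(g18_keys)
print(prefix)  # g18_84pp_2A_MVP
```

Because it stops at the first differing character, it cannot skip over the MVP number to reach _GoodiesT0; that part needs split or re.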
Answer 0 (score: 1)
You can use split to get the common part:
s = "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"
n = s.split('-')[0]
split returns a list of every token separated by '-', so s.split('-') yields:
['g18_84pp_2A_MVP1_GoodiesT0', 'HKJ', 'DFG_MIX', 'CMVP1_Y1000', 'MIX.txt']
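To select the word before or after a specific symbol, str.partition splits at the first occurrence of the separator and str.rpartition at the last one. Both always return a 3-tuple, so the unpacking below does not fail even if the separator is missing. A short sketch using the same s:

```python
s = "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"

# Text before / after the FIRST '-'
before_first, _, after_first = s.partition('-')
# Text after the LAST '-'
_, _, after_last = s.rpartition('-')

print(before_first)  # g18_84pp_2A_MVP1_GoodiesT0
print(after_first)   # HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt
print(after_last)    # MIX.txt
```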
To check whether MIX or FIX appears in the string, you can use in:
if 'MIX' in s:
    print("then MIX is in the string s")
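One caveat with in: it matches the substring anywhere in the name. A FIX file whose name also contains DFG_MIX (the h18 keys in the next answer's dictionary have this shape) would pass the 'MIX' in s test. A sketch of checking only the final tag with str.endswith, assuming every key ends in -MIX.txt or -FIX.txt:

```python
# A FIX file whose name also contains "DFG_MIX".
s = "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG_MIX-CMVP1_Y1000-FIX.txt"

print('MIX' in s)              # True: matches the "DFG_MIX" substring
print(s.endswith('-MIX.txt'))  # False: looks only at the final tag
print(s.endswith('-FIX.txt'))  # True
```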
If you want to remove the number after 'MVP', you can use the re module:
import re
s = 'g18_84pp_2A_MVP1_GoodiesT0'
s = re.sub('MVP[0-9]*','MVP',s)
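re.sub replaces every match by default. That is harmless on the first token alone, but applied to the whole filename it would also rewrite the MVP1 inside CMVP1. Passing count=1 limits the substitution to the first match, as this small comparison shows:

```python
import re

full = "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"

# Default: every match is replaced, including the "MVP1" inside "CMVP1".
all_replaced = re.sub(r'MVP[0-9]*', 'MVP', full)
# count=1: only the first match is replaced.
first_only = re.sub(r'MVP[0-9]*', 'MVP', full, count=1)

print(all_replaced)  # g18_84pp_2A_MVP_GoodiesT0-HKJ-DFG_MIX-CMVP_Y1000-MIX.txt
print(first_only)    # g18_84pp_2A_MVP_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt
```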
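Here is an example function that returns a list of the common parts (one entry per key, so duplicates are included):
import re

def foo(mydict):
    return [re.sub('MVP[0-9]*', 'MVP', k.split('-')[0]) for k in mydict]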
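If you prefer a single pattern, named groups can capture the common part (minus the MVP number) and the final MIX/FIX tag in one pass. This sketch assumes every key has the shape head-MVP-digits-tail, then more "-" tokens, then -MIX.txt or -FIX.txt; parse_key and KEY_RE are hypothetical names, and keys of any other shape return None:

```python
import re

# Assumed key shape: <head>MVP<digits><tail>-...-<MIX|FIX>.txt
KEY_RE = re.compile(
    r'(?P<head>[^-]*?MVP)[0-9]*'  # text up to the first "MVP"; the number is dropped
    r'(?P<tail>[^-]*)'            # rest of the first '-'-separated token
    r'-.*-'                       # middle tokens
    r'(?P<tag>MIX|FIX)\.txt'      # final tag before the extension
)

def parse_key(key):
    """Return (common_part, tag), or None if the key has another shape."""
    m = KEY_RE.fullmatch(key)
    if m is None:
        return None
    return m['head'] + m['tail'], m['tag']

print(parse_key("g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"))
# ('g18_84pp_2A_MVP_GoodiesT0', 'MIX')
print(parse_key("p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt"))
# ('p18_84pp_2B_MVP_GoodiesT2', 'FIX')
```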
Answer 1 (score: 1)
You can use the index() method to find the dash, and then take the part of the string after that position. For example,
mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
"g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
"g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
"g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
"g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
"g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,
"g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 6,
"h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG_MIX-CMVP1_Y1000-FIX.txt" : 7,
"h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG_MIX-CMVP2_Y1000-FIX.txt" : 8,
"h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG_MIX-CMVP3_Y1000-FIX.txt" : 9}
for value in sorted(mydict):
    index = value.index('-')
    extracted = value[index+1:-4]  # skip past the first '-' and drop '.txt' from the end
    print(extracted[-3:])          # the last 3 characters of the string
This prints:
MIX
MIX
MIX
MIX
MIX
MIX
MIX
FIX
FIX
FIX
You can then use if statements to do what you want with each group.
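One way to turn that MIX/FIX check into actual groups is to read the tag from the last "-"-separated token and collect the keys per tag. A minimal sketch with collections.defaultdict, assuming every key ends in -TAG.txt; os.path.splitext strips the extension so the tag length is not hard-coded (the dictionary below is a shortened copy of the one above):

```python
import os.path
from collections import defaultdict

mydict = {
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt": 0,
    "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt": 1,
    "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG_MIX-CMVP1_Y1000-FIX.txt": 7,
    "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG_MIX-CMVP2_Y1000-FIX.txt": 8,
}

groups = defaultdict(list)
for key in sorted(mydict):
    stem, _ext = os.path.splitext(key)  # drop ".txt"
    tag = stem.rsplit('-', 1)[-1]       # last '-'-separated token
    if tag in ('MIX', 'FIX'):
        groups[tag].append(key)

print(len(groups['MIX']), len(groups['FIX']))  # 2 2
```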
If you only want to extract the common part:
index = value.index('-')
extracted = value[:index] # Will get g18_84pp_2A_MVP1_GoodiesT0
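A note on index() versus find() here: both return the position of the first "-", but they differ when the character is absent. index() raises ValueError, while find() returns -1, and slicing with value[:-1] then silently drops the last character instead of failing. A quick sketch with a hypothetical dash-free name:

```python
value = "no_dash_here.txt"  # hypothetical key without any '-'

print(value.find('-'))          # -1
print(value[:value.find('-')])  # no_dash_here.tx  (silently truncated)

try:
    value.index('-')
except ValueError:
    print("index() raised ValueError")

# partition() returns the whole string as its first item when '-' is missing.
print(value.partition('-')[0])  # no_dash_here.txt
```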
Next, work out which ending to use. If you know the keys of mydict will always end in MIX.txt or FIX.txt, you can do this:
for value in sorted(mydict):
    ending = value[-7:-4]  # "MIX" or "FIX"
    index = value.index('-')
    extracted = value[:index]
    print("%s_%s" % (extracted, ending))
This prints:
g18_84pp_2A_MVP1_GoodiesT0_MIX
g18_84pp_2A_MVP2_GoodiesT0_MIX
g18_84pp_2A_MVP3_GoodiesT0_MIX
g18_84pp_2A_MVP4_GoodiesT0_MIX
g18_84pp_2A_MVP5_GoodiesT0_MIX
g18_84pp_2A_MVP6_GoodiesT0_MIX
g18_84pp_2A_MVP7_GoodiesT0_MIX
h18_84pp_3A_MVP1_GoodiesT1_FIX
h18_84pp_3A_MVP2_GoodiesT1_FIX
h18_84pp_3A_MVP2_GoodiesT1_FIX
Then add these names to the extracted dictionary.
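To add those names to a dictionary numbered 0, 1, 2, ... without duplicates, dict.setdefault(name, len(d)) assigns the next number only the first time a name is seen. This sketch combines the prefix/ending loop above with the re.sub step from the first answer so the MVP number is dropped; note that it keeps _GoodiesT2 for the p18 entry, whereas the question's desired output omits it:

```python
import re

mydict = {
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt": 0,
    "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt": 1,
    "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt": 6,
    "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt": 8,
    "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt": 10,
}

OutputNameDict = {}
for value in sorted(mydict):
    ending = value[-7:-4]                                # "MIX" or "FIX" (assumes "...-XXX.txt")
    extracted = value[:value.index('-')]                 # first '-'-separated token
    extracted = re.sub(r'MVP[0-9]*', 'MVP', extracted)   # drop the MVP number
    # setdefault inserts only on first sight, so each name keeps its first number
    OutputNameDict.setdefault(f"{extracted}_{ending}", len(OutputNameDict))

print(OutputNameDict)
# {'g18_84pp_2A_MVP_GoodiesT0_MIX': 0, 'h18_84pp_3A_MVP_GoodiesT1_FIX': 1, 'p18_84pp_2B_MVP_GoodiesT2_FIX': 2}
```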
Answer 2 (score: 0)
Thanks for the answers. My complete code is below. Any suggestions for optimizing it?
import re
mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
"g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
"g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
"g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
"g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
"g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,
"h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt" : 6,
"g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 7,
"h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt" : 8,
"h18_84pp_3A_MVP3_GoodiesT1-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 9,
"p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 10}
ExtractDict = {}
start = 0
for stringList in sorted(mydict):
    stringList = stringList.split('.')[0]  # drop the ".txt" extension
    underscore = stringList.split('_')
    Area = re.split('[0-9]+', underscore[3])[0]  # MVP and etc.
    CaseNameString = underscore[0] + "_" + underscore[1] + "_" + underscore[2] + "_" + Area  # g18_84pp_2A_MVP and etc.
    postfix = stringList.split('-')[4]  # MIX or FIX
    Newstring = CaseNameString + "_" + postfix
    ExtractDict[Newstring] = start
    start += 1
startagain = 0
OutputNameDict = {}
for OutputNameList in sorted(ExtractDict):
    OutputNameDict[OutputNameList] = startagain
    startagain += 1
#OutputNameDict = {'h18_84pp_3A_MVP_FIX': 1, 'p18_84pp_2B_MVP_FIX': 2, 'g18_84pp_2A_MVP_MIX': 0}
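On the optimization question: the two loops and the manual counters can be collapsed into a set of distinct names plus enumerate over the sorted set. A sketch that reproduces the output in the final comment above, under the same assumption the original code makes (the MIX/FIX tag is always the fifth "-"-separated token); build_output is a hypothetical name:

```python
import re

def build_output(mydict):
    """Map each distinct '<case>_<tag>' name to 0, 1, 2, ... in sorted order."""
    names = set()
    for key in mydict:
        stem = key.split('.')[0]                # drop ".txt"
        parts = stem.split('_')
        area = re.split('[0-9]+', parts[3])[0]  # "MVP1" -> "MVP"
        tag = stem.split('-')[4]                # "MIX" or "FIX"
        names.add('_'.join(parts[:3] + [area, tag]))
    return {name: i for i, name in enumerate(sorted(names))}

mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
          "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
          "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt" : 6,
          "g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 7,
          "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt" : 8,
          "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 10}

print(build_output(mydict))
# {'g18_84pp_2A_MVP_MIX': 0, 'h18_84pp_3A_MVP_FIX': 1, 'p18_84pp_2B_MVP_FIX': 2}
```

Sorting the distinct names once gives the same 0, 1, 2 numbering as the second loop in the original, without the intermediate ExtractDict.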