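I am trying to iterate over a list of strings and keep only those that match a naming template I have specified. I want to accept any list entry that matches the template exactly, apart from having an integer in the variable <SCENARIO> field.
The check needs to be general. Specifically, the string structure could change, so there is no guarantee that <SCENARIO> always appears at character X (which would allow, for example, a simple list comprehension).
The code below shows an approach that works using split, but there must be a better way to do this string comparison. Could I use regular expressions here?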
template = 'name_is_here_<SCENARIO>_20131204.txt'
testList = ['name_is_here_100_20131204.txt',      # should accept
            'name_is_here_100_20131204.txt.NEW',  # should reject
            'other_name.txt']                     # should reject
acceptList = []
for name in testList:
    print(name)
    acceptFlag = True
    splitTemplate = template.split('_')
    splitName = name.split('_')
    # if lengths do not match, name cannot possibly match template
    if len(splitTemplate) == len(splitName):
        print(list(zip(splitTemplate, splitName)))
        # compare records in the split
        for t, n in zip(splitTemplate, splitName):
            if t != n and not t == '<SCENARIO>':
                # reject if any of the "other" fields are not identical
                # (would also check that '<SCENARIO>' field is numeric - not shown here)
                print('reject: ' + name)
                acceptFlag = False
    else:
        acceptFlag = False
    # keep name if it passed checks
    if acceptFlag:
        acceptList.append(name)
print(acceptList)
# correctly prints --> ['name_is_here_100_20131204.txt']
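For reference, the numeric check that the comment above leaves out might look roughly like this. It is only a sketch: the helper name `matches_template` and its `placeholder` and `sep` parameters are made up for illustration, and it assumes the placeholder always fills one whole underscore-separated field.

```python
def matches_template(name, template, placeholder='<SCENARIO>', sep='_'):
    """Return True if name equals template except for a numeric placeholder field."""
    template_fields = template.split(sep)
    name_fields = name.split(sep)
    # Different field counts can never match.
    if len(template_fields) != len(name_fields):
        return False
    for t, n in zip(template_fields, name_fields):
        if t == placeholder:
            # str.isdigit() is a simple digits-only test; note that it also
            # accepts some non-ASCII digit characters.
            if not n.isdigit():
                return False
        elif t != n:
            return False
    return True


template = 'name_is_here_<SCENARIO>_20131204.txt'
testList = ['name_is_here_100_20131204.txt',
            'name_is_here_100_20131204.txt.NEW',
            'other_name.txt']
acceptList = [name for name in testList if matches_template(name, template)]
print(acceptList)
```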
Answer 0 (score: 3)
Try using the re module to work with regular expressions in Python:
import re
# the dot is escaped so that it only matches a literal '.'
template = re.compile(r'^name_is_here_(\d+)_20131204\.txt$')
testList = ['name_is_here_100_20131204.txt',       # accepted
            'name_is_here_100_20131204.txt.NEW',   # rejected!
            'name_is_here_aabs2352_20131204.txt',  # rejected!
            'other_name.txt']                      # rejected!
acceptList = [item for item in testList if template.match(item)]
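If the template itself may change, the pattern could also be built from the template string rather than written by hand. The following is a minimal sketch under that assumption; `template_to_regex` and its `placeholder` parameter are hypothetical names, not part of the re module. It relies on `re.escape` to keep the literal parts literal and on `fullmatch` (Python 3.4+) so that no `^` or `$` anchors are needed.

```python
import re


def template_to_regex(template, placeholder='<SCENARIO>'):
    # Escape each literal piece so characters such as '.' match themselves,
    # then join the pieces with a capture group that accepts digits only.
    literal_parts = [re.escape(part) for part in template.split(placeholder)]
    return re.compile(r'(\d+)'.join(literal_parts))


template = 'name_is_here_<SCENARIO>_20131204.txt'
pattern = template_to_regex(template)

testList = ['name_is_here_100_20131204.txt',
            'name_is_here_100_20131204.txt.NEW',
            'name_is_here_aabs2352_20131204.txt',
            'other_name.txt']
# fullmatch requires the whole string to match the pattern.
acceptList = [name for name in testList if pattern.fullmatch(name)]
print(acceptList)  # ['name_is_here_100_20131204.txt']
```

Because the placeholder becomes a capture group, the scenario number is available as `pattern.fullmatch(name).group(1)` for any accepted name.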
Answer 1 (score: 1)
This should do it. Am I right that name_is_here is just a placeholder for alphanumeric characters?
import re
testList = ['name_is_here_100_20131204.txt',      # should accept
            'name_is_here_100_20131204.txt.NEW',  # should reject
            'other_name.txt',
            'name_is_44ere_100_20131204.txt',
            'name_is_here_100_2013120499.txt',
            'name_is_here_100_something_2013120499.txt',
            'name_is_here_100_something_20131204.txt']
def find(scenario):
    begin = r'[a-z_]+100_'       # any combination of lowercase letters and underscores followed by 100_
    end = r'_[0-9]{8}\.txt$'     # exactly eight digits followed by a literal .txt at the end
    pattern = re.compile("".join([begin, scenario, end]))
    result = []
    for word in testList:
        if pattern.match(word):
            result.append(word)
    return result
find('something')  # returns ['name_is_here_100_something_20131204.txt']
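Edit: the scenario is now in a separate variable, and the regular expression now only matches characters followed by 100, then the scenario, then eight digits followed by .txt.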
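A slightly more defensive variant of the same idea, as a sketch only: the list of names is passed in as an argument, and the scenario text goes through `re.escape` so that any regex metacharacters in it are matched literally. The function name `find_matches` is made up for this example.

```python
import re


def find_matches(names, scenario):
    # Lowercase letters/underscores, then '100_', the literal scenario text,
    # an underscore, exactly eight digits and a literal '.txt'.
    pattern = re.compile(
        r'[a-z_]+100_' + re.escape(scenario) + r'_[0-9]{8}\.txt'
    )
    # fullmatch requires the entire name to match, so no '$' is needed.
    return [name for name in names if pattern.fullmatch(name)]


names = ['name_is_here_100_20131204.txt',
         'name_is_here_100_20131204.txt.NEW',
         'other_name.txt',
         'name_is_44ere_100_20131204.txt',
         'name_is_here_100_2013120499.txt',
         'name_is_here_100_something_2013120499.txt',
         'name_is_here_100_something_20131204.txt']
print(find_matches(names, 'something'))
# ['name_is_here_100_something_20131204.txt']
```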