Python: test whether a string matches a template value

Time: 2013-12-09 18:16:46

Tags: python regex string

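I am trying to iterate over a list of strings and keep only those that match a naming template I have specified. I want to accept any list entry that matches the template exactly, except that the variable <SCENARIO> field holds an integer.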

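The check needs to be general. In particular, the string structure may change, so there is no guarantee that <SCENARIO> will always appear at character X (which would let me, for example, pick it out by position with a list comprehension).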

The code below shows one approach using split, but there must be a better way to do this string comparison. Can I use a regular expression here?

template = 'name_is_here_<SCENARIO>_20131204.txt'

testList = ['name_is_here_100_20131204.txt',        # should accept
            'name_is_here_100_20131204.txt.NEW',    # should reject
            'other_name.txt']                       # should reject

acceptList = []

for name in testList:
    print(name)
    acceptFlag = True
    splitTemplate = template.split('_')
    splitName = name.split('_')
    # if lengths do not match, name cannot possibly match template
    if len(splitTemplate) == len(splitName):
        print(list(zip(splitTemplate, splitName)))
        # compare records in the split
        for t, n in zip(splitTemplate, splitName):
            if t != n and t != '<SCENARIO>':
                # reject if any of the "other" fields are not identical
                # (would also check that '<SCENARIO>' field is numeric - not shown here)
                print('reject: ' + name)
                acceptFlag = False
    else:
        acceptFlag = False

    # keep name if it passed checks
    if acceptFlag:
        acceptList.append(name)

print(acceptList)
# correctly prints --> ['name_is_here_100_20131204.txt']
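For comparison, the split-based approach above can be completed with the numeric check it leaves out ("not shown here"). This is a minimal sketch, not the asker's code: it assumes '_' is the field separator and that <SCENARIO> always occupies a whole field of its own; the helper name matches_template is hypothetical. It only accepts ASCII digits 0-9, because str.isdigit() in Python 3 also accepts characters such as '²'.

```python
import string


def matches_template(name, template, placeholder='<SCENARIO>', sep='_'):
    """Hypothetical helper: True if name equals template field by field,
    except that the placeholder field must be a non-empty run of digits 0-9."""
    t_fields = template.split(sep)
    n_fields = name.split(sep)
    # Different field counts can never match.
    if len(t_fields) != len(n_fields):
        return False
    for t, n in zip(t_fields, n_fields):
        if t == placeholder:
            # The variable field must be non-empty and contain only 0-9.
            if not n or any(c not in string.digits for c in n):
                return False
        elif t != n:
            # Every other field must be identical to the template.
            return False
    return True


template = 'name_is_here_<SCENARIO>_20131204.txt'
testList = ['name_is_here_100_20131204.txt',
            'name_is_here_100_20131204.txt.NEW',
            'name_is_here_abc_20131204.txt',
            'other_name.txt']

acceptList = [name for name in testList if matches_template(name, template)]
print(acceptList)  # ['name_is_here_100_20131204.txt']
```

Because it still relies on split, this breaks if the separator ever appears inside the variable field. A regular expression avoids that limitation.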

2 answers:

Answer 0 (score: 3)

Try using regular expressions in Python via the re module:

import re

template = re.compile(r'^name_is_here_(\d+)_20131204\.txt$')

testList = ['name_is_here_100_20131204.txt', #accepted
            'name_is_here_100_20131204.txt.NEW', #rejected!
            'name_is_here_aabs2352_20131204.txt', #rejected!
            'other_name.txt'] #rejected!

acceptList = [item for item in testList if template.match(item)]
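The pattern above hardcodes the template. Because the question asks for a check that survives changes to the string structure, one option is to derive the regex from the template itself. This is a sketch under these assumptions: Python 3.4+ (for Pattern.fullmatch), the placeholder text is exactly '<SCENARIO>', and every other character of the template should match literally. The helper name template_to_regex is hypothetical. The template is split on the placeholder and each literal piece goes through re.escape, which keeps the code independent of whether a given Python version escapes '<' and '>'.

```python
import re


def template_to_regex(template, placeholder='<SCENARIO>'):
    """Hypothetical helper: compile a regex that matches template literally,
    except that each placeholder occurrence must be one or more ASCII digits."""
    # Escape the literal pieces so that '.' in '.txt' matches only a dot.
    pieces = [re.escape(piece) for piece in template.split(placeholder)]
    # re.ASCII restricts \d to 0-9 instead of any Unicode digit.
    return re.compile(r'(\d+)'.join(pieces), re.ASCII)


template = 'name_is_here_<SCENARIO>_20131204.txt'
pattern = template_to_regex(template)

testList = ['name_is_here_100_20131204.txt',       # accepted
            'name_is_here_100_20131204.txt.NEW',   # rejected: trailing text
            'name_is_here_aabs2352_20131204.txt',  # rejected: not all digits
            'name_is_here_100_20131204xtxt',       # rejected: dot is literal
            'other_name.txt']                      # rejected

# fullmatch anchors both ends, so no ^ or $ is needed in the pattern.
acceptList = [item for item in testList if pattern.fullmatch(item)]
print(acceptList)  # ['name_is_here_100_20131204.txt']
print(pattern.fullmatch('name_is_here_100_20131204.txt').group(1))  # 100
```

The capture group also makes the scenario number available afterwards, via group(1).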

Answer 1 (score: 1)

This should do it. I take it name_is_here is just a placeholder for alphanumeric characters?

import re
testList = ['name_is_here_100_20131204.txt',        # should accept
            'name_is_here_100_20131204.txt.NEW',    # should reject
            'other_name.txt', 
            'name_is_44ere_100_20131204.txt',
            'name_is_here_100_2013120499.txt', 
            'name_is_here_100_something_2013120499.txt',
            'name_is_here_100_something_20131204.txt']  


def find(scenario):
    begin = '[a-z_]+100_'  # any combination of lowercase letters and underscores, followed by 100_
    end = r'_[0-9]{8}\.txt$'  # an underscore, exactly eight digits, then .txt at the end
    pattern = re.compile("".join([begin,scenario,end]))
    result = []
    for word in testList:
        if pattern.match(word):
            result.append(word)

    return result

find('something') # returns ['name_is_here_100_something_20131204.txt']

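Edit: the scenario is now a separate variable. The regex now only matches letters/underscores followed by 100, then the scenario, then an underscore and exactly eight digits followed by .txt.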
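The find function above reads testList from module scope and splices scenario into the pattern unescaped, so a scenario containing '.' or other regex metacharacters would be read as a pattern rather than as literal text. Below is a minimal variant that passes the list in and escapes the scenario. It assumes Python 3.4+ for fullmatch and keeps the answer's hardcoded 100 and lowercase [a-z_] prefix unchanged. The two-argument signature is a hypothetical change, not the answerer's code.

```python
import re


def find(scenario, names):
    """Return names shaped like <lowercase/underscores>100_<scenario>_<8 digits>.txt.
    Variant of the answer's find(): names is a parameter and scenario is escaped."""
    pattern = re.compile(r'[a-z_]+100_' + re.escape(scenario) + r'_[0-9]{8}\.txt')
    # fullmatch anchors both ends, replacing the trailing $ of the original.
    return [word for word in names if pattern.fullmatch(word)]


testList = ['name_is_here_100_20131204.txt',
            'name_is_here_100_20131204.txt.NEW',
            'other_name.txt',
            'name_is_44ere_100_20131204.txt',
            'name_is_here_100_2013120499.txt',
            'name_is_here_100_something_2013120499.txt',
            'name_is_here_100_something_20131204.txt']

print(find('something', testList))  # ['name_is_here_100_something_20131204.txt']
```

With the escape in place, find('some.thing', ...) only matches a literal dot between "some" and "thing".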