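Matching filenames that contain a given list of substrings

Time: 2018-07-19 17:37:18

Tags: python

I am writing a module that takes an array of strings among its other command-line arguments. The array will look something like: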

['PUPSFF', 'PCASPE', 'PCASEN']

My module has a method that searches a directory for a file matching the expected format:

def search(self, fundCode, type):
    funds_string = '_'.join(fundCode)
    files = set(os.listdir(self.unmappedDir))
    file_match = 'citco_unmapped_{type}_{funds}_{start}_{end}.csv'.format(type=type, funds=funds_string, start=self.startDate, end=self.endDate)
    if file_match in files:
        filename = os.path.join(self.unmappedDir, file_match)
        return self.read_file(filename)
    else:
        Logger.error('No {type} file/s found for {funds}, between {start} and {end}'.format(type=type, funds=fundCode, start=self.startDate, end=self.endDate))

So if my directory contains a file like this:

citco_unmapped_positions_PUPSFF_PCASPE_PCASEN_2018-07-01_2018-07-11.csv

and I pass this array as a command-line argument: ['PUPSFF', 'PCASPE', 'PCASEN']

then, after calling my method (with the rest of the self attributes set) like so:

positions = alerter.search(alerter.fundCodes, 'positions')

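it searches, finds the file, and does whatever is needed.

However, I would like it to be order-independent, so that the file is still found if the command-line argument is written like this:

['PCASPE', 'PCASEN', 'PUPSFF'], ['PCASEN', 'PUPSFF', 'PCASPE'], or any other ordering

Any ideas on how to do this?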

2 answers:

Answer 0: (score: 0)

Use the all function to check that every required tag appears in the filename. This example may help:

files = [
    "citco_unmapped_positions_PUPSFF_PCASPE_PCASEN_2018-07-01_2018-07-11.csv",   # yes
    "citco_unmapped_positions_PUPSFF_NO_WAY_PCASEN_2018-07-01_2018-07-11.csv",   # no
    "citco_unmapped_positions_PCASEN_PCASEN_PUPSFF_2018-07-01_2018-07-11.csv",   # no
    "citco_unmapped_positions_PCASPE_PCASEN_PUPSFF_2018-07-01_2018-07-11.csv",   # yes
]

tags = ['PUPSFF', 'PCASPE', 'PCASEN']
for fname in files:
    if (all(tag in fname for tag in tags)):
        # the file is a match.
        print("Match", fname)

Output:

Match citco_unmapped_positions_PUPSFF_PCASPE_PCASEN_2018-07-01_2018-07-11.csv
Match citco_unmapped_positions_PCASPE_PCASEN_PUPSFF_2018-07-01_2018-07-11.csv
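
One caveat with the `in` check is that it matches substrings, so a tag that happens to be a prefix of another code would also count as present. A minimal sketch of a stricter variant, assuming the fields in the filename are always separated by underscores and that no fund code itself contains one (the function name `contains_all_tags` and the tag 'PCAS' are made up for illustration):

```python
def contains_all_tags(fname, tags):
    """Return True if every tag is a whole underscore-separated token of fname."""
    stem = fname.rsplit('.', 1)[0]    # drop the ".csv" extension
    tokens = set(stem.split('_'))     # assumed: '_' separates every field
    return set(tags) <= tokens        # subset test, so order is irrelevant

fname = "citco_unmapped_positions_PUPSFF_PCASPE_PCASEN_2018-07-01_2018-07-11.csv"

# 'PCAS' is a hypothetical tag: it is a substring of 'PCASPE' but not a token.
print(all(tag in fname for tag in ['PUPSFF', 'PCAS']))           # True
print(contains_all_tags(fname, ['PUPSFF', 'PCAS']))              # False
print(contains_all_tags(fname, ['PCASEN', 'PUPSFF', 'PCASPE']))  # True
```

Like the original check, this still accepts a file that carries extra codes beyond the requested ones; whether that is desirable depends on how the files are named.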

Answer 1: (score: 0)

Use permutations from itertools.

I found a possible solution:
import os
from itertools import permutations

def search(self, fundCodes, type):
    files = set(os.listdir(self.unmappedDir))
    # Try every ordering of the fund codes until one matches a file name.
    for perm in self.find_permutations(fundCodes):
        fund_codes = '_'.join(perm)
        file_match = 'citco_unmapped_{type}_{funds}_{start}_{end}.csv'.format(type=type, funds=fund_codes, start=self.startDate, end=self.endDate)
        if file_match in files:
            filename = os.path.join(self.unmappedDir, file_match)
            return self.read_file(filename)
    # Log once, only after every ordering has been tried without success.
    Logger.error('No {type} file/s found for {funds}, between {start} and {end}'.format(type=type, funds=fundCodes, start=self.startDate, end=self.endDate))

def find_permutations(self, codes):
    # Renamed from `list` so the built-in is not shadowed.
    return list(permutations(codes))

It could be really slow, though: n fund codes produce n! orderings to try.
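
One way to sidestep the factorial growth might be to go the other direction: parse each filename once, pull out its fund codes, and compare them with the requested codes after sorting both. The sketch below is only an illustration under some assumptions: the filename layout is the one from the question's example, the type field contains no underscore, and the start and end dates are strings in YYYY-MM-DD form. The names `FILE_RE` and `find_unordered` are hypothetical and not part of the original module.

```python
import re

# Assumed layout: citco_unmapped_<type>_<code>_<code>..._<start>_<end>.csv
FILE_RE = re.compile(
    r'^citco_unmapped_(?P<type>[^_]+)_(?P<funds>.+)_'
    r'(?P<start>\d{4}-\d{2}-\d{2})_(?P<end>\d{4}-\d{2}-\d{2})\.csv$'
)

def find_unordered(filenames, file_type, fund_codes, start, end):
    """Return the first filename whose fields match, ignoring fund-code order."""
    wanted = sorted(fund_codes)
    for name in filenames:
        m = FILE_RE.match(name)
        if m is None:
            continue  # not a file in the assumed format
        if (m.group('type') == file_type
                and m.group('start') == start
                and m.group('end') == end
                and sorted(m.group('funds').split('_')) == wanted):
            return name
    return None

files = [
    "citco_unmapped_positions_PUPSFF_PCASPE_PCASEN_2018-07-01_2018-07-11.csv",
    "citco_unmapped_positions_PUPSFF_NO_WAY_PCASEN_2018-07-01_2018-07-11.csv",
]
print(find_unordered(files, 'positions', ['PCASEN', 'PUPSFF', 'PCASPE'],
                     '2018-07-01', '2018-07-11'))
```

Inside search, the list could come from os.listdir(self.unmappedDir), and the work then grows with the number of files rather than with the number of orderings.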