Question

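OK, so I'm writing a module that will take some command-line arguments. One of them, fundCodes, will be a list of funds: ['PUSFF', 'AGE', 'AIR']

My module has to search a directory for files and find the ones whose names match a certain format: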

def file_match(self, fundCodes):
    # Get a list of the files
    files = set(os.listdir(self.unmappedDir))

    # loop through all the files and search for matching file
    for check_fund in fundCodes:
        # set a file pattern
        file_match = 'unmapped_positions_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)

        # Yet to be used...
        file_trade_match = 'unmapped_trades_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)

        # look in the unmappeddir and see if there's a file with that name
        if file_match in files:

            # if there's a match, load unmapped positions as etl
            filename = os.path.join(self.unmappedDir, file_match)
            return self.read_file(filename)
        else:
            Logger.error('No file found with those dates/funds')

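I'm trying to figure out the best way to search the directory for the two different formats.

Examples of the formats:

unmapped_trades_AGE_2018-07-01_2018-07-11.csv and
unmapped_positions_AGE_2018-07-01_2018-07-11.csv

I think I just need to assign each match to a variable and then, in the last iteration, check whether there's a file equal to either value, right? That seems redundant, though. Any other suggestions?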

Answer 1

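Just perform two in tests. If you need both files to be present, you can join the tests with and:

if file_match in files and file_trade_match in files:
    # do something
else:
    # log error

If you only want to process whichever one of the files is present, test each name on its own instead.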
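For the case where only one of the two files may exist, a minimal sketch is shown here. It assumes the file-name formats from the question; the function name find_fund_files and its parameters are hypothetical and not part of the original code.

```python
import os

def find_fund_files(directory, fund, start, end):
    """Return full paths of whichever expected files exist in directory.

    Hypothetical helper: the name and signature are assumptions for
    illustration; only the file-name formats come from the question.
    """
    files = set(os.listdir(directory))
    candidates = [
        'unmapped_positions_{}_{}_{}.csv'.format(fund, start, end),
        'unmapped_trades_{}_{}_{}.csv'.format(fund, start, end),
    ]
    # Keep only the candidate names that are actually present.
    return [os.path.join(directory, name) for name in candidates if name in files]
```

An empty list means neither file was found for that fund, which is the point where the caller could log the error. Checking the length of the result also covers the case where both files are required.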

Answer 2

I would use a regular expression for this, e.g.

import re
import os

# Raw string so \w is not treated as a string escape; the dot is escaped and the end anchored
search_pattern = r'unmapped_{}_(\w+)_([0-9\-]+)_([0-9\-]+)\.csv$'
data_types = ['trades', 'positions']
pattern_dict = {data_type: search_pattern.format(data_type) for data_type in data_types}

def find_matching_files(search_dir, fund_codes):
    if not os.path.isdir(search_dir):
        raise ValueError('search_dir does not specify a directory')
    search_files = os.listdir(search_dir)
    matching_files = {data_type: [] for data_type in pattern_dict}
    for fname in search_files:
        for data_type, pattern in pattern_dict.items():
            m = re.match(pattern, fname)
            if m is not None and m.group(1) in fund_codes:
                matching_files[data_type].append(fname)
    return matching_files

print(find_matching_files('file_location/', ['PUSFF', 'AGE', 'AIR']))

where file_location/ is the directory to search, and the return value is a dictionary that groups the matching files by data type.
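To see what the capture groups hold, here is a small sketch that applies the same pattern shape to the example file names from the question. The helper name parse_name is hypothetical, and the pattern is written as a raw string with the dot escaped, which is an assumption about the intended regex rather than the answer's exact text.

```python
import re

# Same pattern shape as above: fund code, start date, end date.
search_pattern = r'unmapped_{}_(\w+)_([0-9\-]+)_([0-9\-]+)\.csv$'

def parse_name(fname, data_type):
    """Return (fund, start, end) if fname matches data_type's pattern, else None.

    Hypothetical helper, for illustration only.
    """
    m = re.match(search_pattern.format(data_type), fname)
    return m.groups() if m else None

print(parse_name('unmapped_trades_AGE_2018-07-01_2018-07-11.csv', 'trades'))
# ('AGE', '2018-07-01', '2018-07-11')
print(parse_name('unmapped_positions_AGE_2018-07-01_2018-07-11.csv', 'trades'))
# None
```

Note that the pattern accepts any dates, so if only a specific start and end date are wanted, the second and third groups would still need to be compared against them.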

Looping through a directory to search for two possible matching files
