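OK, so I'm writing a module that takes some command-line arguments. One of them, fundCodes, is a list of fund codes: ['PUSFF', 'AGE', 'AIR']
My module has to search a directory and find the files whose names match a certain format: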
    def file_match(self, fundCodes):
        # Get a list of the files
        files = set(os.listdir(self.unmappedDir))
        # loop through all the files and search for matching file
        for check_fund in fundCodes:
            # set a file pattern
            file_match = 'unmapped_positions_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
            # Yet to be used...
            file_trade_match = 'unmapped_trades_{fund}_{start}_{end}.csv'.format(fund=check_fund, start=self.startDate, end=self.endDate)
            # look in the unmappeddir and see if there's a file with that name
            if file_match in files:
                # if there's a match, load unmapped positions as etl
                filename = os.path.join(self.unmappedDir, file_match)
                return self.read_file(filename)
            else:
                Logger.error('No file found with those dates/funds')
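I'm trying to figure out the best way to search the directory for two different file name formats.
Examples of the formats are:
unmapped_trades_AGE_2018-07-01_2018-07-11.csv
and
unmapped_positions_AGE_2018-07-01_2018-07-11.csv
I'm thinking I just need to assign each expected name to a variable and then, inside the loop, check whether a file equals either value, right? That seems redundant, though. Any other suggestions?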
Answer 0 (score: 0)
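Just do two `in` tests. If you need both files to exist, you can combine them with `and`:

    if file_match in files and file_trade_match in files:
        # do something
    else:
        # log error

If you only want to process one of the files, test each name on its own instead of combining the tests with `and`.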
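As a minimal sketch of the either-file case: the helper name `pick_existing` and the sample directory listing below are hypothetical, not from the question. Each candidate name gets its own `in` test, and whichever ones exist are returned for processing.

```python
def pick_existing(candidates, files):
    """Return the candidate file names that are present in `files`."""
    return [name for name in candidates if name in files]


# Hypothetical listing, standing in for set(os.listdir(self.unmappedDir)).
files = {'unmapped_positions_AGE_2018-07-01_2018-07-11.csv'}

file_match = 'unmapped_positions_AGE_2018-07-01_2018-07-11.csv'
file_trade_match = 'unmapped_trades_AGE_2018-07-01_2018-07-11.csv'

found = pick_existing([file_match, file_trade_match], files)
if found:
    for name in found:
        print('processing', name)
else:
    print('No file found with those dates/funds')
```

Because the tests are independent, a fund with only a positions file (or only a trades file) is still handled, and the error is logged only when neither exists.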
Answer 1 (score: 0)
I would use a regular expression for this, for example:

    import re
    import os

    # Raw string so the backslashes reach the regex engine unchanged;
    # the dot before "csv" is escaped and the pattern is anchored at the end.
    search_pattern = r'unmapped_{}_([\w]+)_([0-9\-]+)_([0-9\-]+)\.csv$'
    data_types = ['trades', 'positions']
    # One pattern per data type, e.g. 'unmapped_trades_...'
    pattern_dict = {data_type: search_pattern.format(data_type) for data_type in data_types}

    def find_matching_files(search_dir, fund_codes):
        if not os.path.isdir(search_dir):
            raise ValueError('search_dir does not specify a directory')
        search_files = os.listdir(search_dir)
        matching_files = {data_type: [] for data_type in pattern_dict}
        for fname in search_files:
            for data_type, pattern in pattern_dict.items():
                m = re.match(pattern, fname)
                # group(1) is the fund code captured from the file name
                if m is not None and m.group(1) in fund_codes:
                    matching_files[data_type].append(fname)
        return matching_files

    print(find_matching_files('file_location/', ['PUSFF', 'AGE', 'AIR']))
where `file_location/` is the directory to search. The function returns a dictionary that groups the matching file names by data type.
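To illustrate the shape of that dictionary, here is a self-contained sketch that runs a condensed restatement of the same idea against a temporary directory. The sample file names are made up for the example, and the condensed version uses precompiled patterns with `fullmatch`, which is a substitution of mine rather than part of the code above.

```python
import os
import re
import tempfile

# Condensed restatement of the regex approach, so the example runs on its own.
PATTERNS = {
    data_type: re.compile(
        r'unmapped_{}_(\w+)_([0-9-]+)_([0-9-]+)\.csv'.format(data_type)
    )
    for data_type in ('trades', 'positions')
}


def find_matching_files(search_dir, fund_codes):
    matching = {data_type: [] for data_type in PATTERNS}
    for fname in sorted(os.listdir(search_dir)):
        for data_type, pattern in PATTERNS.items():
            m = pattern.fullmatch(fname)
            if m is not None and m.group(1) in fund_codes:
                matching[data_type].append(fname)
    return matching


with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical sample files; only the AGE ones should be picked up.
    for name in (
        'unmapped_trades_AGE_2018-07-01_2018-07-11.csv',
        'unmapped_positions_AGE_2018-07-01_2018-07-11.csv',
        'unmapped_positions_XYZ_2018-07-01_2018-07-11.csv',
        'notes.txt',
    ):
        open(os.path.join(tmp, name), 'w').close()

    result = find_matching_files(tmp, ['PUSFF', 'AGE', 'AIR'])
    print(result)
```

The result has one key per data type, each holding the list of matching file names; files for funds outside `fund_codes` and unrelated files are left out.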