Question

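I have a folder containing tens of thousands of files. Every file in the folder should have a matching partner whose name is identical except for the first few letters, for example: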

X_Date_Time_Place.dat
Y_Date_Time_Place.dat

Each X_* file and its Y_* counterpart together form a pair.

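However, there are always a few thousand extra files that need to be removed from the folder. The extra files are of the same type but have no partner. For example, there may be more 'X_Date_Time_Place.dat' files than 'Y_Date_Time_Place.dat' files. The only variable parts of the file names are 'Date', 'Time' and 'Place'.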

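I wrote a simple script (using a for loop) that takes one file's name and checks it against all the other files until it finds a match. However, finding each pair this way takes a very long time.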

Is there a faster, more efficient way?

Answer 1

You can split the files into two lists:

xlist = dir( fullfile( path_to_folder, 'X_*.dat') );
ylist = dir( fullfile( path_to_folder, 'Y_*.dat') );
%// remove prefixes
xlist = cellfun(@(x) x(3:end), {xlist.name}, 'uni', false);
ylist = cellfun(@(y) y(3:end), {ylist.name}, 'uni', false);
common = intersect(xlist, ylist);

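Using intersect to find the common suffixes leaves common holding every Date_Time_Place.dat for which you have BOTH X_Date_Time_Place.dat and Y_Date_Time_Place.dat.

To get all the pairs, prepend the X_ and Y_ prefixes back onto each entry of common.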
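
As a minimal sketch of that last step, the prefixes can be put back with cellfun, and setdiff gives the suffixes that have no partner. The file names below are made-up sample data standing in for the output of dir, and the variable names fullX, fullY, extraX and extraY are illustrative rather than part of the original answer.

```matlab
% Made-up sample names standing in for {xlist.name} and {ylist.name}
xnames = {'X_0101_1200_Rome.dat', 'X_0102_1300_Oslo.dat', 'X_0103_0900_Lima.dat'};
ynames = {'Y_0101_1200_Rome.dat', 'Y_0103_0900_Lima.dat', 'Y_0104_1000_Kiev.dat'};

% Strip the two-character prefix, as in the answer above
xlist = cellfun(@(x) x(3:end), xnames, 'UniformOutput', false);
ylist = cellfun(@(y) y(3:end), ynames, 'UniformOutput', false);
common = intersect(xlist, ylist);

% Rebuild the full names of every complete pair
fullX = cellfun(@(s) ['X_' s], common, 'UniformOutput', false);
fullY = cellfun(@(s) ['Y_' s], common, 'UniformOutput', false);

% Suffixes present on one side only: these files have no partner
extraX = cellfun(@(s) ['X_' s], setdiff(xlist, ylist), 'UniformOutput', false);
extraY = cellfun(@(s) ['Y_' s], setdiff(ylist, xlist), 'UniformOutput', false);
```

With real data, the names in extraX and extraY are the candidates for removal; it is probably worth inspecting the lists before passing anything to delete.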

Answer 2

You can use the dir function and specify a string and/or extension that the file names must contain:

In your case:

I = dir('*_Date_Time_Place*.dat')

This returns a struct array I with one element for each file whose name contains the string _Date_Time_Place and has the .dat extension.

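You can then access the elements of the struct array by indexing, e.g. I(1), I(2); the file name of each element is stored in its name field, e.g. I(1).name.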

Minor note:

For this to work, your current folder must be the folder that contains the files.
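
If changing the current folder is inconvenient, dir also accepts a path in front of the pattern. A small sketch of that, where the temporary folder and the file names are made up purely for the demonstration:

```matlab
% Create a throwaway folder with three made-up, empty files (demo only)
folder = tempname;
mkdir(folder);
names = {'X_0101_1200_Rome.dat', 'Y_0101_1200_Rome.dat', 'X_0102_1300_Oslo.dat'};
for k = 1:numel(names)
    fid = fopen(fullfile(folder, names{k}), 'w');
    fclose(fid);
end

% The pattern is matched inside 'folder', whatever the current folder is
I = dir(fullfile(folder, '*_0101_1200_Rome*.dat'));
matched = sort({I.name});

% Remove the demo folder and its contents
rmdir(folder, 's');
```

Here matched holds the X and Y file for the one Date_Time_Place combination that was asked for, which also shows that this approach needs one dir call per combination.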

Answer 3

Well, I don't have 10,000 files formatted like this, but here is what I would do.

Xfiles = dir('X*.dat');
filenames = {Xfiles.name};
% Here I would determine how many pairs I am looking for (the unique X's)
% I am assuming that your X files are unique.
% remove the leading "X" from each file name
filenames2 = cellfun(@(x) regexprep(x, '^X', ''), filenames, 'UniformOutput', false);
keys = filenames2;
values = 1:length(filenames2);
% map each suffix to the index of its X file
fileMap = containers.Map(keys, values);
% for each Y look for the filename
Yfiles = dir('Y*.dat');
Yfiles2 = cellfun(@(y) regexprep(y, '^Y', ''), {Yfiles.name}, 'UniformOutput', false);
pairs = cell(length(Yfiles2),2);
% this assumes that for every Y there must be an X
% if this is not true, guard the lookup with isKey(fileMap, Yfiles2{k}),
% because indexing the map with a missing key raises an error.
for k = 1:length(Yfiles2)
    idx = fileMap(Yfiles2{k});
    % idx points into Xfiles; the matching Y file is the k-th one
    pairs(k,:) = {Xfiles(idx), Yfiles(k)};
end
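
The loop above assumes that every Y has an X. A hedged sketch of the guarded lookup follows, using isKey so that unmatched Y files are collected instead of raising an error. The sample names and the variable unmatched are illustrative, and the cell arrays of names stand in for {Xfiles.name} and {Yfiles.name}.

```matlab
% Made-up sample names standing in for {Xfiles.name} and {Yfiles.name}
xnames = {'X_0101_1200_Rome.dat', 'X_0102_1300_Oslo.dat'};
ynames = {'Y_0102_1300_Oslo.dat', 'Y_0104_1000_Kiev.dat'};

% Map each suffix (name without the leading X) to the index of its X file
xkeys = regexprep(xnames, '^X', '');
fileMap = containers.Map(xkeys, 1:numel(xkeys));

ykeys = regexprep(ynames, '^Y', '');
pairs = cell(0, 2);   % one row per complete pair
unmatched = {};       % Y files with no X partner
for k = 1:numel(ykeys)
    if isKey(fileMap, ykeys{k})
        idx = fileMap(ykeys{k});
        pairs(end+1, :) = {xnames{idx}, ynames{k}};
    else
        unmatched{end+1} = ynames{k};
    end
end
```

Each Y name costs a single map lookup rather than a scan over all X names, which is where the speed-up over the nested for loop in the question would be expected to come from.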

Finding pairs of files in a folder
