I have a list like this:
var paths = new List<string> {
@"rootuploaded\samplefolder\1232_234234_1.jpg",
@"rootuploaded\samplefolder\1232_2342.jpg",
@"rootuploaded\samplefolder\subfolder\1232_234234_1.jpg",
@"rootuploaded\samplefolder\subfolder\1232_2342.jpg",
@"rootuploaded\file-5.txt",
@"rootuploaded\file-67.txt",
@"rootuploaded\file-a.txt",
@"rootuploaded\file1.txt",
@"rootuploaded\file5.txt",
@"rootuploaded\filea.txt",
@"rootuploaded\text.txt",
@"rootuploaded\file_sample_a.txt",
@"rootuploaded\file2.txt",
@"rootuploaded\file_sample.txt",
@"rootuploaded\samplefolder\1232_234234_2.bmp",
};
How can I print the following output:
○ Group 1
rootuploaded\samplefolder\1232_234234_1.jpg
rootuploaded\samplefolder\1232_234234_2.bmp
○ Group 2
rootuploaded\file1.txt
rootuploaded\file2.txt
rootuploaded\file5.txt
○ Group 3
rootuploaded\file-5.txt
rootuploaded\file-67.txt
○ Group 4
rootuploaded\file_sample.txt
rootuploaded\file_sample_a.txt
○ Cannot be grouped
rootuploaded\samplefolder\1232_2342.jpg
rootuploaded\file-a.txt
rootuploaded\filea.txt
rootuploaded\text.txt
The files should be grouped according to 6 naming conventions (priority runs from top to bottom):
1. FileName.ext, FileName_anything.ext, FileName_anythingelse.ext, ...
2. FileName.ext, FileName-anything.ext, FileName-anythingelse.ext, ...
3. FileName_1.ext, FileName_2.ext, ..., FileName_N.ext (possibly non-consecutive)
4. FileName-1.ext, FileName-2.ext, ..., FileName-N.ext (possibly non-consecutive)
5. FileName 1.ext, FileName 2.ext, ..., FileName N.ext (possibly non-consecutive)
6. FileName1.ext, FileName2.ext, ..., FileNameN.ext (possibly non-consecutive)
This is how I tried to split them with LINQ:
var groups1 = paths.GroupBy(GetFileName, (key, g) => new
{
key = key,
count = g.Count(),
path = g.ToList()
}).Where(x => x.count < 5 && x.count >= 2).ToList();
public string GetFileName(string fileName)
{
var index = 0;
if (fileName.Contains("_"))
index = fileName.IndexOf("_", StringComparison.Ordinal);
else if (fileName.Contains("-"))
index = fileName.IndexOf("-", StringComparison.Ordinal);
var result = fileName.Substring(0, index);
return result;
}
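Answer 0 (score: 0)
Sadly, conventions 1 and 2 make this hard to solve. Because both of them include the plain 'FileName.ext', each path has to be checked against the whole list :(
I tried to handle the grouping for conventions 1 and 2 separately from conventions 3 to 6:
Find and remove the group 1 and 2 candidates. First, sort the list by file path: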
var orderedFilenames = paths.Distinct().OrderBy(p => p).ToList();
Then find the group 1 and 2 candidates:
var groupped = orderedFilenames.GroupBy(s => GetStarterFileName(s, orderedFilenames));
private static string GetStarterFileName(string fileNameMatcher, List<string> orderedFilenames)
{
string fileNameMatcherWOExt = Path.GetFileNameWithoutExtension(fileNameMatcher);
return orderedFilenames.FirstOrDefault(p =>
{
if (p == fileNameMatcher) return true;
string p_directory = Path.GetDirectoryName(p);
string directory = Path.GetDirectoryName(fileNameMatcher);
if (p_directory != directory) return false;
string pure = Path.GetFileNameWithoutExtension(p);
if (!fileNameMatcherWOExt.StartsWith(pure)) return false;
if (fileNameMatcherWOExt.Length <= pure.Length) return false;
char separator = fileNameMatcherWOExt[pure.Length];
if (separator != '_' && separator != '-') return false;
return true;
});
}
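To make the predicate inside GetStarterFileName easier to follow, here is a minimal standalone sketch of the same check. The names StarterDemo, Split and IsVariantOf are made up for illustration, and the path is split by hand on '\' (instead of using Path) only so that the demo behaves the same on any OS; that part is my assumption, not something the code above does.

```csharp
using System;

public static class StarterDemo
{
    // Hypothetical helper: splits a Windows-style path into directory and
    // file name without extension, without relying on System.IO.Path.
    public static (string Dir, string Name) Split(string path)
    {
        int slash = path.LastIndexOf('\\');
        string dir = slash < 0 ? "" : path.Substring(0, slash);
        string file = path.Substring(slash + 1);
        int dot = file.LastIndexOf('.');
        string name = dot < 0 ? file : file.Substring(0, dot);
        return (dir, name);
    }

    // True when candidate is "starter name + '_' or '-' + anything"
    // and both files live in the same directory.
    public static bool IsVariantOf(string starter, string candidate)
    {
        var s = Split(starter);
        var c = Split(candidate);
        if (s.Dir != c.Dir) return false;
        if (!c.Name.StartsWith(s.Name, StringComparison.Ordinal)) return false;
        if (c.Name.Length <= s.Name.Length) return false;
        char separator = c.Name[s.Name.Length];
        return separator == '_' || separator == '-';
    }

    public static void Main()
    {
        // file_sample_a is file_sample plus "_a"
        Console.WriteLine(IsVariantOf(@"rootuploaded\file_sample.txt",
                                      @"rootuploaded\file_sample_a.txt")); // True
        // 1232_234234_1 starts with 1232_2342, but the next char is '3', not a separator
        Console.WriteLine(IsVariantOf(@"rootuploaded\samplefolder\1232_2342.jpg",
                                      @"rootuploaded\samplefolder\1232_234234_1.jpg")); // False
        // a file is never a variant of itself
        Console.WriteLine(IsVariantOf(@"rootuploaded\text.txt",
                                      @"rootuploaded\text.txt")); // False
    }
}
```

The second call shows why the separator check matters: a plain StartsWith would wrongly attach 1232_234234_1.jpg to 1232_2342.jpg.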
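After this first step you have the group 1 and 2 candidates, while every other path ends up in its own single-element group.
Collect the remaining paths and separate out groups 1 and 2: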
var mergedGroupps = groupped.Where(grp => grp.Count() == 1).SelectMany(grp => grp);
var starterFileNameGroups = groupped.Where(grp => grp.Count() > 1);
Now you can find conventions 3 to 6 with regular expressions:
var endWithNumbersGroups = mergedGroupps.GroupBy(s => GetEndWithNumber(s));
private static string GetEndWithNumber(string fileNameMatcher)
{
string fileNameWithoutExtesion = Path.Combine(Path.GetDirectoryName(fileNameMatcher), Path.GetFileNameWithoutExtension(fileNameMatcher));
string filename = null;
filename = CheckWithRegex(@"_(\d+)$", fileNameWithoutExtesion, 1);
if (filename != null) return filename;
filename = CheckWithRegex(@"-(\d+)$", fileNameWithoutExtesion, 1);
if (filename != null) return filename;
filename = CheckWithRegex(@" (\d+)$", fileNameWithoutExtesion, 1);
if (filename != null) return filename;
filename = CheckWithRegex(@"(\d+)$", fileNameWithoutExtesion);
if (filename != null) return filename;
return fileNameWithoutExtesion;
}
private static string CheckWithRegex(string p, string filename, int additionalCharLength = 0)
{
Regex regex = new Regex(p, RegexOptions.Compiled | RegexOptions.CultureInvariant);
Match match = regex.Match(filename);
if (match.Success)
return filename.Substring(0, filename.Length - (match.Groups[0].Length - additionalCharLength));
return null;
}
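As far as I can tell, all four checks in GetEndWithNumber produce the same key: the path without its extension and without the trailing digits, with the separator ('_', '-' or space) kept. Keeping the separator is what stops file-5 and file5 from landing in the same group. Here is a small sketch of that idea; SuffixKeyDemo, Stem and NumberedKey are hypothetical names, and the extension is stripped by hand rather than with Path so the output does not depend on the OS.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SuffixKeyDemo
{
    // Hypothetical helper: drops the extension of a Windows-style path.
    public static string Stem(string path)
    {
        int slash = path.LastIndexOf('\\');
        int dot = path.LastIndexOf('.');
        return dot > slash ? path.Substring(0, dot) : path;
    }

    // Grouping key: the stem with any trailing run of digits removed.
    // A path that does not end in digits keeps its full stem as key.
    public static string NumberedKey(string path)
    {
        return Regex.Replace(Stem(path), @"\d+$", "");
    }

    public static void Main()
    {
        Console.WriteLine(NumberedKey(@"rootuploaded\file-5.txt"));  // rootuploaded\file-
        Console.WriteLine(NumberedKey(@"rootuploaded\file-67.txt")); // rootuploaded\file-
        Console.WriteLine(NumberedKey(@"rootuploaded\file1.txt"));   // rootuploaded\file
        Console.WriteLine(NumberedKey(@"rootuploaded\text.txt"));    // rootuploaded\text
    }
}
```

Paths that share a key (here file-5 and file-67) form a numbered group; a key that occurs only once means the file stays ungrouped.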
Collect the ungrouped items and merge the groups from conventions 1 and 2 with the candidates from conventions 3 to 6:
var nonGroupped = endWithNumbersGroups.Where(grp => grp.Count() == 1).SelectMany(grp => grp);
endWithNumbersGroups = endWithNumbersGroups.Where(grp => grp.Count() > 1);
var result = starterFileNameGroups.Concat(endWithNumbersGroups);
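You could try to do both steps in one pass, but as you can see the two grouping mechanisms are different. My solution is not that pretty, but I think it is clear... maybe :).
Answer 1 (score: 0)
Try this: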
var groups = new []
{
new { regex = @"rootuploaded\\samplefolder\\1232_234234_\d\..{3}", grp = 1 },
new { regex = @"rootuploaded\\file\d\.txt", grp = 2 },
new { regex = @"rootuploaded\\file-\d+\.txt", grp = 3 },
new { regex = @"rootuploaded\\file_sample.*\.txt", grp = 4 },
};
var results =
from path in paths
group path by
groups
.Where(x => Regex.IsMatch(path, x.regex))
.Select(x => x.grp)
.DefaultIfEmpty(99)
.First()
into gpaths
orderby gpaths.Key
select new
{
Group = gpaths.Key,
Files = gpaths.ToArray(),
};
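This gives you one result per group number, with every path that matches none of the patterns collected under key 99.
You just need to keep tweaking the regular expressions until you get the result you want.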
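For anyone who wants to run this end to end, here is a self-contained sketch of the same approach. The patterns are copied from the query above; the class name RegexGroupingDemo, the GroupPaths method and the SortedDictionary return type are my own choices for the demo, not part of the original query.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexGroupingDemo
{
    // Key used for paths that match none of the patterns.
    public const int Ungrouped = 99;

    // Pattern -> group number, same patterns as in the query above.
    private static readonly (string Pattern, int Group)[] Rules =
    {
        (@"rootuploaded\\samplefolder\\1232_234234_\d\..{3}", 1),
        (@"rootuploaded\\file\d\.txt", 2),
        (@"rootuploaded\\file-\d+\.txt", 3),
        (@"rootuploaded\\file_sample.*\.txt", 4),
    };

    // Assigns each path to the first rule it matches, or to Ungrouped.
    public static SortedDictionary<int, List<string>> GroupPaths(IEnumerable<string> paths)
    {
        var result = new SortedDictionary<int, List<string>>();
        foreach (var path in paths)
        {
            int key = Rules.Where(r => Regex.IsMatch(path, r.Pattern))
                           .Select(r => r.Group)
                           .DefaultIfEmpty(Ungrouped)
                           .First();
            if (!result.TryGetValue(key, out var bucket))
                result[key] = bucket = new List<string>();
            bucket.Add(path);
        }
        return result;
    }

    public static void Main()
    {
        var paths = new List<string>
        {
            @"rootuploaded\samplefolder\1232_234234_1.jpg",
            @"rootuploaded\samplefolder\1232_2342.jpg",
            @"rootuploaded\file-5.txt",
            @"rootuploaded\file-67.txt",
            @"rootuploaded\file1.txt",
            @"rootuploaded\file2.txt",
            @"rootuploaded\text.txt",
            @"rootuploaded\file_sample.txt",
            @"rootuploaded\samplefolder\1232_234234_2.bmp",
        };

        foreach (var group in GroupPaths(paths))
        {
            string title = group.Key == Ungrouped ? "Cannot be grouped" : "Group " + group.Key;
            Console.WriteLine(title);
            foreach (var path in group.Value)
                Console.WriteLine("  " + path);
        }
    }
}
```

Keep in mind that these patterns are hard-coded for this particular sample list, so this only works when the file names are known in advance; it does not implement the six general naming conventions from the question.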