Grouping from a list of strings

Time: 2016-09-07 04:02:37

Tags: c#

I have a list as follows:

var paths = new List<string>
{
    @"rootuploaded\samplefolder\1232_234234_1.jpg",
    @"rootuploaded\samplefolder\1232_2342.jpg",
    @"rootuploaded\samplefolder\subfolder\1232_234234_1.jpg",
    @"rootuploaded\samplefolder\subfolder\1232_2342.jpg",
    @"rootuploaded\file-5.txt",
    @"rootuploaded\file-67.txt",
    @"rootuploaded\file-a.txt",
    @"rootuploaded\file1.txt",
    @"rootuploaded\file5.txt",
    @"rootuploaded\filea.txt",
    @"rootuploaded\text.txt",
    @"rootuploaded\file_sample_a.txt",
    @"rootuploaded\file2.txt",
    @"rootuploaded\file_sample.txt",
    @"rootuploaded\samplefolder\1232_234234_2.bmp",
};

How can I print the following output:

Group 1

rootuploaded\samplefolder\1232_234234_1.jpg
rootuploaded\samplefolder\1232_234234_2.bmp

Group 2

rootuploaded\file1.txt
rootuploaded\file2.txt
rootuploaded\file5.txt

Group 3

rootuploaded\file-5.txt
rootuploaded\file-67.txt

Group 4

rootuploaded\file_sample.txt
rootuploaded\file_sample_a.txt

Cannot be grouped

rootuploaded\samplefolder\1232_2342.jpg
rootuploaded\file-a.txt
rootuploaded\filea.txt
rootuploaded\text.txt
The files should be grouped according to 6 naming conventions (in top-down priority):

  1. FileName.ext, FileName_anything.ext, FileName_anythingelse.ext, ...

  2. FileName.ext, FileName-anything.ext, FileName-anythingelse.ext, ...

  3. FileName_1.ext, FileName_2.ext, ..., FileName_N.ext (may be non-consecutive)

  4. FileName-1.ext, FileName-2.ext, ..., FileName-N.ext (may be non-consecutive)

  5. FileName 1.ext, FileName 2.ext, ..., FileName N.ext (may be non-consecutive)

  6. FileName1.ext, FileName2.ext, ..., FileNameN.ext (may be non-consecutive)

I tried to split them with LINQ:

    var groups1 = paths.GroupBy(GetFileName, (key, g) => new
    {
        key = key,
        count = g.Count(),
        path = g.ToList()
    }).Where(x => x.count < 5 && x.count >= 2).ToList();

    public string GetFileName(string fileName)
    {
        var index = 0;

        if (fileName.Contains("_"))
            index = fileName.IndexOf("_", StringComparison.Ordinal);
        else if (fileName.Contains("-"))
            index = fileName.IndexOf("-", StringComparison.Ordinal);

        var result = fileName.Substring(0, index);
        return result;
    }
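
Checking what this key function returns for a few of the sample paths may help show why the grouping does not come out as desired. The sketch below copies the logic of GetFileName unchanged into a hypothetical KeySketch class (that class name and the Main method are illustrative only). Paths with no separator get an empty key, and "_" and "-" names that share a prefix collapse into the same key.

```csharp
using System;

public static class KeySketch
{
    // Same logic as GetFileName in the question, copied for illustration.
    public static string GetFileName(string fileName)
    {
        var index = 0;

        if (fileName.Contains("_"))
            index = fileName.IndexOf("_", StringComparison.Ordinal);
        else if (fileName.Contains("-"))
            index = fileName.IndexOf("-", StringComparison.Ordinal);

        return fileName.Substring(0, index);
    }

    public static void Main()
    {
        // No "_" or "-" in the path: index stays 0, so the key is empty.
        Console.WriteLine("[" + GetFileName(@"rootuploaded\file1.txt") + "]");

        // Different conventions, same key: both are cut at the first separator.
        Console.WriteLine("[" + GetFileName(@"rootuploaded\file_sample.txt") + "]");
        Console.WriteLine("[" + GetFileName(@"rootuploaded\file-5.txt") + "]");

        // Cut at the first "_", so 1232_2342.jpg would share this key too.
        Console.WriteLine("[" + GetFileName(@"rootuploaded\samplefolder\1232_234234_1.jpg") + "]");
    }
}
```

Because the key ignores which separator was found and what follows it, a single GroupBy on this key cannot tell the six conventions apart.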
    

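2 answers:

Answer 0: (score: 0)

Sadly, conventions 1 and 2 are hard to handle. Because both of them contain 'FileName.ext', the whole list has to be examined together :(

I tried to handle the grouping for conventions 1-2 and 3-6 separately:

Step one:

Find and remove the candidates for groups 1 and 2. First, sort the list by file path: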

var orderedFilenames = paths.Distinct().OrderBy(p => p).ToList();
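
One caveat about the sort: OrderBy on a string key with no comparer uses the default, culture-sensitive string comparison, and how that orders punctuation such as '_', '-' and '.' may differ between platforms and cultures. The lookup in the next snippet takes the first match in the sorted list, so it relies on FileName.ext sorting before FileName_anything.ext. The following is a minimal sketch, assuming a deterministic ordinal order is acceptable; OrdinalSortSketch and SortOrdinal are hypothetical names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class OrdinalSortSketch
{
    // De-duplicates the paths and sorts them by raw character value.
    // In ordinal order '.' (0x2E) precedes '_' (0x5F), so "name.ext"
    // sorts ahead of "name_x.ext".
    public static List<string> SortOrdinal(IEnumerable<string> paths) =>
        paths.Distinct().OrderBy(p => p, StringComparer.Ordinal).ToList();

    public static void Main()
    {
        var sorted = SortOrdinal(new[]
        {
            @"rootuploaded\file_sample_a.txt",
            @"rootuploaded\file_sample.txt",
            @"rootuploaded\file_sample.txt", // duplicate, removed by Distinct
        });

        Console.WriteLine(string.Join(Environment.NewLine, sorted));
    }
}
```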

Then find the candidates for groups 1 and 2:

var groupped = orderedFilenames.GroupBy(s => GetStarterFileName(s, orderedFilenames));

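// Returns the first path in the sorted list that can act as the "starter"
// (FileName.ext) for fileNameMatcher, or fileNameMatcher itself if none does.
private static string GetStarterFileName(string fileNameMatcher, List<string> orderedFilenames)
{
    string fileNameMatcherWOExt = Path.GetFileNameWithoutExtension(fileNameMatcher);
    return orderedFilenames.FirstOrDefault(p =>
        {
            // Every path is at least its own starter.
            if (p == fileNameMatcher) return true;

            // Only files in the same directory can belong to one group.
            string p_directory = Path.GetDirectoryName(p);
            string directory = Path.GetDirectoryName(fileNameMatcher);
            if (p_directory != directory) return false;

            string pure = Path.GetFileNameWithoutExtension(p);

            // The candidate name must be a strict prefix of the matched name...
            if (!fileNameMatcherWOExt.StartsWith(pure, StringComparison.Ordinal)) return false;

            if (fileNameMatcherWOExt.Length <= pure.Length) return false;

            // ...and be followed directly by '_' (convention 1) or '-' (convention 2).
            char separator = fileNameMatcherWOExt[pure.Length];
            if (separator != '_' && separator != '-') return false;
            return true;
        });
}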

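Step two:

After step one you have the candidates for groups 1 and 2, but every other path ends up in its own single-item group.

Collect the remaining paths and separate out groups 1 and 2: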

var mergedGroupps = groupped.Where(grp => grp.Count() == 1).SelectMany(grp => grp);
var starterFileNameGroups = groupped.Where(grp => grp.Count() > 1);

Step three:

Now you can find conventions 3-6 by checking with regular expressions:

var endWithNumbersGroups = mergedGroupps.GroupBy(s => GetEndWithNumber(s));
private static string GetEndWithNumber(string fileNameMatcher)
{
    string fileNameWithoutExtesion = Path.Combine(Path.GetDirectoryName(fileNameMatcher), Path.GetFileNameWithoutExtension(fileNameMatcher));

    string filename = null;

    filename = CheckWithRegex(@"_(\d+)$", fileNameWithoutExtesion, 1);
    if (filename != null) return filename;

    filename = CheckWithRegex(@"-(\d+)$", fileNameWithoutExtesion, 1);
    if (filename != null) return filename;

    filename = CheckWithRegex(@" (\d+)$", fileNameWithoutExtesion, 1);
    if (filename != null) return filename;

    filename = CheckWithRegex(@"(\d+)$", fileNameWithoutExtesion);
    if (filename != null) return filename;

    return fileNameWithoutExtesion;
}


private static string CheckWithRegex(string p, string filename, int additionalCharLength = 0)
{
    Regex regex = new Regex(p, RegexOptions.Compiled | RegexOptions.CultureInvariant);
    Match match = regex.Match(filename);
    if (match.Success)
        return filename.Substring(0, filename.Length - (match.Groups[0].Length - additionalCharLength));
    return null;
}
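
As a possible simplification, the four suffix checks above could be folded into a single pattern with an optional separator. This is only a sketch of that idea, not the answer's code: NumberedKeySketch and NumberedKey are hypothetical names, and it takes a file name without its extension rather than a full path. As in the answer, the separator stays in the key, so "file-5" and "file5" land in different groups.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NumberedKeySketch
{
    // Optional separator ('_', '-' or space) followed by digits at the end.
    private static readonly Regex NumberSuffix =
        new Regex(@"([_\- ]?)\d+$", RegexOptions.CultureInvariant);

    // Returns the stem plus separator (for example "file-" for "file-67"),
    // or null when the name does not end in a number.
    public static string NumberedKey(string nameWithoutExtension)
    {
        Match match = NumberSuffix.Match(nameWithoutExtension);
        if (!match.Success) return null;

        // Keep everything before the digits, including the separator.
        return nameWithoutExtension.Substring(0, match.Index + match.Groups[1].Length);
    }

    public static void Main()
    {
        foreach (var name in new[] { "file-67", "file1", "1232_234234_1", "file 2", "filea" })
            Console.WriteLine(name + " -> " + (NumberedKey(name) ?? "(no numeric suffix)"));
    }
}
```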

Final step:

Collect the ungrouped items and merge the groups for conventions 1-2 with the candidates for conventions 3-6:

var nonGroupped = endWithNumbersGroups.Where(grp => grp.Count() == 1).SelectMany(grp => grp);
endWithNumbersGroups = endWithNumbersGroups.Where(grp => grp.Count() > 1);

var result = starterFileNameGroups.Concat(endWithNumbersGroups);

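You could try to solve both steps in one pass, but as you can see, the grouping mechanisms are different. My solution is not that pretty, but I think it is clear... maybe :).

Answer 1: (score: 0)

Try this: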

var groups = new []
{
    new { regex = @"rootuploaded\\samplefolder\\1232_234234_\d\..{3}", grp = 1 },
    new { regex = @"rootuploaded\\file\d\.txt", grp = 2 },
    new { regex = @"rootuploaded\\file-\d+\.txt", grp = 3 },
    new { regex = @"rootuploaded\\file_sample.*\.txt", grp = 4 },
};

var results =
    from path in paths
    group path by
        groups
            .Where(x => Regex.IsMatch(path, x.regex))
            .Select(x => x.grp)
            .DefaultIfEmpty(99)
            .First()
        into gpaths
    orderby gpaths.Key
    select new
    {
        Group = gpaths.Key,
        Files = gpaths.ToArray(),
    };
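
To see how individual paths fall through the rule table, the lookup can be pulled out into a small helper. The sketch below reuses the four patterns from the query above; RuleTableSketch, Classify and Ungrouped are hypothetical names added for illustration. The first rule that matches decides the group, and a path matching no rule gets the fallback value 99.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class RuleTableSketch
{
    // Fallback group number for paths that match no rule.
    public const int Ungrouped = 99;

    // Same patterns as in the answer, checked in this order.
    private static readonly (string Pattern, int Group)[] Rules =
    {
        (@"rootuploaded\\samplefolder\\1232_234234_\d\..{3}", 1),
        (@"rootuploaded\\file\d\.txt", 2),
        (@"rootuploaded\\file-\d+\.txt", 3),
        (@"rootuploaded\\file_sample.*\.txt", 4),
    };

    // Returns the group of the first matching rule, or Ungrouped.
    public static int Classify(string path) =>
        Rules.Where(r => Regex.IsMatch(path, r.Pattern))
             .Select(r => r.Group)
             .DefaultIfEmpty(Ungrouped)
             .First();

    public static void Main()
    {
        var samples = new[]
        {
            @"rootuploaded\samplefolder\1232_234234_2.bmp",
            @"rootuploaded\file1.txt",
            @"rootuploaded\file-67.txt",
            @"rootuploaded\file_sample_a.txt",
            @"rootuploaded\text.txt",
        };

        foreach (var path in samples)
            Console.WriteLine(Classify(path) + ": " + path);
    }
}
```

Note that these patterns are written for this specific sample data rather than derived from the six naming conventions, so a different file list would need its own rules.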

That gives you this:

[image: results]

You just need to keep adjusting the regular expressions until you get the result you want.