Building a recursive hierarchy with LINQ

Time: 2018-05-03 18:17:33

Tags: c# linq

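For the past few hours I've been struggling with what seems like a simple problem. I know the solution will involve LINQ and recursion, but I can't get there.

My sample code is below, along with the desired output (or something like it; I don't really care about the exact format, what matters is that it builds the correct hierarchy).

Any help would be appreciated.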

using System;
using System.Collections.Generic;

namespace ConsoleApp14
{
    class Program
    {
        static string DoSomething(KeyValuePair<string, string>[] dir)
        {
            return ""; //something here
        }


        static void Main(string[] args)
        {
            KeyValuePair<string, string>[] dir = new[]
            {
                new KeyValuePair<string, string>(@"c:\one\two\three","100.txt"),
                new KeyValuePair<string, string>(@"c:\one\two\three","101.txt"),
                new KeyValuePair<string, string>(@"c:\one\four\five","501.txt"),
                new KeyValuePair<string, string>(@"c:\one\six\","6.txt"),
                new KeyValuePair<string, string>(@"c:\one\six","7.txt"),
                new KeyValuePair<string, string>(@"c:\one\","1.txt"),
                new KeyValuePair<string, string>(@"c:\one\six\","8.txt"),
                new KeyValuePair<string, string>(@"c:\one\two","2.txt"),
                new KeyValuePair<string, string>(@"c:\one\two\three\four\five\six\seven","y.txt"),
                new KeyValuePair<string, string>(@"c:\one\xxx","xxx.txt")
            };

            // this is the output I want, rough indentation and crlf, the ordering is not important, just the hierarchy
            Console.WriteLine(DoSomething(dir));
            //
            //  one
            //  (1.txt)
            //      two
            //      (2.txt)
            //           three
            //           (100.txt)
            //           (101.txt)
            //               four
            //                    five
            //                        six
            //                             seven
            //                             (y.txt)
            //      four
            //           five
            //           (501.txt)
            //      six
            //      (6.txt)
            //      (7.txt)
            //      (8.txt)
            //      xxx
            //      (xxx.txt)
            //
        } 
    }
}

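3 answers:

Answer 0 (score: 5)

This is a data structure problem, not an algorithm problem. Once you have the right data structure, the algorithm is easy.

The data structure you need is: a node is either a file or a directory: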

abstract class Node {}
sealed class Directory : Node {}
sealed class File : Node {}

OK, what do we know about a node? Only that it has a name:

abstract class Node 
{
  public string Name { get; private set; }
  public Node(string name) { this.Name = name; }
}

What do we know about a file? Only that it has a name.

sealed class File : Node
{
  public File(string name) : base(name) { }
}

What do we know about a directory? It has a name and a list of child nodes:

sealed class Directory : Node
{
  public Directory(string name) : base(name) { }
  public List<Node> Children { get; } = new List<Node>();

We want to be able to add a child:

  public File WithFile(string name)
  {
    // TODO: Is there already a child that has this name? Return it.
    // TODO: Otherwise, add it.
  }
  public Directory WithDirectory(string name)
  {
    // TODO: Same.
  }

Great, now we can take a directory and add a subdirectory or a file to it; if one already exists, we get the existing one back.
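One possible way to fill in the two TODOs, as a minimal sketch: both methods share a hypothetical `GetOrAdd<T>` helper that looks for an existing child of the same kind with the same name and creates one only when none is found. It assumes names are compared with ordinal, case-sensitive `==`, which is stricter than how Windows treats paths.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

abstract class Node
{
    public string Name { get; }
    protected Node(string name) { Name = name; }
}

sealed class File : Node
{
    public File(string name) : base(name) { }
}

sealed class Directory : Node
{
    public Directory(string name) : base(name) { }
    public List<Node> Children { get; } = new List<Node>();

    // Hypothetical shared helper: return the existing child of kind T with this
    // name, or build one with `create`, add it to Children, and return it.
    private T GetOrAdd<T>(string name, Func<string, T> create) where T : Node
    {
        T existing = Children.OfType<T>().FirstOrDefault(c => c.Name == name);
        if (existing != null) return existing;
        T added = create(name);
        Children.Add(added);
        return added;
    }

    public File WithFile(string name) => GetOrAdd(name, n => new File(n));
    public Directory WithDirectory(string name) => GetOrAdd(name, n => new Directory(n));
}
```

Because the search uses `OfType<T>()`, a file and a subdirectory with the same name are treated as different children.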

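Now, what is your specific problem? You have a sequence of directory names and a file name, and you want to add that file to that directory. So write that!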

  public Directory WithDirectory(IEnumerable<string> directories)
  {
    Directory current = this;
    foreach(string d in directories)
      current = current.WithDirectory(d);
    return current;
  }

  public File WithFile(IEnumerable<string> directories, string name)
  {
    return this.WithDirectory(directories).WithFile(name);
  }
}

Now all you have to do is break each path up into a sequence of names. So your algorithm is:

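    Directory root = new Directory("c:");
    foreach(KeyValuePair<string, string> pair in dir) 
    {
        // Break up the key into a sequence of directory names, dropping the "c:" drive part
        IEnumerable<string> dirs = pair.Key.Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries).Skip(1);
        root.WithFile(dirs, pair.Value);
    }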

When you're done, you'll have a data structure that represents your tree.

Now that you have a tree, write a method on Node:

public override string ToString() => this.ToString(0);
public string ToString(int indent)
{
  // TODO: can you implement this?
}

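The key here is the right data structure. A directory is just a name plus a list of subdirectories and files, so write exactly that code. Once you have the right data structure, the rest falls out naturally. Notice that every method I wrote is only a few lines long. (The ones I left as TODOs are equally small. Implement them.) That's what you want: each method does one thing and does it extremely well. If you find yourself writing huge, complicated methods, stop and refactor them into many small methods that each do one clear thing.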
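For comparison, here is one way to fill in every TODO, assembled into a single sketch. `TreeBuilder` is a hypothetical name for the loop that builds the tree, and the layout (a directory's files at its own indentation, subdirectories four spaces deeper, files listed first, root printed as `c:`) is an assumption chosen to mimic the question's desired output.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

abstract class Node
{
    public string Name { get; }
    protected Node(string name) { Name = name; }

    public override string ToString() => ToString(0);

    // Render this node and everything below it; `indent` counts 4-space steps.
    public abstract string ToString(int indent);
}

sealed class File : Node
{
    public File(string name) : base(name) { }

    public override string ToString(int indent) =>
        new string(' ', 4 * indent) + "(" + Name + ")" + Environment.NewLine;
}

sealed class Directory : Node
{
    public Directory(string name) : base(name) { }
    public List<Node> Children { get; } = new List<Node>();

    // Find-or-add: reuse an existing child of the same kind and name.
    public File WithFile(string name)
    {
        File f = Children.OfType<File>().FirstOrDefault(c => c.Name == name);
        if (f == null) { f = new File(name); Children.Add(f); }
        return f;
    }

    public Directory WithDirectory(string name)
    {
        Directory d = Children.OfType<Directory>().FirstOrDefault(c => c.Name == name);
        if (d == null) { d = new Directory(name); Children.Add(d); }
        return d;
    }

    public Directory WithDirectory(IEnumerable<string> directories)
    {
        Directory current = this;
        foreach (string d in directories)
            current = current.WithDirectory(d);
        return current;
    }

    public File WithFile(IEnumerable<string> directories, string name) =>
        WithDirectory(directories).WithFile(name);

    // Files share this directory's indentation (as in the question's output);
    // subdirectories go one level deeper. OrderBy is stable, so files come first.
    public override string ToString(int indent)
    {
        var sb = new StringBuilder();
        sb.Append(' ', 4 * indent).Append(Name).Append(Environment.NewLine);
        foreach (Node child in Children.OrderBy(c => c is Directory))
            sb.Append(child.ToString(child is File ? indent : indent + 1));
        return sb.ToString();
    }
}

static class TreeBuilder
{
    // Split each key on '\', drop the "c:" drive component (the root already
    // stands for it), then add the file under the resulting directory chain.
    public static Directory Build(IEnumerable<KeyValuePair<string, string>> dir)
    {
        var root = new Directory("c:");
        foreach (KeyValuePair<string, string> pair in dir)
        {
            IEnumerable<string> dirs = pair.Key
                .Split(new[] { '\\' }, StringSplitOptions.RemoveEmptyEntries)
                .Skip(1);
            root.WithFile(dirs, pair.Value);
        }
        return root;
    }
}
```

Calling `Console.WriteLine(TreeBuilder.Build(dir))` on the question's array prints the whole hierarchy; no LINQ-heavy recursion is needed because the recursion lives in the two small `ToString(int)` overrides.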

Exercise: implement a version of ToString called ToBoxyString that produces:

c:
└─one
  ├─(1.txt)
  ├─two
  │ ├─(2.txt)
  │ └─three

...and so on. It's not as hard as it looks; it's just fancier indentation. Can you figure out the pattern?
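A possible answer to the exercise (spoiler), sketched on a hypothetical, minimal `TreeNode` type so the example stands alone; with the classes above, a directory's label would be its name and a file's label `(name)`. The pattern: each child line is the inherited prefix plus `├─` (or `└─` for the last child), and the prefix handed down to that child's own children grows by `│ ` (or by two spaces after the last child).

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for Node: a label plus ordered children.
sealed class TreeNode
{
    public string Label { get; }
    public List<TreeNode> Children { get; } = new List<TreeNode>();

    public TreeNode(string label, params TreeNode[] children)
    {
        Label = label;
        Children.AddRange(children);
    }

    // The root line has no connector; everything below it is drawn by AppendChildren.
    public string ToBoxyString()
    {
        var sb = new StringBuilder();
        sb.Append(Label).Append('\n');
        AppendChildren(sb, "");
        return sb.ToString();
    }

    private void AppendChildren(StringBuilder sb, string prefix)
    {
        for (int i = 0; i < Children.Count; i++)
        {
            bool last = i == Children.Count - 1;
            TreeNode child = Children[i];
            // Connector for this child, then recurse with a longer prefix:
            // "│ " keeps the vertical line going, "  " ends it after the last child.
            sb.Append(prefix).Append(last ? "└─" : "├─").Append(child.Label).Append('\n');
            child.AppendChildren(sb, prefix + (last ? "  " : "│ "));
        }
    }
}
```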

Answer 1 (score: 1)

Using some utility extension methods I like:

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Ext {
    public static ArraySegment<T> Slice<T>(this T[] src, int start, int? count = null) => (count.HasValue ? new ArraySegment<T>(src, start, count.Value) : new ArraySegment<T>(src, start, src.Length - start));
    public static string Join(this IEnumerable<string> strings, string sep) => String.Join(sep, strings.ToArray());
    public static string Join(this IEnumerable<string> strings, char sep) => String.Join(sep.ToString(), strings.ToArray());
    public static string Repeat(this char ch, int n) => new String(ch, n);
}
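To see what these helpers do, here is a small sketch that uses them the way the code below does: truncating a backslash-separated path to its first few components and indenting a label by depth. `ExtDemo`, `Truncate`, and `Indent` are illustrative names, not part of the answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// The answer's helpers, copied unchanged (only line-wrapped).
public static class Ext
{
    public static ArraySegment<T> Slice<T>(this T[] src, int start, int? count = null) =>
        count.HasValue ? new ArraySegment<T>(src, start, count.Value) : new ArraySegment<T>(src, start, src.Length - start);
    public static string Join(this IEnumerable<string> strings, string sep) => String.Join(sep, strings.ToArray());
    public static string Join(this IEnumerable<string> strings, char sep) => String.Join(sep.ToString(), strings.ToArray());
    public static string Repeat(this char ch, int n) => new String(ch, n);
}

// Hypothetical demo methods.
public static class ExtDemo
{
    // Keep the first depth + 1 components of a backslash-separated path.
    public static string Truncate(string path, int depth) =>
        path.Split('\\').Slice(0, depth + 1).Join('\\');

    // Indent a label by 4 spaces for every level below depth 1.
    public static string Indent(string label, int depth) =>
        ' '.Repeat(4 * (depth - 1)) + label;
}
```

`Slice` throws an `ArgumentException` when `depth + 1` exceeds the number of components, which is why `PathsAtDepth` below filters on `pa.Length > depth` before slicing.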

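You can use LINQ to process the paths depth by depth. This doesn't need any recursion, but it isn't very efficient (it walks the original array twice for every depth in the tree). The code looks long, but that's mostly because I put in a lot of comments.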

static IEnumerable<string> DoSomething(KeyValuePair<string, string>[] dir) {
    char[] PathSeparators = new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };
    // some local utility functions
    // split a path into an array of its components [drive,dir1,dir2,...]
    string[] PathComponents(string p) => p.Split(PathSeparators, StringSplitOptions.RemoveEmptyEntries);
    // Combine path components back into a canonical path
    string PathCombine(IEnumerable<string> p) => p.Join(Path.DirectorySeparatorChar);
    // return all distinct paths that are depth deep, truncating longer paths
    IEnumerable<string> PathsAtDepth(IEnumerable<(string Path, string[] Components, string Filename)> dirs, int depth)
        => dirs.Select(pftuple => pftuple.Components)
               .Where(pa => pa.Length > depth)
               .Select(pa => PathCombine(pa.Slice(0, depth + 1)))
               .Distinct();

    // split path into components clean up trailing PathSeparators
    var cleanDirs = dir.Select(kvp => (Path: kvp.Key.TrimEnd(PathSeparators), Components: PathComponents(kvp.Key), Filename: kvp.Value));
    // find the longest path
    var maxDepth = cleanDirs.Select(pftuple => pftuple.Components.Length).Max();
    // ignoring drive, gather all paths at each length and the files beneath them
    var pfl = Enumerable.Range(1, maxDepth)
                        .SelectMany(d => PathsAtDepth(cleanDirs, d) // get paths down to depth d
                             .Select(pathAtDepth => new {
                                Depth = d,
                                Path = pathAtDepth,
                                // gather all filenames for pathAtDepth d
                                Files = cleanDirs.Where(pftuple => pftuple.Path == pathAtDepth)
                                                 .Select(pftuple => pftuple.Filename)
                                                 .ToList()
                            }))
                            .OrderBy(dpef => dpef.Path); // sort into tree
    // convert each entry into its directory path end followed by all files beneath that directory
    var stringTree = pfl.SelectMany(dpf => dpf.Files.Select(f => ' '.Repeat(4 * (dpf.Depth - 1)) + $"({f})")
                                                    .Prepend(' '.Repeat(4 * (dpf.Depth - 1)) + Path.GetFileName(dpf.Path)));

    return stringTree;
}

My version of DoSomething returns IEnumerable<string>; if you prefer, you can Join it into a single string for output:

Console.WriteLine(DoSomething(dir).Join(Environment.NewLine));
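The core of this answer is `PathsAtDepth`. The sketch below reimplements just that step with `Take` and `string.Join` instead of the `Slice`/`Join` helpers, so its output can be checked on a few of the question's keys (`DepthDemo` is a hypothetical name). It joins with a hard-coded `\`, assuming Windows-style keys. The answer itself splits and joins with `Path.DirectorySeparatorChar`, which is `\` only on Windows; on Linux and macOS it is `/`, and these keys would not be split at all.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class DepthDemo
{
    static readonly char[] Separators = { '\\', '/' };

    // Same idea as PathsAtDepth: keep paths with more than `depth` components
    // past index 0, truncate each to its first depth + 1 components, de-duplicate.
    public static IEnumerable<string> PathsAtDepth(IEnumerable<string> paths, int depth) =>
        paths.Select(p => p.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
             .Where(parts => parts.Length > depth)
             .Select(parts => string.Join("\\", parts.Take(depth + 1)))
             .Distinct();
}
```

Each depth produces the directory lines for that level; the answer then attaches to each of them the files whose trimmed path matches exactly.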

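Answer 2 (score: 0)

Since my first attempt was rather slow, I decided to add this alternative as a separate answer. This version is more efficient in how it passes over the dir array.

Using the same extension methods as before: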

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class Ext {
    public static ArraySegment<T> Slice<T>(this T[] src, int start, int? count = null) => (count.HasValue ? new ArraySegment<T>(src, start, count.Value) : new ArraySegment<T>(src, start, src.Length - start));
    public static string Join(this IEnumerable<string> strings, string sep) => String.Join(sep, strings.ToArray());
    public static string Join(this IEnumerable<string> strings, char sep) => String.Join(sep.ToString(), strings.ToArray());
    public static string Repeat(this char ch, int n) => new String(ch, n);
}

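I process the dir array into a Lookup that collects all the files under each path. Then I can sort the paths into a tree and, for each one, emit the path followed by the files beneath it. When converting to the string tree, I also add an empty directory entry for each ancestor of a path that contains no files of its own.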

static IEnumerable<string> DoSomething(KeyValuePair<string, string>[] dir) {
    char[] PathSeparators = new[] { Path.DirectorySeparatorChar, Path.AltDirectorySeparatorChar };
    // some local utility functions
    int PathDepth(string p) => p.Count(ch => PathSeparators.Contains(ch));
    string PathToDepth(string p, int d) => p.Split(PathSeparators).Slice(0, d+1).Join(Path.DirectorySeparatorChar);

    // gather distinct paths (without trailing separators) and the files beneath them
    var pathsWithFiles = dir.ToLookup(d => d.Key.TrimEnd(PathSeparators), d => d.Value);
    // order paths with files into tree
    var pfl = pathsWithFiles.Select(pfs => new {
                                Path = pfs.Key, // each path
                                Files = pfs.ToList() // the files beneath it
                            })
                            .OrderBy(dpef => dpef.Path); // sort into tree
    // convert each entry into its directory path end followed by all files beneath that directory
    // add entries for each directory that has no files
    var stringTree = pfl.SelectMany(pf => Enumerable.Range(1, PathDepth(pf.Path))
                                                    // find directories without files
                                                    .Where(d => !pathsWithFiles.Contains(PathToDepth(pf.Path, d)))
                                                    // and add an entry for them
                                                    .Select(d => ' '.Repeat(4 * (d-1)) + Path.GetFileName(PathToDepth(pf.Path, d)))
                                                    // then add all the files
                                                    .Concat(pf.Files.Select(f => ' '.Repeat(4 * (PathDepth(pf.Path)- 1)) + $"({f})")
                                                    // and put the top dir first
                                                    .Prepend(' '.Repeat(4 * (PathDepth(pf.Path)-1)) + Path.GetFileName(pf.Path)))
                                                  );

    return stringTree;
}

Once again, you can call it as before:

Console.WriteLine(DoSomething(dir).Join(Environment.NewLine));
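The Lookup step can be tried in isolation. This sketch (the `LookupDemo`, `FilesByPath`, and `MissingAncestors` names are hypothetical) shows that trimming trailing separators merges `c:\one\six\` and `c:\one\six` under one key, and lists the ancestors of a path that have no files of their own, which is the same test the answer performs with `pathsWithFiles.Contains`.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class LookupDemo
{
    static readonly char[] Separators = { '\\', '/' };

    // Group file names by directory; TrimEnd makes "c:\one\six\" and "c:\one\six" one key.
    public static ILookup<string, string> FilesByPath(IEnumerable<KeyValuePair<string, string>> dir) =>
        dir.ToLookup(d => d.Key.TrimEnd(Separators), d => d.Value);

    // Ancestors of `path` (excluding the drive and the path itself) that hold no
    // files, e.g. c:\one\four for c:\one\four\five; these need an empty entry.
    public static IEnumerable<string> MissingAncestors(ILookup<string, string> lookup, string path)
    {
        string[] parts = path.Split(Separators);
        for (int depth = 1; depth < parts.Length - 1; depth++)
        {
            string prefix = string.Join("\\", parts.Take(depth + 1));
            if (!lookup.Contains(prefix)) yield return prefix;
        }
    }
}
```

As far as I can tell, the answer emits such an ancestor once for every path beneath it that has files, so if two sibling directories under the same file-less parent both contained files, the parent's line would appear twice; collecting the missing ancestors into a set first would avoid that.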