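Text delta determination (hierarchical)

Asked: 2016-06-02 16:27:36

Tags: c# algorithm comparison diff

I have a case where I need to compare two texts and determine the differences between them.

These are actually hierarchical configuration files, in which the children at each level are indented. Example:

File 1: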

conf_1
    conf_1_1
    conf_1_2
    conf_1_3
        conf_1_3_1
    conf_1_4
        conf_1_4_1
        conf_1_4_2

File 2:

conf_1
    conf_1_1
    conf_1_2
    conf_1_3
        conf_1_3_1
        conf_1_3_2
    conf_1_4
        conf_1_4_1
        conf_1_4_2
    conf_1_5

The comparison between these two files should yield:

Result:

conf_1
    conf_1_3
        conf_1_3_2
    conf_1_5

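Notes:

  • I am only interested in the plus delta (the additions in the second file).
  • The order of the lines may change between the two files; this should not be interpreted as a difference as long as the hierarchy is preserved.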

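I have a solution:

"Flatten" the lines of each file (e.g. conf_1 > conf_1_3 > conf_1_3_1), do a brute-force comparison (compare every line of File 1 against every line of File 2), and then re-indent the lines that differ.

But I am looking for a more efficient solution.

Any ideas?

Thanks in advance.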

1 answer:

Answer 0 (score: 0)

I suggest populating two hierarchical lists and processing them recursively.

Start by defining a simple class:

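using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

class Node
{
    public string Text;         // line text with the indentation removed
    public List<Node> Children; // null when the node has no children
}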

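Here Text should hold the text of the line with the indentation removed.

Then you populate two node lists from the files, build another node list containing the differences, and write the result to another file. Something like this: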

var nodes1 = ReadNodes(sourceFile1);
var nodes2 = ReadNodes(sourceFile2);
var diff = GetDiff(nodes1, nodes2);
if (diff.Count > 0)
{
    using (var sw = new StreamWriter(diffFile))
        WriteDiff(sw, diff);
}

The methods used are:

static List<Node> ReadNodes(string fileName)
{
    // I'm leaving that part for you
    throw new NotImplementedException();
}

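static List<Node> GetDiff(List<Node> nodes1, List<Node> nodes2)
{
    // Nothing in the second list means nothing was added. Return an empty
    // list (not null) so that callers can safely check diff.Count.
    if (nodes2 == null || nodes2.Count == 0) return new List<Node>();
    // Nothing in the first list means everything in the second list is new.
    if (nodes1 == null || nodes1.Count == 0) return nodes2;
    // Index the first list by text so that sibling order does not matter.
    // Note: ToDictionary throws if two siblings share the same Text.
    var map = nodes1.ToDictionary(n => n.Text);
    var diff = new List<Node>();
    foreach (var n2 in nodes2)
    {
        Node n1;
        if (!map.TryGetValue(n2.Text, out n1))
            diff.Add(n2); // new node: keep it together with its whole subtree
        else
        {
            // Node exists in both lists: keep it only if a descendant was added.
            var childDiff = GetDiff(n1.Children, n2.Children);
            if (childDiff.Count > 0)
                diff.Add(new Node { Text = n2.Text, Children = childDiff });
        }
    }
    return diff;
}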


static void WriteDiff(TextWriter output, List<Node> nodes, int indent = 0)
{
    if (nodes == null) return;
    foreach (var node in nodes)
    {
        for (int i = 0; i < indent; i++)
            output.Write(' ');
        output.WriteLine(node.Text);
        WriteDiff(output, node.Children, indent + 4);
    }
}
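
The answer leaves ReadNodes as an exercise. One possible way to fill it in is sketched below, under a few assumptions that are not in the original: indentation uses spaces only, a deeper indent always means "child of the nearest less-indented line above", and blank lines are ignored. The NodeReader class name and the TextReader overload are hypothetical, added only so the parser can be tried on in-memory text; Node is repeated so that the snippet compiles on its own.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

class Node
{
    public string Text;
    public List<Node> Children;
}

static class NodeReader
{
    // Hypothetical helper: parses space-indented lines into a list of root nodes.
    public static List<Node> ReadNodes(TextReader reader)
    {
        var roots = new List<Node>();
        // Chain of currently open ancestors, each stored with its indent width.
        var stack = new Stack<KeyValuePair<int, Node>>();
        string line;
        while ((line = reader.ReadLine()) != null)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines
            string text = line.TrimStart(' ');
            int indent = line.Length - text.Length;
            var node = new Node { Text = text.TrimEnd() };

            // Drop entries at the same or a deeper indent: they are not ancestors.
            while (stack.Count > 0 && stack.Peek().Key >= indent)
                stack.Pop();

            if (stack.Count == 0)
            {
                roots.Add(node);
            }
            else
            {
                var parent = stack.Peek().Value;
                if (parent.Children == null) parent.Children = new List<Node>();
                parent.Children.Add(node);
            }
            stack.Push(new KeyValuePair<int, Node>(indent, node));
        }
        return roots;
    }

    // File-based overload matching the signature used in the answer.
    public static List<Node> ReadNodes(string fileName)
    {
        using (var reader = new StreamReader(fileName))
            return ReadNodes(reader);
    }
}
```

Because the parser only compares indent widths, it does not require a fixed step of four spaces; leaf nodes keep Children as null, which is what GetDiff and WriteDiff already expect.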

Testing with your example:

var nodes1 = new List<Node>
{
    new Node { Text = "conf_1", Children = new List<Node> {
        new Node { Text = "conf_1_1" },
        new Node { Text = "conf_1_2" },
        new Node { Text = "conf_1_3", Children = new List<Node> {
            new Node { Text = "conf_1_3_1" },
        } },
        new Node { Text = "conf_1_4", Children = new List<Node> {
            new Node { Text = "conf_1_4_1" },
            new Node { Text = "conf_1_4_2" },
        } },
    }},
};
var nodes2 = new List<Node>
{
    new Node { Text = "conf_1", Children = new List<Node> {
        new Node { Text = "conf_1_1" },
        new Node { Text = "conf_1_2" },
        new Node { Text = "conf_1_3", Children = new List<Node> {
            new Node { Text = "conf_1_3_1" },
            new Node { Text = "conf_1_3_2" },
        } },
        new Node { Text = "conf_1_4", Children = new List<Node> {
            new Node { Text = "conf_1_4_1" },
            new Node { Text = "conf_1_4_2" },
        } },
        new Node { Text = "conf_1_5" },
    }},
};
var diff = GetDiff(nodes1, nodes2);
if (diff.Count > 0)
{
    using (var sw = new StringWriter())
    {
        WriteDiff(sw, diff);
        Console.WriteLine(sw.ToString());
    }
}
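
For comparison, the flattening idea from the question does not have to rely on brute-force comparison. The sketch below is not part of the answer above: it swaps the pairwise comparison for a hash-set lookup, so each path from the second file is checked in roughly constant time. The FlatDiff class and its method names are hypothetical, and the code assumes space-only indentation and that ">" never appears in a line's text.

```csharp
using System;
using System.Collections.Generic;

static class FlatDiff
{
    // Turns indented lines into full paths such as "conf_1>conf_1_3>conf_1_3_1".
    public static List<string> Flatten(IEnumerable<string> lines)
    {
        var paths = new List<string>();
        var indents = new List<int>();   // indent widths of the open ancestors
        var parts = new List<string>();  // texts of the open ancestors
        foreach (var line in lines)
        {
            if (line.Trim().Length == 0) continue; // skip blank lines
            string text = line.TrimStart(' ');
            int indent = line.Length - text.Length;

            // Close every entry at the same or a deeper indent.
            while (indents.Count > 0 && indents[indents.Count - 1] >= indent)
            {
                indents.RemoveAt(indents.Count - 1);
                parts.RemoveAt(parts.Count - 1);
            }
            indents.Add(indent);
            parts.Add(text.TrimEnd());
            paths.Add(string.Join(">", parts));
        }
        return paths;
    }

    // Paths that occur in the second file but not in the first, in file-2 order.
    public static List<string> Added(IEnumerable<string> lines1, IEnumerable<string> lines2)
    {
        var seen = new HashSet<string>(Flatten(lines1));
        var added = new List<string>();
        foreach (var path in Flatten(lines2))
            if (!seen.Contains(path))
                added.Add(path);
        return added;
    }
}
```

This returns flat paths only; producing the indented result from the question still requires re-emitting each added path's ancestors, which the recursive GetDiff approach gives you directly.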