I have a diff that I post-process, and I want to squash the lines that are equal. Here is an example:
 Foo
-Bar
+Bar
 Baz
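I want to squash the lines that are equal so that they no longer show up in the diff. For a single line this is pretty simple with the regex -(.*)\n\+\1\n
The problems start when I have multi-line matches like: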
-Foo
-Bar
+Foo
+Bar
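To make the failure concrete, here is a minimal Python sketch that applies the single-line pattern above to both examples. The function name `squash_pairs` and the sample strings are only illustrative.

```python
import re

# The single-line pattern from the question: a "-" line that is
# immediately followed by an identical "+" line.
PAIR = re.compile(r"-(.*)\n\+\1\n")

def squash_pairs(diff_text: str) -> str:
    """Remove every '-X' line that is directly followed by '+X'."""
    return PAIR.sub("", diff_text)

single = " Foo\n-Bar\n+Bar\n Baz\n"
multi = "-Foo\n-Bar\n+Foo\n+Bar\n"

print(repr(squash_pairs(single)))  # the -Bar/+Bar pair is removed
print(repr(squash_pairs(multi)))   # nothing is removed
```

In the multi-line case no "-" line is directly followed by its matching "+" line, so the pattern never matches and the text comes back unchanged.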
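Any ideas? Or should I not be doing this with a regex at all and write a simple parser instead? Or does one already exist?
In case there is a better solution, here is some backstory. I am analyzing two files to see whether they are the same. Sadly, the output is almost identical but needs some post-processing, for example: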
-on line %d
+on line 8
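So I go through the diff and convert known strings into other known strings, and then I try to check whether the diff is empty or still shows differences.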
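One alternative that might be worth considering is to normalize both inputs before comparing them, so that no diff needs post-processing at all. The sketch below is only an assumption about how that could look: the `RULES` table and the function names are hypothetical, and the single rule shown is modelled on the `on line %d` example above.

```python
import re

# Hypothetical rule table: each pattern matches a concrete value and is
# rewritten to the placeholder that the other file contains.
RULES = [
    (re.compile(r"on line \d+"), "on line %d"),
]

def normalize(line: str) -> str:
    """Apply every known-string rule to one line."""
    for pattern, replacement in RULES:
        line = pattern.sub(replacement, line)
    return line

def same_after_normalizing(a_lines, b_lines) -> bool:
    """True if the two line lists are equal once all rules are applied."""
    return [normalize(a) for a in a_lines] == [normalize(b) for b in b_lines]

print(same_after_normalizing(["error on line %d"], ["error on line 8"]))
```

This only tells you whether the files agree; if you also need to see the remaining differences, you would still run diff on the normalized text.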
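Answer 0: (score: 0)
I have done some simple analysis of diff output before, so I had a Perl script that gave me a basis to start from. Consider the following two data files, file.1 and file.2, shown in that order.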
Data
Foo
Bar 1
Baz
I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
-(.*)\n\+\1\n
The problems start when I have multi-line matches like:
Foo 2
Bar 2
Etc.
Data
Foo
Bar 10
Baz
I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
-(.*)\n\+\1\n
The problems start when I have multi-line matches like:
Foo 20
Bar 20
Etc.
The raw unified diff output is:
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar 1
+Bar 10
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo 2
-Bar 2
+Foo 20
+Bar 20
 Etc.
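Now suppose that, after post-processing, every string of digits in the changed lines has been replaced with ##, so the post-processed file looks like this: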
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar ##
+Bar ##
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo ##
-Bar ##
+Foo ##
+Bar ##
 Etc.
This is the input to the program that analyzes whether any differences are still present.
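The answer does not say how the digit replacement was done. As one possible sketch, assuming the diff is available as a list of lines without trailing newlines, the masking could leave the file headers and `@@` lines alone and only touch the hunk body. The function name `mask_numbers` is made up for this example.

```python
import re

def mask_numbers(diff_lines):
    """Replace each run of digits with '##' on hunk body lines only."""
    masked = []
    for line in diff_lines:
        if line.startswith(("--- ", "+++ ", "@@ ")):
            # File headers and hunk headers keep their timestamps and ranges.
            masked.append(line)
        else:
            masked.append(re.sub(r"\d+", "##", line))
    return masked

sample = ["@@ -1,7 +1,7 @@", " Foo", "-Bar 1", "+Bar 10", " Baz"]
for line in mask_numbers(sample):
    print(line)
```

The prefix test is deliberately crude: a removed line whose own text starts with "-- " would be mistaken for a file header, so a real tool would track whether it is inside a hunk.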
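To be really useful, we have to isolate the header lines (--- and +++) and keep them. For each block of differences that starts with @@, we need to capture the adjacent runs of - and + lines, and:
1. Check that there are the same number of + and - lines.
2. Check whether the content of the - lines is the same as the content of the + lines.
3. Handle blocks that have multiple sections of - and + lines within a single @@ section.
If there are no differences left in an @@ block, the whole block can be discarded. Rinse and repeat.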
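The per-block check described above can be sketched compactly in Python. This is not the Perl script from this answer, only an illustration of the same idea; the names `hunk_has_differences` and `filter_hunks` are invented here, and each hunk is assumed to be a pair of its @@ line and a list of body lines.

```python
def hunk_has_differences(body_lines):
    """Return True if any run of adjacent -/+ lines in one @@ block still differs."""
    minus, plus = [], []
    for line in body_lines:
        kind, text = line[:1], line[1:]
        if kind == "-":
            minus.append(text)
        elif kind == "+":
            plus.append(text)
        elif kind == " ":
            # A context line closes the current run; comparing the lists
            # checks both the line counts and the line contents.
            if minus != plus:
                return True
            minus, plus = [], []
        else:
            raise ValueError(f"unexpected diff line: {line!r}")
    return minus != plus  # the last run may not be followed by context

def filter_hunks(hunks):
    """Keep only the (at_line, body_lines) pairs that still contain differences."""
    return [(at, body) for at, body in hunks if hunk_has_differences(body)]

squashed = [" The problems start", "-Foo ##", "-Bar ##", "+Foo ##", "+Bar ##", " Etc."]
remaining = [" Foo", "-Bar #0", "+Bar ##", " Baz"]
print(hunk_has_differences(squashed), hunk_has_differences(remaining))
```

If `filter_hunks` returns an empty list for a file, the file headers can be dropped as well, which matches the intent of discarding blocks with no remaining differences.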
My programming language of choice for this is Perl.
#!/usr/bin/env perl
use strict;
use warnings;

use constant debug => 0;

my $file1;
my $file2;
my $header = 0;

OUTER:
while (my $line = <>)
{
    chomp $line;
    print "[$line]\n" if debug;
    if ($line =~ m/^--- /)
    {
        $file1 = $line;
        $file2 = <>;
        chomp $file2;
        print "[$file2]\n" if debug;
        if ($file2 !~ m/^\+\+\+ /)
        {
            print STDERR "Unexpected file identification lines\n";
            print STDERR "$file1\n";
            print STDERR "$file2\n";
            next OUTER;
        }
        $header = 0;    # Have not output file header yet
        my @lines;
        my $atline;
        last OUTER unless defined($line = <>);
        INNER:
        while ($line =~ m/^@@ /)
        {
            chomp $line;
            print "@[$line]\n" if debug;
            $atline = $line;
            @lines = ();
            while (defined($line = <>) && $line =~ m/^[- +]/)
            {
                chomp $line;
                print ":[$line]\n" if debug;
                push @lines, $line;
            }
            # Got a complete @@ block of diffs
            post_process($atline, @lines);
            last OUTER if !defined($line);
            next INNER if ($line =~ m/^@@ /);
            print STDERR "Unexpected input line: [$line]\n";
            last OUTER;
        }
    }
}

sub differences
{
    my($pref, $mref) = @_;
    my $pnum = scalar(@$pref);
    my $mnum = scalar(@$mref);
    print "-->> differences\n" if debug;
    return 0 if ($pnum == 0 && $mnum == 0);
    return 1 if ($pnum != $mnum);
    foreach my $i (0..($pnum-1))
    {
        my $pline = substr(${$pref}[$i], 1);
        my $mline = substr(${$mref}[$i], 1);
        return 1 if ($pline ne $mline);
    }
    print "<<-- differences\n" if debug;
    return 0;
}

sub post_process
{
    my($atline, @lines) = @_;
    print "-->> post_process\n" if debug;
    # Work out whether there are any differences left
    my @plines = ();    # +lines
    my @mlines = ();    # -lines
    my $diffs = 0;
    my $ptype = ' ';    # Previous line type
    foreach my $line (@lines)
    {
        print "---- $line\n" if debug;
        my ($ctype) = ($line =~ m/^(.)/);
        if ($ctype eq ' ')
        {
            if (($ptype eq '-' || $ptype eq '+') && differences(\@plines, \@mlines))
            {
                $diffs = 1;
                last;
            }
            @plines = ();
            @mlines = ();
        }
        elsif ($ctype eq '-')
        {
            push @mlines, $line;
        }
        elsif ($ctype eq '+')
        {
            push @plines, $line;
        }
        else
        {
            print STDERR "Unexpected input line format: $line\n";
            exit 1;
        }
        $ptype = $ctype;
    }
    $diffs = 1 if differences(\@plines, \@mlines);
    if ($diffs != 0)
    {
        # Print the block of differences, preceded by file header if necessary
        if ($header == 0)
        {
            print "$file1\n";
            print "$file2\n";
            $header = 1;
        }
        print "$atline\n";
        foreach my $line (@lines)
        {
            print "$line\n";
        }
    }
    print "<<-- post_process\n" if debug;
    return;
}
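I tested it with the post-processed diff above, saved as the file data, and with three variants of it (data.0, data.1, data.2) in which some ## strings were changed to #0: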
$ perl checkdiffs.pl data
$ perl checkdiffs.pl data.0
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar #0
+Bar ##
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
$ perl checkdiffs.pl data.1
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo #0
-Bar ##
+Foo ##
+Bar ##
 Etc.
$ perl checkdiffs.pl data.2
--- file.1 2013-03-30 18:58:35.000000000 -0700
+++ file.2 2013-03-30 18:58:48.000000000 -0700
@@ -1,7 +1,7 @@
 Data
 Foo
-Bar #0
+Bar ##
 Baz
 I want to squash the lines down that are equal so they don't show up in the diff anymore. This is pretty simple with
@@ -10,7 +10,7 @@
 The problems start when I have multi-line matches like:
-Foo ##
-Bar #0
+Foo ##
+Bar ##
 Etc.
$
Does this meet your requirements?
Answer 1: (score: 0)
I think this might work (unless you have duplicate pairs):
sed 's/^[-+]//' filename | perl -ne 'print unless $seen{$_}++'
This replaces the leading + or - with an empty string and then prints each distinct line only once.
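To see what that pipeline does to a small diff, here is a rough Python equivalent of the two steps. The function name `strip_and_dedupe` is made up, and input lines are assumed to have no trailing newline.

```python
def strip_and_dedupe(lines):
    """Mimic: sed 's/^[-+]//' | perl -ne 'print unless $seen{$_}++'."""
    seen = set()
    result = []
    for line in lines:
        # Step 1: drop a single leading "+" or "-" marker.
        stripped = line[1:] if line.startswith(("+", "-")) else line
        # Step 2: keep only the first occurrence of each line.
        if stripped not in seen:
            seen.add(stripped)
            result.append(stripped)
    return result

print(strip_and_dedupe([" Foo", "-Bar", "+Bar", " Baz"]))
print(strip_and_dedupe(["-Bar 1", "+Bar 10"]))
```

Note that an equal pair still leaves one copy of the line in the output, and genuinely different lines lose their +/- markers, so the result is not an empty diff when the two sides match. It also removes repeated lines that are not part of a -/+ pair at all.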
Answer 2: (score: 0)
You can use the s modifier and a positive lookahead:
Here is a sample match on regexpal.
This is a C# regex sample that should be close to what you need:
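using System;
using System.Text.RegularExpressions;

class Program
{
    static void Main()
    {
        var sourceString = @"-Foo
+Foo
la
-Bar
+Foo
la
-Ko
+Bar
la
+Ko
-Ena
asdsda
-Dva
+Ena
+Dva
";
        // (?s) lets "." match newlines; the lookahead requires a later "+" line
        // that starts with the same text as the captured "-" line.
        Regex ItemRegex = new Regex(@"(?s)\-(.*?)\n(?=(.*?)(\+\1))", RegexOptions.Compiled);
        foreach (Match ItemMatch in ItemRegex.Matches(sourceString))
        {
            Console.WriteLine(ItemMatch);
        }
    }
}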
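For readers without a C# toolchain, the same pattern can be tried with Python's `re` module, using `re.DOTALL` in place of the inline `(?s)`. The helper name `matched_removals` is only for illustration.

```python
import re

# Same pattern as the C# sample; re.DOTALL makes "." match newlines too.
PATTERN = re.compile(r"-(.*?)\n(?=(.*?)(\+\1))", re.DOTALL)

def matched_removals(diff_text):
    """Return the text of each '-' line that has a matching '+' later on."""
    return [m.group(1) for m in PATTERN.finditer(diff_text)]

print(matched_removals("-Foo\n-Bar\n+Foo\n+Bar\n"))  # both removals find a partner
print(matched_removals("-Ko\n+Kx\n"))                # no partner, no match
print(matched_removals("-Foo\n+Foobar\n"))           # prefix match, see note
```

Two caveats follow from the pattern itself: the lookahead searches the whole rest of the text rather than only the adjacent run of + lines, and `\+\1` is not anchored at the end of the line, so `-Foo` is treated as matched by `+Foobar`.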