多行正则表达式应在文件中多次匹配(如果可能,可以使用单行命令)

时间:2018-12-13 14:59:03

标签: regex bash git perl

我正在尝试将一些(多行)git历史记录信息(提取文件名更改)转换为CSV文件。这是我的DebugView。在该网站上运行良好。

正则表达式:

commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n

样本输入:

commit 2701af4b3b66340644b01835a03bcc760e1606f8
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Sat Oct 16 20:44:32 2010 +0000

    * Moved old sources to Maven src/main/java

diff --git a/alexo-chess/src/ao/chess/v2/move/Pawns.java b/alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java
similarity index 100%
rename from alexo-chess/src/ao/chess/v2/move/Pawns.java
rename to alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java

commit ea53898dcc969286078700f42ca5be36789e7ea7
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Sat Oct 17 03:30:43 2009 +0000

    synch

diff --git a/src/chess/v2/move/Pawns.java b/alexo-chess/src/ao/chess/v2/move/Pawns.java
similarity index 100%
copy from src/chess/v2/move/Pawns.java
copy to alexo-chess/src/ao/chess/v2/move/Pawns.java

commit b869f395429a2c1345ce100953bfc6038d9835f5
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Wed Oct 7 22:43:06 2009 +0000

    MctsPlayer works

diff --git a/ao/chess/v2/move/Pawns.java b/src/chess/v2/move/Pawns.java
similarity index 100%
copy from ao/chess/v2/move/Pawns.java
copy to src/chess/v2/move/Pawns.java

commit 4c697c510f5154d20be7500be1cbdecbaf99495c
Author: ostrovsky.alex <ostrovsky.alex@a51b5712-02d0-11de-9992-cbdf800730d7>
Date:   Wed Sep 23 15:06:17 2009 +0000

    * synch

diff --git a/v2/move/Pawns.java b/ao/chess/v2/move/Pawns.java
similarity index 95%
rename from v2/move/Pawns.java
rename to ao/chess/v2/move/Pawns.java
index e0172a3..e3659c5 100644
--- a/v2/move/Pawns.java
+++ b/ao/chess/v2/move/Pawns.java

但是,当我尝试运行以下perl命令(在Windows 10上为git bash)时,我只得到一条匹配行(与您在网站上看到的示例中的4行相反)我链接到上面)。

我知道这可能是愚蠢的,就像它需要处于循环中一样。但是,我对使用-0777并多次应用模式感到困惑。我尝试了-p选项,但是它打印出了整个输入,我只想查看print的输出(即CSV行)。我还认为/g会使该模式多次应用于输入文件,但是由于-0777将其全部变成一行,所以我不确定。

<Pawns.java.history.txt perl -0777 -ne 'if (/commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n/g) { print $1.",".$2.",".$3.",".$4.",".$5."\n" }'

输出只有一行,而regex and sample file总共应该是4行:

2701af4b3b66340644b01835a03bcc760e1606f8,100,rename,alexo-chess/src/ao/chess/v2/move/Pawns.java,alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java

预期输出:

2701af4b3b66340644b01835a03bcc760e1606f8,100,rename,alexo-chess/src/ao/chess/v2/move/Pawns.java,alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java
ea53898dcc969286078700f42ca5be36789e7ea7,100,copy,src/chess/v2/move/Pawns.java,alexo-chess/src/ao/chess/v2/move/Pawns.java
b869f395429a2c1345ce100953bfc6038d9835f5,100,copy,ao/chess/v2/move/Pawns.java,src/chess/v2/move/Pawns.java
4c697c510f5154d20be7500be1cbdecbaf99495c,95,rename,v2/move/Pawns.java,ao/chess/v2/move/Pawns.java

2 个答案:

答案 0 :(得分:2)

//g运算符在列表上下文中返回捕获的结果。由于有5组捕获括号和4个匹配项,因此返回的列表包含20个元素。您需要遍历该列表。您的代码只看第一个匹配项。这是一种技术:

perl -0777 -nE '
    @matches = /commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n/g;
    $" = ",";
    while (@matches) {
        @thismatch = splice @matches, 0, 5;
        say "@thismatch";
    }
' Pawns.java.history.txt 

答案 1 :(得分:2)

您只需要将if转换为while

perl -0777 -ne 'while (/commit (.+)\n(?:.*\n)+?similarity index (\d+)+%\n(rename|copy) from (.+)\n\3 to (.+)\n/g) { print $1.",".$2.",".$3.",".$4.",".$5."\n" }' file

2701af4b3b66340644b01835a03bcc760e1606f8,100,rename,alexo-chess/src/ao/chess/v2/move/Pawns.java,alexo-chess/src/main/java/ao/chess/v2/move/Pawns.java
ea53898dcc969286078700f42ca5be36789e7ea7,100,copy,src/chess/v2/move/Pawns.java,alexo-chess/src/ao/chess/v2/move/Pawns.java
b869f395429a2c1345ce100953bfc6038d9835f5,100,copy,ao/chess/v2/move/Pawns.java,src/chess/v2/move/Pawns.java
4c697c510f5154d20be7500be1cbdecbaf99495c,95,rename,v2/move/Pawns.java,ao/chess/v2/move/Pawns.java