Question

I am creating a regular expression to check whether the copyright notice at the top of every document is formatted correctly.

The copyright notice is long, so my regex is long as well.

Let's say the copyright notice is as follows:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  Tono Nam

/////////////////////////////////////////////////////////////////////////*/

Then I will use the regex:

var pattern = 

@"/\*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  (?<ProgammerName>[\w '\.]+)

/////////////////////////////////////////////////////////////////////////\*/";

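If I apply the regex to the first text, it gives me a match and everything is great. The problem is when the regex does not match. Let's say a programmer added an extra / at the top: my regex will no longer match. With this example it is easy to notice, but the real copyright notice is much longer and it would be nice to know where the error is. Sometimes there is a spelling mistake instead. For example, you may encounter Programer instead of Programmer. Because of that, I have to go through the whole copyright notice and try to spot the error. I think there should be an easier way to do what I need.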

Edit

If the subject happens to be:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here SOME_MISPELED_WORD.

Programmer:  Tono Nam

/////////////////////////////////////////////////////////////////////////*/

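Then the regex does not match because of SOME_MISPELED_WORD, so I would like to know the index where the mismatch occurs so that I only have to look at:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here <-------------- here

instead of the whole thing.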

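Another example is when the copyright notice is:

/*/////////////////////////////////////////////////////////////////////////

Copyright content which is a lot goes in here.

Programmer:  Tono Nam

//////////////////////////////////////////////////////////////////////////*/

I would like to get an error on the last line, because it has an extra /.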

Answer 1

I think the regex as you have it is too strict. Try something more like:

@"^/\*(/*)(.*)(Programmer:|Programer:){1}(\d*)(<ProgrammerName>){1}(/*)\*/$"

Answer 2

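I finally have a solution:

Basically we want to know where the regex fails. If we had constant strings, we could compare them and see at which character they differ. In other words, if I had: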

var a = "12345";
var b = "1234A";

Then we can compare a[0] with b[0], then a[1] with b[1], and so on until we find a difference.
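The character-by-character comparison can be sketched as a small helper. This is only an illustration: the class `TemplateDiff` and the method `FirstDifference` are hypothetical names, not part of the original answer, and the sketch also treats a difference in length as a mismatch.

```csharp
using System;

static class TemplateDiff
{
    // Returns the 0-based index of the first character at which the two
    // strings differ, or -1 if they are identical. If one string is a
    // prefix of the other, the index is the length of the shorter string.
    public static int FirstDifference(string expected, string actual)
    {
        int common = Math.Min(expected.Length, actual.Length);
        for (int i = 0; i < common; i++)
        {
            if (expected[i] != actual[i])
            {
                return i;
            }
        }
        return expected.Length == actual.Length ? -1 : common;
    }

    static void Main()
    {
        var a = "12345";
        var b = "1234A";
        Console.WriteLine(FirstDifference(a, b)); // prints 4
    }
}
```

Returning -1 for identical strings lets the caller tell "no difference" apart from "difference at index 0".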

Let's do that!

Let's say our copyright header must look like this:

/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam

Description:This is the description of the file....

/////*/

Let's remove everything that may vary, so that we can apply the idea from the first example:

/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/

Then the only complicated part is creating a regex that removes everything that may vary, so that we end up with that string. So the pattern will be:

 var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";

With that pattern we will be able to turn:

/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam bla bla bla

Description:THIS IS A DIFFERENT DESCRIPTION

/////*/

into

/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/

Now we have two strings to compare!

Here is the code for what I just explained:

// the subject we want to test
            var subject =
@"/*/////

Copyright content which is a lot goes in here.

Programmer:Tono Nam

Description:This is the description of the file....

/////*/";

            // the expected template; in a real program this should be a readonly constant because it should never change
            var pattern =
@"/*/////

Copyright content which is a lot goes in here.

Programmer:

Description:

/////*/";

            // we use this pattern to turn the first subject into the second if we can
            var regexPattern = @"(?s)(/\*/*.+Programmer:)(?<name>[^\r\n]*?)(\r.*Description:)(?<desc>[^\r\n]*)(\r.*?/*\*/)";

            // note $1 means group 1 so here we are basically removing the groups name and desc
            var newSubject = Regex.Replace(subject, regexPattern, "$1$2$3");

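            // at this point, if newSubject == pattern we know that the header is formatted correctly!

            // Let's see where they differ. Only the overlapping part is compared so the index stays in range.
            int common = Math.Min(pattern.Length, newSubject.Length);
            for (int i = 0; i < common; i++)
            {
                if (pattern[i] != newSubject[i])
                {
                    throw new Exception("There is a problem at index " + i);
                }
            }

            // No mismatch in the overlap, so any remaining difference is extra or missing text at the end.
            if (pattern.Length != newSubject.Length)
            {
                throw new Exception("There is a problem at index " + common);
            }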

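In this example it works, because my subject is formatted correctly. But if I add an extra / at the beginning, so that there are 6 / characters where there should be 5, the loop throws at the position of the extra character.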
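A raw index can be hard to use in a long header, so it may help to convert it into a line and column. The following is a minimal sketch under stated assumptions: `DiffLocation` and `ToLineColumn` are hypothetical names that do not appear in the original answer, lines are split on '\n' only, and the sample strings and the index are made up for the demonstration.

```csharp
using System;

static class DiffLocation
{
    // Converts a 0-based character index into a 1-based (line, column) pair.
    // Only '\n' starts a new line; a '\r' before it counts as an ordinary column.
    public static (int Line, int Column) ToLineColumn(string text, int index)
    {
        int line = 1, column = 1;
        for (int i = 0; i < index && i < text.Length; i++)
        {
            if (text[i] == '\n')
            {
                line++;
                column = 1;
            }
            else
            {
                column++;
            }
        }
        return (line, column);
    }

    static void Main()
    {
        // Hypothetical header with one extra '/' on the last line.
        var actual = "/*/////\nCopyright content\n//////*/";

        // Index of the first mismatch against "/*/////\nCopyright content\n/////*/",
        // as a character-by-character comparison would report it.
        int index = 31;

        var (line, column) = ToLineColumn(actual, index);
        Console.WriteLine($"First difference at line {line}, column {column}");
        // prints "First difference at line 3, column 6"
    }
}
```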

Answer 3

Try this regex:

/\*/{2,}(?:\n|.)*(?:Programm?er\s*:\s*(?<programmer>.+))[\n\r\s]*(?:Description\s*:\s*(?<description>.+))?

Then read the groups named programmer and description. This works for all of the conditions above.
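Reading the two named groups in C# could look roughly like the sketch below. The pattern is copied unchanged from this answer; the class `HeaderFields`, the method `TryExtract`, and the sample header are assumptions made for illustration. Note that this lenient pattern only extracts the fields and does not report where a malformed header goes wrong.

```csharp
using System;
using System.Text.RegularExpressions;

static class HeaderFields
{
    // The pattern from this answer, unchanged.
    const string Pattern =
        @"/\*/{2,}(?:\n|.)*(?:Programm?er\s*:\s*(?<programmer>.+))[\n\r\s]*(?:Description\s*:\s*(?<description>.+))?";

    // Returns true and fills the out parameters when the header matches.
    // Values are trimmed because '.' also matches a trailing '\r'.
    public static bool TryExtract(string header, out string programmer, out string description)
    {
        Match m = Regex.Match(header, Pattern);
        programmer = m.Success ? m.Groups["programmer"].Value.Trim() : "";
        description = m.Success ? m.Groups["description"].Value.Trim() : "";
        return m.Success;
    }

    static void Main()
    {
        // Hypothetical header that uses the misspelling "Programer".
        var header = "/*/////\nCopyright content which is a lot goes in here.\n"
                   + "Programer: Tono Nam\nDescription: Sample file\n/////*/";

        if (TryExtract(header, out var programmer, out var description))
        {
            Console.WriteLine(programmer);
            Console.WriteLine(description);
        }
    }
}
```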

Regex: compare strings and see where the difference is
