How do I get the same Regex result in C# as in a text editor?

Time: 2019-05-17 03:51:14

Tags: c# .net regex xml nested

I have an XML file containing the following data:

<navPoint id="navPoint-1" playOrder="1"><navLabel><text>Cover</text></navLabel><content src="Text/01_Cover.xhtml"/></navPoint>
<navPoint id="navPoint-2" playOrder="2"><navLabel><text>Titelblatt</text></navLabel><content src="Text/02_Titlepage.xhtml#Titlepage"/></navPoint>
<navPoint id="navPoint-3" playOrder="3"><navLabel><text>Urheberrechte</text></navLabel><content src="Text/03_Copyright.xhtml#Copyright"/></navPoint>
<navPoint id="navPoint-4" playOrder="4"><navLabel><text>Die S&#x00FC;nde Macht Den Menschen Menschlich, Die Liebe Macht Ihn G&#x00F6;ttlich</text></navLabel><content src="Text/04_FmChapter01.xhtml#FmChapter01"/></navPoint>
<navPoint id="navPoint-5" playOrder="5"><navLabel><text>Vorwort</text></navLabel><content src="Text/05_Vorwort.xhtml#Vorwort"/></navPoint>
<navPoint id="navPoint-6" playOrder="6"><navLabel><text>Ira</text></navLabel><content src="Text/06_Part01.xhtml#Part01"/>
<navPoint id="navPoint-7" playOrder="7"><navLabel><text>Zorn</text></navLabel><content src="Text/07_Chapter01.xhtml#Chapter01"/></navPoint>
<navPoint id="navPoint-8" playOrder="8"><navLabel><text>Erfahrung J&#x00E4;hzorn</text></navLabel><content src="Text/08_Chapter02.xhtml#Chapter02"/></navPoint></navPoint>
<navPoint id="navPoint-9" playOrder="9"><navLabel><text>Luxuria</text></navLabel><content src="Text/09_Part02.xhtml#Part02"/>
<navPoint id="navPoint-10" playOrder="10"><navLabel><text>Wollust</text></navLabel><content src="Text/10_Chapter03.xhtml#Chapter03"/></navPoint>
<navPoint id="navPoint-11" playOrder="11"><navLabel><text>Erfahrung Wollust</text></navLabel><content src="Text/11_Chapter04.xhtml#Chapter04"/></navPoint></navPoint>
<navPoint id="navPoint-12" playOrder="12"><navLabel><text>Avaritia</text></navLabel><content src="Text/12_Part03.xhtml#Part03"/>
<navPoint id="navPoint-13" playOrder="13"><navLabel><text>Geiz</text></navLabel><content src="Text/13_Chapter05.xhtml#Chapter05"/></navPoint>
<navPoint id="navPoint-14" playOrder="14"><navLabel><text>Erfahrung Geiz</text></navLabel><content src="Text/14_Chapter06.xhtml#Chapter06"/></navPoint></navPoint>
<navPoint id="navPoint-15" playOrder="15"><navLabel><text>Ac&#x00E9;dia</text></navLabel><content src="Text/15_Part04.xhtml#Part04"/>
<navPoint id="navPoint-16" playOrder="16"><navLabel><text>Tr&#x00E4;gheit</text></navLabel><content src="Text/16_Chapter07.xhtml#Chapter07"/></navPoint>
<navPoint id="navPoint-17" playOrder="17"><navLabel><text>Erfahrung Faulheit</text></navLabel><content src="Text/17_Chapter08.xhtml#Chapter08"/></navPoint></navPoint>
<navPoint id="navPoint-18" playOrder="18"><navLabel><text>Invidia</text></navLabel><content src="Text/18_Part05.xhtml#Part05"/>
<navPoint id="navPoint-19" playOrder="19"><navLabel><text>Neid</text></navLabel><content src="Text/19_Chapter09.xhtml#Chapter09"/></navPoint>
<navPoint id="navPoint-20" playOrder="20"><navLabel><text>Erfahrung Neid</text></navLabel><content src="Text/20_Chapter10.xhtml#Chapter10"/></navPoint></navPoint>
<navPoint id="navPoint-21" playOrder="21"><navLabel><text>Gula</text></navLabel><content src="Text/21_Part06.xhtml#Part06"/>
<navPoint id="navPoint-22" playOrder="22"><navLabel><text>V&#x00F6;llerei</text></navLabel><content src="Text/22_Chapter11.xhtml#Chapter11"/></navPoint>
<navPoint id="navPoint-23" playOrder="23"><navLabel><text>Achtung V&#x00F6;llerei</text></navLabel><content src="Text/23_Chapter12.xhtml#Chapter12"/></navPoint>
<navPoint id="navPoint-24" playOrder="24"><navLabel><text>Erfahrung V&#x00F6;llerei</text></navLabel><content src="Text/24_Chapter13.xhtml#Chapter13"/></navPoint></navPoint>
<navPoint id="navPoint-25" playOrder="25"><navLabel><text>Superbia</text></navLabel><content src="Text/25_Part07.xhtml#Part07"/>
<navPoint id="navPoint-26" playOrder="26"><navLabel><text>Hochmut</text></navLabel><content src="Text/26_Chapter14.xhtml#Chapter14"/></navPoint>
<navPoint id="navPoint-27" playOrder="27"><navLabel><text>Erfahrung Hochmut</text></navLabel><content src="Text/27_Chapter15.xhtml#Chapter15"/></navPoint></navPoint>
<navPoint id="navPoint-28" playOrder="28"><navLabel><text>Literatur Zu Den 7 Tods&#x00FC;nden</text></navLabel><content src="Text/28_Literatur.xhtml#Literatur"/></navPoint>
<navPoint id="navPoint-29" playOrder="29"><navLabel><text>Inhalt</text></navLabel><content src="Text/29_Contents.xhtml#Contents"/></navPoint>

I need the following output:

<li id="NavPoint-#"><a href="Text/01_Cover.xhtml">Cover</a></li>
<li id="NavPoint-#"><a href="Text/02_Titlepage.xhtml#Titlepage">Titelblatt</a></li>
<li id="NavPoint-#"><a href="Text/03_Copyright.xhtml#Copyright">Urheberrechte</a></li>
<li id="NavPoint-#"><a href="Text/04_FmChapter01.xhtml#FmChapter01">Die S&#x00FC;nde Macht Den Menschen Menschlich, Die Liebe Macht Ihn G&#x00F6;ttlich</a></li>
<li id="NavPoint-#"><a href="Text/05_Vorwort.xhtml#Vorwort">Vorwort</a></li>
<li id="NavPoint-#"><a href="Text/06_Part01.xhtml#Part01">Ira</a>
<ol>
<li id="NavPoint-#"><a href="Text/07_Chapter01.xhtml#Chapter01">Zorn</a></li>
<li id="NavPoint-#"><a href="Text/08_Chapter02.xhtml#Chapter02">Erfahrung J&#x00E4;hzorn</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/09_Part02.xhtml#Part02">Luxuria</a>
<ol>
<li id="NavPoint-#"><a href="Text/10_Chapter03.xhtml#Chapter03">Wollust</a></li>
<li id="NavPoint-#"><a href="Text/11_Chapter04.xhtml#Chapter04">Erfahrung Wollust</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/12_Part03.xhtml#Part03">Avaritia</a>
<ol>
<li id="NavPoint-#"><a href="Text/13_Chapter05.xhtml#Chapter05">Geiz</a></li>
<li id="NavPoint-#"><a href="Text/14_Chapter06.xhtml#Chapter06">Erfahrung Geiz</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/15_Part04.xhtml#Part04">Ac&#x00E9;dia</a>
<ol>
<li id="NavPoint-#"><a href="Text/16_Chapter07.xhtml#Chapter07">Tr&#x00E4;gheit</a></li>
<li id="NavPoint-#"><a href="Text/17_Chapter08.xhtml#Chapter08">Erfahrung Faulheit</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/18_Part05.xhtml#Part05">Invidia</a>
<ol>
<li id="NavPoint-#"><a href="Text/19_Chapter09.xhtml#Chapter09">Neid</a></li>
<li id="NavPoint-#"><a href="Text/20_Chapter10.xhtml#Chapter10">Erfahrung Neid</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/21_Part06.xhtml#Part06">Gula</a>
<ol>
<li id="NavPoint-#"><a href="Text/22_Chapter11.xhtml#Chapter11">V&#x00F6;llerei</a></li>
<li id="NavPoint-#"><a href="Text/23_Chapter12.xhtml#Chapter12">Achtung V&#x00F6;llerei</a></li>
<li id="NavPoint-#"><a href="Text/24_Chapter13.xhtml#Chapter13">Erfahrung V&#x00F6;llerei</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/25_Part07.xhtml#Part07">Superbia</a>
<ol>
<li id="NavPoint-#"><a href="Text/26_Chapter14.xhtml#Chapter14">Hochmut</a></li>
<li id="NavPoint-#"><a href="Text/27_Chapter15.xhtml#Chapter15">Erfahrung Hochmut</a></li></ol></li>
<li id="NavPoint-#"><a href="Text/28_Literatur.xhtml#Literatur">Literatur Zu Den 7 Tods&#x00FC;nden</a></li>
<li id="NavPoint-#"><a href="Text/29_Contents.xhtml#Contents">Inhalt</a></li>

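When I run the Regex replacements in Notepad, regex101, and other text editors, I get the desired output. In C#, however, I get a different output, and I cannot figure out why. Is something wrong with the C# regex?

C# output: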

<li id="NavPoint-#"><a href="01_Cover.xhtml">Cover</a></li>
<li id="NavPoint-#"><a href="02_Titlepage.xhtml#Titlepage">Titelblatt</a></li>
<li id="NavPoint-#"><a href="03_Copyright.xhtml#Copyright">Urheberrechte</a></li>
<li id="NavPoint-#"><a href="04_FmChapter01.xhtml#FmChapter01">Die S&#x00FC;nde Macht Den Menschen Menschlich, Die Liebe Macht Ihn G&#x00F6;ttlich</a></li>
<li id="NavPoint-#"><a href="05_Vorwort.xhtml#Vorwort">Vorwort</a></li>
<li id="NavPoint-#"><a href="06_Part01.xhtml#Part01">Ira</a></li>
<li id="NavPoint-#"><a href="07_Chapter01.xhtml#Chapter01">Zorn</a></li>
<li id="NavPoint-#"><a href="08_Chapter02.xhtml#Chapter02">Erfahrung J&#x00E4;hzorn</a></li>
<li id="NavPoint-#"><a href="09_Part02.xhtml#Part02">Luxuria</a></li>
<li id="NavPoint-#"><a href="10_Chapter03.xhtml#Chapter03">Wollust</a></li>
<li id="NavPoint-#"><a href="11_Chapter04.xhtml#Chapter04">Erfahrung Wollust</a></li>
<li id="NavPoint-#"><a href="12_Part03.xhtml#Part03">Avaritia</a></li>
<li id="NavPoint-#"><a href="13_Chapter05.xhtml#Chapter05">Geiz</a></li>
<li id="NavPoint-#"><a href="14_Chapter06.xhtml#Chapter06">Erfahrung Geiz</a></li>
<li id="NavPoint-#"><a href="15_Part04.xhtml#Part04">Ac&#x00E9;dia</a></li>
<li id="NavPoint-#"><a href="16_Chapter07.xhtml#Chapter07">Tr&#x00E4;gheit</a></li>
<li id="NavPoint-#"><a href="17_Chapter08.xhtml#Chapter08">Erfahrung Faulheit</a></li>
<li id="NavPoint-#"><a href="18_Part05.xhtml#Part05">Invidia</a></li>
<li id="NavPoint-#"><a href="19_Chapter09.xhtml#Chapter09">Neid</a></li>
<li id="NavPoint-#"><a href="20_Chapter10.xhtml#Chapter10">Erfahrung Neid</a></li>
<li id="NavPoint-#"><a href="21_Part06.xhtml#Part06">Gula</a></li>
<li id="NavPoint-#"><a href="22_Chapter11.xhtml#Chapter11">V&#x00F6;llerei</a></li>
<li id="NavPoint-#"><a href="23_Chapter12.xhtml#Chapter12">Achtung V&#x00F6;llerei</a></li>
<li id="NavPoint-#"><a href="24_Chapter13.xhtml#Chapter13">Erfahrung V&#x00F6;llerei</a></li>
<li id="NavPoint-#"><a href="25_Part07.xhtml#Part07">Superbia</a></li>
<li id="NavPoint-#"><a href="26_Chapter14.xhtml#Chapter14">Hochmut</a></li>
<li id="NavPoint-#"><a href="27_Chapter15.xhtml#Chapter15">Erfahrung Hochmut</a></li>
<li id="NavPoint-#"><a href="28_Literatur.xhtml#Literatur">Literatur Zu Den 7 Tods&#x00FC;nden</a></li>
<li id="NavPoint-#"><a href="29_Contents.xhtml#Contents">Inhalt</a></li>

I am using the following regex replacements:

Editor regex:

Pattern 1: "<navPoint id="navPoi[^"]+" playOrder="[^"]+"><navLabel><text>([^<>\r\n]+)</text></navLabel><content src="([^<>\r\n]+)"/></navPoint>"
Substitution 1: "<li id="NavPoint-#"><a href="$2">$1</a></li>"

Pattern 2: "<navPoint id="navPoi[^"]+" playOrder="[^"]+"><navLabel><text>([^<>\r\n]+)</text></navLabel><content src="([^<>\r\n]+)"/>$"
Substitution 2: "<li id="NavPoint-#"><a href="$2">$1</a>\r\n<ol>"

Pattern 3: "</navPoint>"
Substitution 3: "</ol></li>"

C# regex:

string firstPattern = @"<navPoint id=""navPoi[^""]+"" playOrder=""[^""]+""><navLabel><text>(.+)<\/text><\/navLabel><content src=""([^<>\r?\n]+)""\/><\/navPoint>";
string firstSubstitution = @"<li id=""NavPoint-#""><a href=""$2"">$1</a></li>";
RegexOptions options = RegexOptions.Multiline;
Regex firstRegex = new Regex(firstPattern, options);
string newNavMap = firstRegex.Replace(navMapValues, firstSubstitution);
string secondPattern = @"<navPoint id=""navPoi[^""]+"" playOrder=""[^""]+""><navLabel><text>(.+)<\/text><\/navLabel><content src=""([^<>\r?\n]+)""\/>";
string secondSubstitution = @"<li id=""NavPoint-#""><a href=""$2"">$1</a>" + Environment.NewLine + "<ol>";
Regex secondRegex = new Regex(secondPattern, options);
string anotherNavMap = secondRegex.Replace(newNavMap, secondSubstitution);
string thirdPattern = @"</navPoint>";
string thirdSubstitution = @"</ol></li>";
Regex thirdRegex = new Regex(thirdPattern, options);
string finalNavMap = thirdRegex.Replace(anotherNavMap, thirdSubstitution);
finalNavMap = finalNavMap.Replace("\r\n</ol></li>", "</ol></li>");
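One difference between .NET and many editors that is worth ruling out: with RegexOptions.Multiline, `$` in .NET matches only before `\n`, not before `\r\n`, so a pattern ending in `/>$` (like editor Pattern 2) finds nothing when the input has CRLF line endings unless `\r?` is added. This may or may not be what goes wrong here. The sketch below is a direct port of the three editor patterns with that adjustment; the class and method names (`NavMapRegexConverter`, `Convert`) are made up for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NavMapRegexConverter
{
    // Shared prefix of editor patterns 1 and 2: $1 = label text, $2 = src.
    private const string Head =
        @"<navPoint id=""navPoi[^""]+"" playOrder=""[^""]+""><navLabel><text>([^<>\r\n]+)</text></navLabel><content src=""([^<>\r\n]+)""/>";

    public static string Convert(string navMap)
    {
        // Pattern 1: a navPoint opened and closed on one line becomes a leaf <li>.
        string result = Regex.Replace(
            navMap,
            Head + "</navPoint>",
            @"<li id=""NavPoint-#""><a href=""$2"">$1</a></li>");

        // Pattern 2: a navPoint still open at end of line starts a nested <ol>.
        // (?=\r?$) lets the end-of-line test succeed for both LF and CRLF input.
        result = Regex.Replace(
            result,
            Head + @"(?=\r?$)",
            @"<li id=""NavPoint-#""><a href=""$2"">$1</a>" + Environment.NewLine + "<ol>",
            RegexOptions.Multiline);

        // Pattern 3: each remaining </navPoint> closes a nested list and its parent item.
        return result.Replace("</navPoint>", "</ol></li>");
    }
}
```

Calling `NavMapRegexConverter.Convert(navMapValues)` would replace the whole chain of Regex objects above. If the `Text/` prefix is still missing from the result, the input string itself probably differs from what was pasted into the editor.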

3 Answers:

Answer 0 (score: 4)

This would be much easier with XSLT. It needs only one template rule:

<xsl:template match="navPoint">
  <li id="NavPoint-#">
    <a href="{content/@src}"><xsl:value-of select="navLabel/text"/></a>
    <xsl:if test="navPoint">
      <ol><xsl:apply-templates select="navPoint"/></ol>
    </xsl:if>
  </li>
</xsl:template>
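For completeness, here is a rough sketch of how such a template could be run from C# with XslCompiledTransform. It assumes the navPoint elements sit under a single root element named `navMap` and carry no XML namespace (a real NCX file usually declares one, which would need a prefix in the match patterns). The class name `NavMapXslt` and the embedded stylesheet are illustrative only.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Xsl;

public static class NavMapXslt
{
    // Assumed stylesheet: wraps the top level in <ol> and recurses into child navPoints.
    private const string Stylesheet = @"<xsl:stylesheet version=""1.0"" xmlns:xsl=""http://www.w3.org/1999/XSL/Transform"">
  <xsl:output method=""xml"" omit-xml-declaration=""yes""/>
  <xsl:template match=""/navMap"">
    <ol><xsl:apply-templates select=""navPoint""/></ol>
  </xsl:template>
  <xsl:template match=""navPoint"">
    <li id=""NavPoint-#"">
      <a href=""{content/@src}""><xsl:value-of select=""navLabel/text""/></a>
      <xsl:if test=""navPoint"">
        <ol><xsl:apply-templates select=""navPoint""/></ol>
      </xsl:if>
    </li>
  </xsl:template>
</xsl:stylesheet>";

    public static string Transform(string navMapXml)
    {
        var xslt = new XslCompiledTransform();
        using (var styleReader = XmlReader.Create(new StringReader(Stylesheet)))
        {
            xslt.Load(styleReader);
        }

        var output = new StringWriter();
        using (var input = XmlReader.Create(new StringReader(navMapXml)))
        {
            // No stylesheet parameters are needed, so the argument list is null.
            xslt.Transform(input, null, output);
        }
        return output.ToString();
    }
}
```

Because the XML parser decodes character references, entries such as `S&#x00FC;nde` come out as `Sünde` rather than as the original entity text.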

Answer 1 (score: 0)

Here is a fiddle with your input string and your code:

https://dotnetfiddle.net/t119c9

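It prints the output you expect. I suspect your input string is not escaped properly.

Answer 2 (score: 0)

This expression may help you get closer to the desired output. It captures the two target groups (the label text in group 2 and the src value in group 3), which you can then simply substitute: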

<(.*?)<text>(.*?)<\/text>.*src="(.*?)"\/>(.*)

As for the ol tags, I guess they can simply be added after the replacement.
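In C#, one way to add the ol tags during the replacement rather than afterwards is a MatchEvaluator that looks at group 4, that is, whatever follows the content tag on the same line. The following is only a sketch under the assumption that every navPoint occupies exactly one line, as in the sample data; `NavMapEvaluator` is a made-up name.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class NavMapEvaluator
{
    // The expression from this answer: group 2 = label, group 3 = src,
    // group 4 = the rest of the line after the <content/> tag.
    private static readonly Regex Line =
        new Regex(@"<(.*?)<text>(.*?)</text>.*src=""(.*?)""/>(.*)");

    public static string Convert(string navMap)
    {
        return Line.Replace(navMap, m =>
        {
            string tail = m.Groups[4].Value;
            // "." also matches '\r' in .NET, so remember a trailing CR to keep CRLF intact.
            string cr = tail.EndsWith("\r", StringComparison.Ordinal) ? "\r" : "";
            int closes = Regex.Matches(tail, "</navPoint>").Count;

            var sb = new StringBuilder();
            sb.Append("<li id=\"NavPoint-#\"><a href=\"").Append(m.Groups[3].Value)
              .Append("\">").Append(m.Groups[2].Value).Append("</a>");

            if (closes == 0)
            {
                // No closing tag on this line: the item has children, so open a list.
                sb.Append(cr).Append("\n<ol>");
            }
            else
            {
                sb.Append("</li>");
                // Every additional </navPoint> closes an enclosing parent item.
                for (int i = 1; i < closes; i++) sb.Append("</ol></li>");
            }
            return sb.Append(cr).ToString();
        });
    }
}
```

The evaluator needs only a single pass, at the cost of relying on the one-navPoint-per-line layout.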

RegEx

If this is not the expression you want, you can modify or change it at regex101.com.

RegEx Circuit

You can also visualize your expressions at jex.im.

JavaScript Demo

const regex = /<(.*?)<text>(.*?)<\/text>.*src="(.*?)"\/>(.*)/gm;
const str = `<navPoint id="navPoint-1" playOrder="1"><navLabel><text>Cover</text></navLabel><content src="Text/01_Cover.xhtml"/></navPoint>
<navPoint id="navPoint-2" playOrder="2"><navLabel><text>Titelblatt</text></navLabel><content src="Text/02_Titlepage.xhtml#Titlepage"/></navPoint>
<navPoint id="navPoint-3" playOrder="3"><navLabel><text>Urheberrechte</text></navLabel><content src="Text/03_Copyright.xhtml#Copyright"/></navPoint>
<navPoint id="navPoint-4" playOrder="4"><navLabel><text>Die S&#x00FC;nde Macht Den Menschen Menschlich, Die Liebe Macht Ihn G&#x00F6;ttlich</text></navLabel><content src="Text/04_FmChapter01.xhtml#FmChapter01"/></navPoint>
<navPoint id="navPoint-5" playOrder="5"><navLabel><text>Vorwort</text></navLabel><content src="Text/05_Vorwort.xhtml#Vorwort"/></navPoint>
<navPoint id="navPoint-6" playOrder="6"><navLabel><text>Ira</text></navLabel><content src="Text/06_Part01.xhtml#Part01"/>
<navPoint id="navPoint-7" playOrder="7"><navLabel><text>Zorn</text></navLabel><content src="Text/07_Chapter01.xhtml#Chapter01"/></navPoint>
<navPoint id="navPoint-8" playOrder="8"><navLabel><text>Erfahrung J&#x00E4;hzorn</text></navLabel><content src="Text/08_Chapter02.xhtml#Chapter02"/></navPoint></navPoint>
<navPoint id="navPoint-9" playOrder="9"><navLabel><text>Luxuria</text></navLabel><content src="Text/09_Part02.xhtml#Part02"/>
<navPoint id="navPoint-10" playOrder="10"><navLabel><text>Wollust</text></navLabel><content src="Text/10_Chapter03.xhtml#Chapter03"/></navPoint>
<navPoint id="navPoint-11" playOrder="11"><navLabel><text>Erfahrung Wollust</text></navLabel><content src="Text/11_Chapter04.xhtml#Chapter04"/></navPoint></navPoint>
<navPoint id="navPoint-12" playOrder="12"><navLabel><text>Avaritia</text></navLabel><content src="Text/12_Part03.xhtml#Part03"/>
<navPoint id="navPoint-13" playOrder="13"><navLabel><text>Geiz</text></navLabel><content src="Text/13_Chapter05.xhtml#Chapter05"/></navPoint>
<navPoint id="navPoint-14" playOrder="14"><navLabel><text>Erfahrung Geiz</text></navLabel><content src="Text/14_Chapter06.xhtml#Chapter06"/></navPoint></navPoint>
<navPoint id="navPoint-15" playOrder="15"><navLabel><text>Ac&#x00E9;dia</text></navLabel><content src="Text/15_Part04.xhtml#Part04"/>
<navPoint id="navPoint-16" playOrder="16"><navLabel><text>Tr&#x00E4;gheit</text></navLabel><content src="Text/16_Chapter07.xhtml#Chapter07"/></navPoint>
<navPoint id="navPoint-17" playOrder="17"><navLabel><text>Erfahrung Faulheit</text></navLabel><content src="Text/17_Chapter08.xhtml#Chapter08"/></navPoint></navPoint>
<navPoint id="navPoint-18" playOrder="18"><navLabel><text>Invidia</text></navLabel><content src="Text/18_Part05.xhtml#Part05"/>
<navPoint id="navPoint-19" playOrder="19"><navLabel><text>Neid</text></navLabel><content src="Text/19_Chapter09.xhtml#Chapter09"/></navPoint>
<navPoint id="navPoint-20" playOrder="20"><navLabel><text>Erfahrung Neid</text></navLabel><content src="Text/20_Chapter10.xhtml#Chapter10"/></navPoint></navPoint>
<navPoint id="navPoint-21" playOrder="21"><navLabel><text>Gula</text></navLabel><content src="Text/21_Part06.xhtml#Part06"/>
<navPoint id="navPoint-22" playOrder="22"><navLabel><text>V&#x00F6;llerei</text></navLabel><content src="Text/22_Chapter11.xhtml#Chapter11"/></navPoint>
<navPoint id="navPoint-23" playOrder="23"><navLabel><text>Achtung V&#x00F6;llerei</text></navLabel><content src="Text/23_Chapter12.xhtml#Chapter12"/></navPoint>
<navPoint id="navPoint-24" playOrder="24"><navLabel><text>Erfahrung V&#x00F6;llerei</text></navLabel><content src="Text/24_Chapter13.xhtml#Chapter13"/></navPoint></navPoint>
<navPoint id="navPoint-25" playOrder="25"><navLabel><text>Superbia</text></navLabel><content src="Text/25_Part07.xhtml#Part07"/>
<navPoint id="navPoint-26" playOrder="26"><navLabel><text>Hochmut</text></navLabel><content src="Text/26_Chapter14.xhtml#Chapter14"/></navPoint>
<navPoint id="navPoint-27" playOrder="27"><navLabel><text>Erfahrung Hochmut</text></navLabel><content src="Text/27_Chapter15.xhtml#Chapter15"/></navPoint></navPoint>
<navPoint id="navPoint-28" playOrder="28"><navLabel><text>Literatur Zu Den 7 Tods&#x00FC;nden</text></navLabel><content src="Text/28_Literatur.xhtml#Literatur"/></navPoint>
<navPoint id="navPoint-29" playOrder="29"><navLabel><text>Inhalt</text></navLabel><content src="Text/29_Contents.xhtml#Contents"/></navPoint>`;
const subst = `\n<li id="NavPoint-#"><a href="$3">$2</a></li>`;

// The substituted value will be contained in the result variable
const result = str.replace(regex, subst);

console.log('Substitution result: ', result);