Find the last occurrence and add an element

Time: 2019-05-16 12:57:53

Tags: vb.net

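I have an SGML file with multiple <chapter> and <section> elements. A chapter can stand on its own, but a section must be inside a chapter, and a chapter can contain several sections. My problem is that after the last of those sections, the chapter must have its end element </chapter>. Right now it does not; it only has the section elements.

I tried putting the document into an array and counting the elements, but that did not work.

I tried adding </chapter> before the next <chapter> element, but that left the end elements in the wrong order.

I thought about treating it as an XML file, finding the last child in the file, and pasting </chapter> after it, but I don't know how to do that.

Sorry, I'm quite confused by this, so I have no code to post. I don't know how to approach it.

Any help is much appreciated.

Here is a sample of the document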

<doc service="xT">
<body numcols="1">

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>

<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter> <title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>

<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>

<ipbchap>
<tags></tags>
</ipbchap>

</body>
<rear>
<tags></tags>
</rear>
</doc>

This is the expected result

<doc service="xT">
<body numcols="1">

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>

<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>
</chapter>

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter> <title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>

<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>
</chapter>

<ipbchap>
<tags></tags>
</ipbchap>

</body>
<rear>
<tags></tags>
</rear>
</doc>

This is the code that creates this file.

'Read all text of the Master Document
'and create a StringBuilder from it.
'All replacements will be done on the
'StringBuilder as it is more efficient
'than using Strings directly
Dim strMasterDoc = File.ReadAllText(existingMasterFilePath)
Dim newMasterFileBuilder As New StringBuilder(strMasterDoc)

'Create a regex with a named capture group.
'The name is 'EntityNumber' and captures just the
'entity digits for use in building the file name
Dim rx = New Regex("&" & Prefix & "_Ch(?<EntityNumber>\d+(?:-\d+)*)[;]")
Dim rxMatches = rx.Matches(strMasterDoc)

For Each match As Match In rxMatches
    Dim entity = match.ToString
    'Build the file name using the captured digits from the entity in the master file
    Dim entityFileName = Prefix & $"_Ch{match.Groups("EntityNumber")}.sgm.bak"
    Dim entityFilePath = Path.Combine(searchDir, entityFileName)
    'Check if the entity file exists and use its contents
    'to replace the entity in the copy of the master file
    'contained in the StringBuilder
    If File.Exists(entityFilePath) Then
        Dim entityFileContents As String = File.ReadAllText(entityFilePath)
        newMasterFileBuilder.Replace(entity, entityFileContents)
    End If
Next


'write the processed contents of the master file to a different file
File.WriteAllText(newMasterFilePath, newMasterFileBuilder.ToString)

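The problem is that the code does not include the last element, because that element is empty.

So would one approach be to add these empty sections to the document and then remove them afterwards?

1 answer:

Answer 0: (score: 0)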

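If you use an SGML parser, your markup parses just fine as it is. You only need to tell SGML which tags may be omitted. Looking at your markup, the end-element tag of chapter and both the start- and end-element tags of section appear to be omitted and should be inferred. In an element declaration, the two characters after the element name state whether the start tag and the end tag may be omitted: - means the tag is required, O means it may be left out. This is reflected in the DOCTYPE I added to the input text: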

<!DOCTYPE doc [
    <!ELEMENT doc - - (body,rear)>
    <!ELEMENT body - - (chapter+,ipbchap)>
    <!ELEMENT chapter - O (title?,section+)>
    <!ELEMENT section O O (title?,para0*)>
    <!ELEMENT para0 - - (title?,(line|text|para)*)>
    <!ELEMENT para - - (#PCDATA)>
    <!ELEMENT ipbchap - - (tags?)>
    <!ELEMENT tags - - ANY>
    <!ELEMENT title - - (#PCDATA)>
    <!ELEMENT text - - (#PCDATA)>
    <!ELEMENT line - - (#PCDATA)>
    <!ELEMENT rear - - (tags?)>
    <!ATTLIST doc id CDATA #IMPLIED service CDATA #IMPLIED>
    <!ATTLIST body id CDATA #IMPLIED numcols NUMBER #IMPLIED>
    <!ATTLIST para0 id CDATA #IMPLIED verstatus CDATA #IMPLIED>
    <!ATTLIST (chapter|section|para|ipbchap|tags) id CDATA #IMPLIED>
]>
<doc service="xT">
<body numcols="1">

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter id="chap2"> <title>THEORY</title>
<section id="Thoery">
<title>theory Section</title>
<para0 verstatus="ver">
<title>Theory Para 0 </title>
<text>blah blah</text>
</para0>
</section>

<section id="Next section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<section id="More sections">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<section id="section">
<title>title</title>
<para0>
<title>Title</title>
<text>blah blah</text>
</para0>
</section>

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter id="chap1">
<para0><title></title></para0>
</chapter>

<chapter> <title>Chapter Title</title>
<section id="Section ID">
<title>Section Title</title>
<para0>
<title>Para0 Title</title>
<para>blah blah</para>
</para0>
</section>

<section id="Next section">
<title>title</title>
<para0>
<line>Title</line>
<text>blah blah</text>
</para0>
</section>

<ipbchap>
<tags></tags>
</ipbchap>

</body>
<rear>
<tags></tags>
</rear>
</doc>

If you run the above input through the osgmlnorm program (part of SP/OpenSP, see below), it produces the following output:

<DOC SERVICE="xT">
<BODY NUMCOLS="1">
<CHAPTER ID="chap1">
<SECTION>
<PARA0>
<TITLE></TITLE>
</PARA0>
</SECTION>
</CHAPTER>
<CHAPTER ID="chap2">
<TITLE>THEORY</TITLE>
<SECTION ID="Thoery">
<TITLE>theory Section</TITLE>
<PARA0 VERSTATUS="ver">
<TITLE>Theory Para 0 </TITLE>
<TEXT>blah blah</TEXT>
</PARA0>
</SECTION>
<SECTION ID="Next section">
<TITLE>title</TITLE>
<PARA0>
<TITLE>Title</TITLE>
<TEXT>blah blah</TEXT>
</PARA0>
</SECTION>
<SECTION ID="More sections">
<TITLE>title</TITLE>
<PARA0>
<TITLE>Title</TITLE>
<TEXT>blah blah</TEXT>
</PARA0>
</SECTION>
<SECTION ID="section">
<TITLE>title</TITLE>
<PARA0>
<TITLE>Title</TITLE>
<TEXT>blah blah</TEXT>
</PARA0>
</SECTION>
</CHAPTER>
<CHAPTER ID="chap1">
<SECTION>
<PARA0>
<TITLE></TITLE>
</PARA0>
</SECTION>
</CHAPTER>
<CHAPTER ID="chap1">
<SECTION>
<PARA0>
<TITLE></TITLE>
</PARA0>
</SECTION>
</CHAPTER>
<CHAPTER>
<TITLE>Chapter Title</TITLE>
<SECTION ID="Section ID">
<TITLE>Section Title</TITLE>
<PARA0>
<TITLE>Para0 Title</TITLE>
<PARA>blah blah</PARA>
</PARA0>
</SECTION>
<SECTION ID="Next section">
<TITLE>title</TITLE>
<PARA0>
<LINE>Title</LINE>
<TEXT>blah blah</TEXT>
</PARA0>
</SECTION>
</CHAPTER>
<IPBCHAP>
<TAGS></TAGS>
</IPBCHAP>
</BODY>
<REAR>
<TAGS></TAGS>
</REAR>
</DOC>

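I hope this is what you had in mind. osgmlnorm (and alternative programs such as osx, which produces XML from SGML) is part of James Clark's SP SGML processing package. There are more recent versions for Linux and Mac OS (OpenSP/OpenJade), but since you are using Visual Basic, I am pointing you to James's original SP site, http://www.jclark.com/sp/, where you can download an older version that still works fine. In that original distribution the programs are named without the o prefix (sgmlnorm and sx).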
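
If installing an SGML parser is not practical, the missing </chapter> tags in this particular sample can also be inserted with a small text scan in VB.NET. The sketch below is a different and much cruder technique than the DTD-based normalization shown above. It does not parse SGML. It only assumes that an unclosed chapter ends where the next <chapter>, <ipbchap> or </body> tag begins, which holds for the sample document but is an assumption about the real files. It also does not add the inferred <section> wrappers that the normalized output contains. The names ChapterCloser and CloseOpenChapters are made up for illustration.

```vbnet
Imports System
Imports System.Text
Imports System.Text.RegularExpressions

Module ChapterCloser

    ' Hypothetical helper, not part of the original post.
    ' Inserts a missing </chapter> before the next <chapter>, <ipbchap>
    ' or </body> tag when the current chapter was never closed.
    Function CloseOpenChapters(sgml As String) As String
        ' Only these tags matter for deciding where a chapter ends.
        Dim rx As New Regex("<chapter\b|</chapter\s*>|<ipbchap\b|</body\s*>",
                            RegexOptions.IgnoreCase)
        Dim result As New StringBuilder()
        Dim chapterOpen As Boolean = False
        Dim lastIndex As Integer = 0

        For Each m As Match In rx.Matches(sgml)
            ' Text between the previous matched tag (inclusive) and this one.
            Dim segment As String = sgml.Substring(lastIndex, m.Index - lastIndex)
            Dim isChapterEnd As Boolean =
                m.Value.StartsWith("</chapter", StringComparison.OrdinalIgnoreCase)

            If chapterOpen AndAlso Not isChapterEnd Then
                ' The previous chapter has no end tag: put </chapter> right
                ' after its last non-whitespace content, then restore the
                ' whitespace that followed it.
                Dim content As String = segment.TrimEnd()
                result.Append(content)
                result.Append(Environment.NewLine).Append("</chapter>")
                result.Append(segment.Substring(content.Length))
            Else
                result.Append(segment)
            End If

            ' A chapter is open only directly after a <chapter ...> start tag.
            chapterOpen = m.Value.StartsWith("<chapter", StringComparison.OrdinalIgnoreCase)
            lastIndex = m.Index
        Next

        ' Copy everything from the last matched tag to the end of the text.
        result.Append(sgml.Substring(lastIndex))
        Return result.ToString()
    End Function

    Sub Main()
        Dim sample As String =
            "<body><chapter id=""a""><section></section>" & Environment.NewLine &
            "<chapter id=""b""></chapter><ipbchap></ipbchap></body>"
        Console.WriteLine(CloseOpenChapters(sample))
    End Sub
End Module
```

In the code from the question, the function could be applied just before the file is written, for example File.WriteAllText(newMasterFilePath, CloseOpenChapters(newMasterFileBuilder.ToString)).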