Function does not remove strings from the body of the text

Time: 2019-04-10 15:17:36

Tags: vb.net

Hi, this is a continuation of my question "Find string and move it up to the top of the document".

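The code does almost everything I need. It searches for the <!Entity> elements and moves them to the top of the document, but it does not remove them from the document body. I also need to widen the search to <!.*$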

Here is the code. Line 15, "lines.RemoveAt(itm.Key)", does not seem to work.

Dim largeFilePath As String = "largeFilePath"
Dim lines = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
Dim reg = New Regex("<!Entity.*$", RegexOptions.IgnoreCase)
Dim entities = From line In lines
               Where reg.IsMatch(line)

Dim dictionary As New Dictionary(Of Integer, String)
Dim idx = 0
For Each s In entities
    idx = lines.IndexOf(s, idx)
    dictionary.Add(idx, s)
Next

For Each itm In dictionary
    lines.RemoveAt(itm.Key)
Next

For Each s In entities.ToList
    lines.Insert(1, s)
Next

Using sw As New System.IO.StreamWriter("newfile.txt")
    For Each line As String In lines
        sw.WriteLine(line)
    Next
    sw.Flush()
    sw.Close()
End Using

Test data

<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
<!ENTITY cdcs_5-35.wmf SYSTEM "graphics\CDCS_5-35.wmf" NDATA wmf>
<!ENTITY cdcs_2-2a.wmf SYSTEM "graphics\CDCS_2-2A.wmf" NDATA wmf>
<!ENTITY GCS38849 SYSTEM "Graphics\GCS38849.cgm" NDATA cgm>
<!ENTITY GCS39016 SYSTEM "Graphics\GCS39016.cgm" NDATA cgm>
<doc service="xs" docid="BKw46" docstat="formal" verstatpg="ver" cycle="1" chglevel="1">
<front numcols="1">
<idinfo>
<?Pub Lcl _divid="100" _parentid="0">
<tmidno>Life with Pets</tmidno>
<chgnum>Change 1</chgnum>
<chgdate>2 August 2018</chgdate>
<chghistory>
<chginfo>
<!ENTITY CDCS_4-21B SYSTEM "Graphics\CDCS_4-21B.wmf" NDATA wmf>
<!ENTITY CDCS_4-24B SYSTEM "Graphics\CDCS_4-24B.wmf" NDATA wmf>
<!ENTITY CDCS_4-42B SYSTEM "Graphics\CDCS_4-42B.png" NDATA png>
<!ENTITY CDCS_MFW11 SYSTEM "Graphics\CDCS_MFW1.wmf" NDATA wmf>
<!ENTITY CDCS_blk10_Cont_consl_markingsworking1 SYSTEM "Graphics\CDCS_b
<chgtxt>Change 1</chgtxt>
<date>2 August 2018</date>
</front>
<!ENTITY cdcs_2-19.wmf SYSTEM "graphics\CDCS_2-19.wmf" NDATA wmf>
<!ENTITY cdcs_3-5.wmf SYSTEM "graphics\CDCS_3-5.wmf" NDATA wmf>
<body numcols="1">
<chapter>
<title>This is chapter 1</title>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<para0>
<title>Climb the ladder immedietly</title>
<para>Retrieve the cat.</para></para0></chapter>
<chapter>
<title>Don't forget to feed the dog</title>
<para0>
<!ENTITY GCS17777 SYSTEM "Graphics\GCS17777.cgm" NDATA cgm>
<!ENTITY GCS17782 SYSTEM "Graphics\GCS17782.cgm" NDATA cgm>
<!ENTITY GCS17783 SYSTEM "Graphics\GCS17783.cgm" NDATA cgm>
<!ENTITY GCS19983 SYSTEM "Graphics\GCS19983.cgm" NDATA cgm>
<!ENTITY GCS19984 SYSTEM "Graphics\GCS19984.cgm" NDATA cgm>
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<title>Prep for puppies</title>
<para>Puppies are cute</para></para0>
</chapter>
</body>
</doc>

The result is a file in which all <!Entity elements have been moved to the top of the document but also still remain in the document body.

The desired result is for all <!.* elements to be moved to the top of the document and removed from the document body.

Thank you for all your help and patience with this question. Max

1 answer:

Answer 0: (score: 0)

Problem solved.

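Dim largeFilePath As String = "largeFilePath"
Dim lines = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
' Match both NOTATION and ENTITY declarations, ignoring case
Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
Dim entities = From line In lines
               Where reg.IsMatch(line)

' Record the index of every matching line; start each search just past the
' previous hit so that identical lines get distinct indexes
Dim dictionary As New Dictionary(Of Integer, String)
Dim idx = -1
For Each s In entities
    idx = lines.IndexOf(s, idx + 1)
    dictionary.Add(idx, s)
Next

' Each RemoveAt shifts the following lines up by one, so subtract the number
' of lines already removed from the recorded index
' (this relies on the dictionary enumerating in insertion order)
Dim deletedItems = 0
For Each itm In dictionary
    lines.RemoveAt(itm.Key - deletedItems)
    deletedItems += 1
Next

' Re-insert the removed lines right after the first line of the document
For Each s In dictionary.Values
    lines.Insert(1, s)
Next

' The Using block flushes and closes the writer when it ends
Using sw As New System.IO.StreamWriter("newfile.txt")
    For Each line As String In lines
        sw.WriteLine(line)
    Next
End Using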
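
For anyone reading later: the original loop failed because every RemoveAt shifts the remaining lines up by one, so the indexes stored in the dictionary go stale after the first removal. In addition, entities is a deferred LINQ query, so entities.ToList ran again against the already modified list. The fix above compensates with the deletedItems offset. A possibly simpler variant avoids indexes altogether by splitting the lines into two lists in a single pass. This is only a sketch: HoistDeclarations and the module name are hypothetical names not found in the post, the anchored pattern is my own tightening of the regex, and it assumes the first line of the file (the DOCTYPE line in the test data) should stay on top.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.IO
Imports System.Text.RegularExpressions

Module MoveDeclarations

    ' Hypothetical helper (not from the original post): returns a new list in
    ' which every ENTITY/NOTATION line sits directly after the first line.
    Function HoistDeclarations(lines As IEnumerable(Of String)) As List(Of String)
        ' Assumption: a declaration starts its line, optionally after whitespace.
        Dim reg As New Regex("^\s*<!(ENTITY|NOTATION)\b", RegexOptions.IgnoreCase)
        Dim declarations As New List(Of String)
        Dim body As New List(Of String)

        ' One pass: every line goes into exactly one of the two lists,
        ' so nothing is removed by index and nothing can be left behind.
        For Each line As String In lines
            If reg.IsMatch(line) Then
                declarations.Add(line)
            Else
                body.Add(line)
            End If
        Next

        ' Insert after the first remaining line; Math.Min guards an empty input.
        body.InsertRange(Math.Min(1, body.Count), declarations)
        Return body
    End Function

    Sub Main()
        ' "largeFilePath" and "newfile.txt" are the placeholder names from the post.
        Dim result As List(Of String) = HoistDeclarations(File.ReadLines("largeFilePath"))
        File.WriteAllLines("newfile.txt", result)
    End Sub

End Module
```

Unlike the loop with lines.Insert(1, s), which leaves the moved lines in reverse order, InsertRange keeps the declarations in the order they appeared in the file. Widening the search to every <!.* line would also match the <!DOCTYPE line, so that pattern would need the first line to be excluded explicitly.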