Find strings and move them to the top of the document

Time: 2019-04-09 17:02:56

Tags: vb.net

I have looked everywhere for this answer, so I apologize if it turns out to be simple. I am still new to VB.Net, and I appreciate everyone's help.

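My problem is that my script works on a large file containing certain strings, each followed by content. These strings are scattered throughout the file. What I need to do is collect all of them and move them into the document prolog at the top, so that they end up between the []. So, essentially, the code will need to find the regular expression ^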

Any help you can give me would be great.

I tried creating an array to do this, but that failed. Then I considered using a regex to get

Then I tried file.Append, but that did not work.

This is the code I came up with, but it does not work. In fact, it takes a long time to build.

Dim regex = New Regex("<Entity.*$")
Dim lines As String() = File.ReadAllLines(fileName)
Dim arrEntity(0 To -1) As String

Dim regexMatches = regex.Matches(fileName)
Dim i As Integer = 0
For Each match As Match In regexMatches
    'If <!ENTITY.*> is found write it to an array
    Dim entityLine = match.ToString
    finalValue.Append(arrEntity(i))
    i += 1
Next
'Go to top of document and write the entity list between []

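The expected result is the fileName document in which all of those lines have been moved up between the [] at the top of the document. Apart from the prolog at the top, the document should contain no other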

Sample SGM file

<!DOCTYPE DOC PUBLIC "-//USA-DOD//DTD 38784STD-BV7//EN"[
<!ENTITY cdcs_5-35.wmf SYSTEM "graphics\CDCS_5-35.wmf" NDATA wmf>
<!ENTITY cdcs_2-2a.wmf SYSTEM "graphics\CDCS_2-2A.wmf" NDATA wmf>
<doc service="xs" docid="BKw46" docstat="formal" verstatpg="ver" cycle="1" chglevel="1">
<front numcols="1">
<idinfo>
<?Pub Lcl _divid="100" _parentid="0">
<tmidno>Life with Pets</tmidno>
<chgnum>Change 1</chgnum>
<chgdate>2 August 2018</chgdate>
<chghistory>
<chginfo>
<chgtxt>Change 1</chgtxt>
<date>2 August 2018</date>
</front>
<!ENTITY cdcs_2-19.wmf SYSTEM "graphics\CDCS_2-19.wmf" NDATA wmf>
<!ENTITY cdcs_3-5.wmf SYSTEM "graphics\CDCS_3-5.wmf" NDATA wmf>
<body numcols="1">
<chapter>
<title>This is chapter 1</title>
<!ENTITY cdcs_2-5.wmf SYSTEM "graphics\CDCS_2-5.wmf" NDATA wmf>
<!ENTITY cdcs_2-24.wmf SYSTEM "graphics\CDCS_2-24.wmf" NDATA wmf>
<para0>
<title>Climb the ladder immedietly</title>
<para>Retrieve the cat.</para></para0></chapter>
<chapter>
<title>Don't forget to feed the dog</title>
<para0>
<!ENTITY cdcs_4-48.wmf SYSTEM "graphics\CDCS_4-48.wmf" NDATA wmf>
<title>Prep for puppies</title>
<para>Puppies are cute</para></para0>
</chapter>
</body>
</doc>

1 answer:

Answer 0: (score: 1)

OK, I tested this code with the sample text you posted.

Here is the final result:

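    ' Requires: Imports System.IO, System.Linq, System.Text.RegularExpressions
    Dim largeFilePath As String = "largeFilePath"
    Dim lines = File.ReadLines(largeFilePath).ToList 'don't use ReadAllLines
    Dim reg = New Regex("\<\!NOTATION.*$|\<\!ENTITY.*$", RegexOptions.IgnoreCase)
    ' Every line that contains an ENTITY or NOTATION declaration
    Dim entities = From line In lines
                   Where reg.IsMatch(line)

    ' Map each matching line's index to its text
    Dim dictionary As New Dictionary(Of Integer, String)
    Dim idx = -1
    For Each s In entities
        idx = lines.IndexOf(s, idx + 1)
        dictionary.Add(idx, s)
    Next

    ' Remove the matches; each earlier removal shifts the later indexes down by one
    Dim deletedItems = 0
    For Each itm In dictionary
        lines.RemoveAt(itm.Key - deletedItems)
        deletedItems += 1
    Next

    ' Reinsert them right after the first line (the DOCTYPE line in the sample).
    ' InsertRange keeps their order; calling Insert(1, s) in a loop would reverse it.
    lines.InsertRange(1, dictionary.Values)

    ' End Using flushes and closes the writer, so no explicit Flush/Close is needed
    Using sw As New System.IO.StreamWriter("newfile.txt")
        For Each line As String In lines
            sw.WriteLine(line)
        Next
    End Using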

The code has been updated and tested on a 100 MB file; processing takes only 2 seconds.
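
For comparison, the same result can probably be reached in a single pass, without tracking indexes or removing items from the list. The following is only a sketch, not the code tested above. The module name EntityMover and the function MoveDeclarationsToTop are made up for illustration. It assumes that every declaration starts on its own line (hence the anchored pattern) and that the first line of the file is the <!DOCTYPE ...[ line, as in the sample.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module EntityMover
    ' Matches lines that begin with <!ENTITY or <!NOTATION (case-insensitive).
    Private ReadOnly DeclPattern As New Regex("^\s*<!(ENTITY|NOTATION)\b", RegexOptions.IgnoreCase)

    ' Splits the input into declaration lines and all other lines, then puts the
    ' declarations directly after the first remaining line, in their original order.
    ' Assumption: that first line is the <!DOCTYPE ...[ line.
    Function MoveDeclarationsToTop(source As IEnumerable(Of String)) As List(Of String)
        Dim declarations As New List(Of String)
        Dim others As New List(Of String)

        For Each line As String In source
            If DeclPattern.IsMatch(line) Then
                declarations.Add(line)
            Else
                others.Add(line)
            End If
        Next

        ' Math.Min guards the case where no non-declaration line exists.
        others.InsertRange(Math.Min(1, others.Count), declarations)
        Return others
    End Function

    Sub Main()
        ' A shortened version of the sample SGM file from the question.
        Dim sample As String() = {
            "<!DOCTYPE DOC PUBLIC ""-//USA-DOD//DTD 38784STD-BV7//EN""[",
            "<doc>",
            "<!ENTITY cdcs_2-19.wmf SYSTEM ""graphics\CDCS_2-19.wmf"" NDATA wmf>",
            "<body>",
            "<!ENTITY cdcs_2-5.wmf SYSTEM ""graphics\CDCS_2-5.wmf"" NDATA wmf>",
            "</body>",
            "</doc>"
        }

        For Each line As String In MoveDeclarationsToTop(sample)
            Console.WriteLine(line)
        Next
    End Sub
End Module
```

For a real file, the input could come from File.ReadLines(path) and the returned list could be written out with File.WriteAllLines, both in System.IO. Because nothing is removed from the middle of a list, the work should grow roughly linearly with the number of lines, although I have not timed this version.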