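Scanning a file for a string with VB.NET, ignoring extra whitespace

Asked: 2014-10-27 17:41:54

Tags: regex vb.net string whitespace

I am searching files for a sequence of words, for example "one two three". I have been using: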

Dim text As String = File.ReadAllText(filepath)
For each phrase in phrases
    index = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase)
    If index >= 0 Then
        Exit For
    End If
Next

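This works fine, but I have now found that some files may contain the target phrase with more than one whitespace character between the words.

For example, my code finds

"one two three" but fails to find "one  two   three"

Is there a way, using a regular expression or any other technique, to capture the phrase even when the words are separated by more than one space?

I know I could use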

Dim text As String = File.ReadAllText(filepath)
For each phrase in phrases
    text=text.Replace("  "," ")
    index = text.IndexOf(phrase, StringComparison.OrdinalIgnoreCase)
    If index >= 0 Then
        Exit For
    End If
Next

but I would like to know whether there is a more efficient way to achieve this.

3 Answers:

Answer 0: (score: 1)

You can create a function that collapses any run of repeated spaces into a single space.

Option Strict On
Option Explicit On
Option Infer Off
Public Class Form1
    Private Sub Form1_Load(sender As Object, e As EventArgs) Handles MyBase.Load
        Dim testString As String = "one two  three   four    five        six"
        Dim excessSpacesGone As String = RemoveExcessSpaces(testString)
        'one two three four five six
        Clipboard.SetText(excessSpacesGone)
        MsgBox(excessSpacesGone)
    End Sub
    Function RemoveExcessSpaces(source As String) As String
        Dim result As String = source
        Do
            result = result.Replace("  ", " "c)
        Loop Until result.IndexOf("  ") = -1
        Return result
    End Function
End Class

Answer 1: (score: 1)

The comments in the code explain what it does.

        ' Requires: Imports System.Text.RegularExpressions
        Dim inputStr As String = "This contains one        Two  three and some     other words" ' the text read from the file
        inputStr = Regex.Replace(inputStr, "\s{2,}", " ") ' collapse each run of two or more whitespace characters into one space
        Dim searchStr As String = "one two three" ' the phrase to search for
        searchStr = Regex.Replace(searchStr, "\s{2,}", " ") ' normalize the phrase the same way
        If UCase(inputStr).Contains(UCase(searchStr)) Then ' case-insensitive containment check
            MsgBox("contains") ' report the match
        End If
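
As a rough sketch of how this idea could be applied to the loop from the question: the text is normalized once, before the loop, so each phrase costs only one IndexOf call. The names WhitespaceSearch, NormalizeWhitespace and FindFirstPhrase are hypothetical and not part of the answer above. The sketch also assumes "\s+" rather than "\s{2,}", so that a single tab or line break between words is turned into a space as well.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Text.RegularExpressions

Module WhitespaceSearch
    ' Collapse every run of whitespace (spaces, tabs, line breaks) into one space.
    Function NormalizeWhitespace(source As String) As String
        Return Regex.Replace(source, "\s+", " ")
    End Function

    ' Return the first phrase found in the text, or Nothing if none is found.
    Function FindFirstPhrase(text As String, phrases As IEnumerable(Of String)) As String
        ' Normalize the (potentially large) text only once, outside the loop.
        Dim normalizedText As String = NormalizeWhitespace(text)
        For Each phrase As String In phrases
            Dim normalizedPhrase As String = NormalizeWhitespace(phrase)
            If normalizedText.IndexOf(normalizedPhrase, StringComparison.OrdinalIgnoreCase) >= 0 Then
                Return phrase
            End If
        Next
        Return Nothing
    End Function

    Sub Main()
        Dim text As String = "This contains one        Two  three and some     other words"
        Dim found As String = FindFirstPhrase(text, {"four five", "one two three"})
        Console.WriteLine(If(found, "(no match)")) ' prints: one two three
    End Sub
End Module
```

Note that any index obtained this way refers to the normalized text, not to the original file contents, so this approach fits best when only a yes/no answer is needed.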

Answer 2: (score: 0)

You can convert each phrase into a regular expression with \s+ between the words and then check whether the text matches. For example:

Dim text = "This contains one    Two  three"
Dim phrases = {
    "one two three"
}
' Split each phrase into words and build the regex from those words.
For each phrase in phrases.Select(Function(p) String.Join("\s+", p.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)))
    If Regex.IsMatch(text, phrase, RegexOptions.IgnoreCase) Then
        Console.WriteLine("Found!")
        Exit For
    End If
Next

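Note that this does not check for word boundaries at the start or end of the phrase, so "This contains someone two threesome" would also match. If you do not want that, add "\s" to both ends of the regex. Be aware that "\s" requires an actual whitespace character, so a phrase at the very start or end of the text would no longer match; the word-boundary anchor "\b" avoids that limitation.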
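
The boundary check described above can be sketched as a small helper. This is an illustration under stated assumptions, not the answerer's code: the names PhrasePattern and BuildPhrasePattern are hypothetical, "\b" is used for the boundaries instead of "\s", and each word is passed through Regex.Escape so that characters such as "." or "+" in a phrase are matched literally. It also assumes every phrase starts and ends with a word character, since "\b" behaves differently next to punctuation.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Linq
Imports System.Text.RegularExpressions

Module PhrasePattern
    ' Build a pattern such as "\bone\s+two\s+three\b" from "one two three".
    Function BuildPhrasePattern(phrase As String) As String
        Dim words As String() = phrase.Split({" "c}, StringSplitOptions.RemoveEmptyEntries)
        ' Escape each word so regex metacharacters are treated as literal text.
        Dim escaped As IEnumerable(Of String) = words.Select(Function(w) Regex.Escape(w))
        Return "\b" & String.Join("\s+", escaped) & "\b"
    End Function

    Sub Main()
        Dim pattern As String = BuildPhrasePattern("one two three")
        ' Extra spaces and different casing still match.
        Console.WriteLine(Regex.IsMatch("This contains one    Two  three", pattern, RegexOptions.IgnoreCase)) ' True
        ' Words embedded in longer words no longer match.
        Console.WriteLine(Regex.IsMatch("This contains someone two threesome", pattern, RegexOptions.IgnoreCase)) ' False
    End Sub
End Module
```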