Highlight algorithm - when the match length does not equal the search string length

Time: 2013-11-11 23:16:10

Tags: vb.net

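I have a highlighting algorithm that takes a string and adds highlight codes around the matches in it. The problem is a case like "Find tæst" as the string to be searched and "taest" as the string to find. Because the length of the search string differs from the length of the match, I cannot accurately find the end of the match. In my case IndexOf reports the match, but since the combined æ counts as a single character, it throws off my detection of where the match ends. I don't think IndexOf will work for me here. Anything that returns both the index of the match and the length of the match would do, but I don't know what else to use.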

    ' cycle through search words and replace them in the text
    For intWord = LBound(m_arrSearchWords) To UBound(m_arrSearchWords)

       If m_arrSearchWords(intWord).Length > 0 Then

          ' replace instances of the word with the word surrounded by bold codes

          ' find starting position
          intPos = strText.IndexOf(m_arrSearchWords(intWord), System.StringComparison.CurrentCultureIgnoreCase)
          Do While intPos <> -1

             strText = strText.Substring(0, (intPos - 1) - 0 + 1) & cstrHighlightCodeOn & strText.Substring(intPos, m_arrSearchWords(intWord).Length) & cstrHighlightCodeOff & strText.Substring(intPos + m_arrSearchWords(intWord).Length)
             intPos = strText.IndexOf(m_arrSearchWords(intWord), intPos + m_arrSearchWords(intWord).Length + cstrHighlightCodeOn.Length + cstrHighlightCodeOff.Length, System.StringComparison.CurrentCultureIgnoreCase)

          Loop

       End If

    Next intWord

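The Substring method fails because the length runs past the end of the string. I have fixed this for strings that end with the search word (not shown above). But longer strings will be highlighted incorrectly, and I need to fix those.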

2 answers:

Answer 0: (score: 0)

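While it would be nice if IndexOf returned the match length, it turns out you can do the comparison yourself to figure it out. I do a secondary comparison over candidate lengths to find the largest match. I start with the length of the search word, which should be the largest, and work backwards to find the length. Once I find a length that matches, I use it. If I don't find one, I work upwards in length instead. This works whether the matched text is longer or shorter than the search word. It means at least one extra comparison in the normal case, and in the worst case a number of extra comparisons that depends on the length of the search word. Perhaps if I had the implementation of IndexOf I could improve on this, but at least it works.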

    ' cycle through search words and replace them in the text
    For intWord = LBound(m_arrSearchWords) To UBound(m_arrSearchWords)

       If m_arrSearchWords(intWord).Length > 0 Then

          ' find starting position
          intPos = strText.IndexOf(m_arrSearchWords(intWord), System.StringComparison.CurrentCultureIgnoreCase)
          Do While intPos <> -1

             intOrigLength = m_arrSearchWords(intWord).Length

             ' if there isn't enough of the text left to add the search word length to
             If strText.Length < ((intPos + intOrigLength - 1) - 0 + 1) Then

                ' use shorter length
                intOrigLength = ((strText.Length - 1) - intPos + 1)

             End If

             ' find largest match
             For intLength = intOrigLength To 1 Step -1

                If m_arrSearchWords(intWord).Equals(strText.Substring(intPos, intLength), StringComparison.CurrentCultureIgnoreCase) Then

                   ' if match found, highlight it
                   strText = strText.Substring(0, (intPos - 1) - 0 + 1) & cstrHighlightCodeOn & strText.Substring(intPos, intLength) & cstrHighlightCodeOff & strText.Substring(intPos + intLength)

                   ' find next
                   intPos = strText.IndexOf(m_arrSearchWords(intWord), intPos + intLength + cstrHighlightCodeOn.Length + cstrHighlightCodeOff.Length, System.StringComparison.CurrentCultureIgnoreCase)

                   ' exit search for largest match
                   Exit For

                End If

             Next

             ' if we didn't find it by searching smaller - search larger
             If intLength = 0 Then

                For intLength = intOrigLength + 1 To ((strText.Length - 1) - intPos + 1)

                   If m_arrSearchWords(intWord).Equals(strText.Substring(intPos, intLength), StringComparison.CurrentCultureIgnoreCase) Then

                      ' if match found, highlight it
                      strText = strText.Substring(0, (intPos - 1) - 0 + 1) & cstrHighlightCodeOn & strText.Substring(intPos, intLength) & cstrHighlightCodeOff & strText.Substring(intPos + intLength)

                      ' find next
                      intPos = strText.IndexOf(m_arrSearchWords(intWord), intPos + intLength + cstrHighlightCodeOn.Length + cstrHighlightCodeOff.Length, System.StringComparison.CurrentCultureIgnoreCase)

                      ' exit search for largest match
                      Exit For

                   End If

                Next

             End If

          Loop

       End If

    Next intWord
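
The length search above can also be factored into a small helper, which keeps the highlighting loop free of index arithmetic on a string that is being modified. The following is only a sketch under assumptions: the module `HighlightSketch`, the functions `GetMatchLength` and `Highlight`, and all parameter names are hypothetical and not part of the original code. It uses `CompareInfo.Compare` with offsets instead of `Substring` plus `Equals`, and builds the result in a `StringBuilder`.

```vbnet
Imports System
Imports System.Globalization
Imports System.Text

Module HighlightSketch

    ' Hypothetical helper: number of characters of strText, starting at intPos,
    ' that compare equal to strWord (ignoring case), or -1 if no length does.
    Function GetMatchLength(strText As String, intPos As Integer, strWord As String, objCulture As CultureInfo) As Integer
        Dim objCompare As CompareInfo = objCulture.CompareInfo
        Dim intMax As Integer = strText.Length - intPos
        Dim intFirst As Integer = Math.Min(strWord.Length, intMax)

        ' same order as the answer: start at the word length and work down
        For intLength As Integer = intFirst To 1 Step -1
            If objCompare.Compare(strText, intPos, intLength, strWord, 0, strWord.Length, CompareOptions.IgnoreCase) = 0 Then
                Return intLength
            End If
        Next

        ' nothing found going down, so try longer candidates
        For intLength As Integer = intFirst + 1 To intMax
            If objCompare.Compare(strText, intPos, intLength, strWord, 0, strWord.Length, CompareOptions.IgnoreCase) = 0 Then
                Return intLength
            End If
        Next

        Return -1
    End Function

    ' Hypothetical wrapper: surrounds every match of strWord with strOn / strOff.
    Function Highlight(strText As String, strWord As String, strOn As String, strOff As String, objCulture As CultureInfo) As String
        If String.IsNullOrEmpty(strText) OrElse String.IsNullOrEmpty(strWord) Then Return strText

        Dim objCompare As CompareInfo = objCulture.CompareInfo
        Dim objResult As New StringBuilder()
        Dim intStart As Integer = 0   ' first character not yet copied to the result
        Dim intPos As Integer = objCompare.IndexOf(strText, strWord, 0, CompareOptions.IgnoreCase)

        Do While intPos <> -1
            Dim intNext As Integer
            Dim intLength As Integer = GetMatchLength(strText, intPos, strWord, objCulture)

            If intLength < 1 Then
                ' no candidate length compares equal: skip this position unhighlighted
                intNext = intPos + 1
            Else
                objResult.Append(strText, intStart, intPos - intStart)
                objResult.Append(strOn).Append(strText, intPos, intLength).Append(strOff)
                intStart = intPos + intLength
                intNext = intStart
            End If

            If intNext >= strText.Length Then Exit Do
            intPos = objCompare.IndexOf(strText, strWord, intNext, CompareOptions.IgnoreCase)
        Loop

        ' copy whatever follows the last match
        objResult.Append(strText, intStart, strText.Length - intStart)
        Return objResult.ToString()
    End Function

    Sub Main()
        ' prints: Find [TEST] here, [test] again
        Console.WriteLine(Highlight("Find TEST here, test again", "test", "[", "]", CultureInfo.InvariantCulture))

        ' whether "taest" matches "tæst" at all depends on the culture and on the
        ' platform's collation data, so no expected output is given here
        Console.WriteLine(Highlight("Find tæst", "taest", "[", "]", CultureInfo.CurrentCulture))
    End Sub

End Module
```

Because the result is built separately, the search position never has to be adjusted for the length of the inserted highlight codes. The number of extra comparisons per match is the same as in the loops above.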

Answer 1: (score: -1)

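If I understand correctly, you are looking for a function that returns the "matched string" - in other words, when you look for s2 inside s1, you want to know exactly which part of s1 was matched (the indexes of the first and last character of the match). This lets you highlight the match without modifying the string (upper/lower case, ligatures, etc.).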

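I don't have VB.net, and unfortunately VBA does not have exactly the same search functions as VB.net - so please understand that while the following code correctly identifies the start and end of a match, it has only been tested with upper/lower case mismatches. I hope this helps you solve the problem.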

    Option Compare Text
    Option Explicit

    Function startEndIndex(bigString, smallString)
        ' function that returns start, end index
        ' of the match
        ' it keeps shortening the bigString until no match is found
        ' this is how it takes care of mismatches in number of characters
        ' because of a match between "similar" strings
        Dim i1, i2
        Dim shorterString

        i2 = 0

        ' first see if there is a match at all:
        i1 = InStr(1, bigString, smallString, vbTextCompare)

        If i1 > 0 Then
            ' largest value that i2 can have is end of string:
            i2 = Len(bigString)

            ' can make it shorter - but no shorter than twice the length of the search string
            If i2 > i1 + 2 * Len(smallString) Then i2 = i1 + 2 * Len(smallString)
            shorterString = Mid(bigString, i1, i2 - i1)

            ' keep making the string shorter until there is no match:
            While InStr(1, shorterString, smallString, vbTextCompare) > 0
                i2 = i2 - 1
                shorterString = Mid(bigString, i1, i2 - i1)
            Wend

        End If

        ' return the values as an array:
        ' (i2 is the index of the last matched character; both values are 0 when there is no match)
        startEndIndex = Array(i1, i2)

    End Function


    Sub test()
        ' a simple test routine to see that things work:
        Dim a
        Dim longString: longString = "This is a very long TaesT of a complicated string"
        a = startEndIndex(longString, "very long taest")
        If a(0) = 0 And a(1) = 0 Then
            MsgBox "no match found"
        Else
            Dim highlightString As String
            highlightString = Left(longString, a(0) - 1) & "*" & Mid(longString, a(0), a(1) - a(0) + 1) & _
                "*" & Mid(longString, a(1) + 1)
            MsgBox "start at " & a(0) & " and end at " & a(1) & vbCrLf & _
                "string matched is '" & Mid(longString, a(0), a(1) - a(0) + 1) & "'" & vbCrLf & _
                "with highlighting: " & highlightString
        End If
    End Sub