处理字符串

时间:2016-07-14 13:20:47

标签: string excel vba excel-vba

单元格可以包含大量字符。我不确定限制,但我测试的是450多个字符。在VBA中,我没有问题在字符串中插入该单元格的值,通过debug.print读取它,使用它上面的函数(如Len(str))来查找字符数。

我的问题

我想要使用的字符串是HTML字符串,我在其上应用格式然后删除HTML标记。这些格式没有问题,使用宏我认为没有必要显示(它很长)但是什么时候删除HTML标签,当字符串高于255时遇到问题字符。

自行重播并查看

以下是删除有关字体颜色的HTML标记的一段代码的一部分,经过调整以使情况脱颖而出。要使用它,请选择包含HTML标记的单元格并运行代码。小心 - 当长度大于255个字符时,它将运行无限循环,因此请逐步使用F8并在第一次查看debug.prints。只需跳过删除,甚至不会出现任何错误。

Sub removeColorTags()
    Dim i As Integer
    Dim rng As Range
    Dim str As String
    Set rng = ActiveCell
        i = InStr(rng.Value, "<font")
    Do Until i = 0
    Debug.Print Len(rng.Value)
    str = rng.Value
    Debug.Print str 'Displays correctly
        rng.Characters(i, 20).Delete
        i = InStr(rng.Value, "</font>")
        rng.Characters(i, 7).Delete
        i = InStr(rng.Value, "<font")
    Loop
End Sub

以下是您可以在单元格中解析以尝试代码以查看其成功而没有问题的示例。它将删除颜色标签但保留大小标签。确保你得到整行(250个字符)

<font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font>

以下是您可以在单元格中解析以尝试代码以查看其失败的示例。确保你得到整行(450个字符)

<font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font><font color=#8DB4E2><size=09>Action ligne 3</font>

我想要什么

我想要一种删除长字符串上的HTML标记的方法。在不参考单元格的情况下执行此操作(获取字符串中的值,使用Replace或其他方式删除标记)不起作用,因为在将值返回时,格式化将丢失。这一点的重点是格式化细胞。

2 个答案:

答案 0 :(得分:1)

这是我最终做的事情。首先,让我们看一下输入,转换信息和输出的屏幕截图。我从一个带有文本的普通格式化excel单元开始,然后将其转换为类似于(但不太像)HTML的东西。这个问题询问我如何从HTMl字符串(屏幕截图的中间部分)中删除子字符串(HTML标记)而不会丢失格式。

enter image description here

这是如何回答问题

我需要一种方法来删除子字符串,而不会丢失超过255个字符的单元格的格式。这意味着不使用characters.insertcharacters.delete,因为正如蒂姆威廉姆斯指出的那样,它们会在255个字符之后引发问题。因此,作为一种解决方法,我将要删除的子串之间的输入字符串分段,记录它们的格式,将它们放回原处,然后使用characters(x,y).font重新应用格式。

我将要通过char扫描HTML字符串char并将其记录在临时字符串中。当遇到HTML标记时,它会停止记录临时字符串,并将其与该临时字符串相关的格式记录在一个数组中。然后它读取标记并将“当前格式”更改为HTML标记的内容,并再次开始在新的临时字符串中记录。我承认通过调用函数可以缩短sub,但它可以工作。

Sub FromHTML(rngToConvert As Range)
    Dim i As Integer, j As Integer, k As Integer
    Dim strHTML As String, strTemp As String
    Dim rng As Range
    Dim arr()
    Dim lengthFormatted As Integer
    Dim optBold As Boolean, optIta As Boolean, optUnd As Boolean, optCol As String, optSize As Integer
    Dim inStrTemp As Boolean
    Dim nbChars As Integer

    Set rng = rngToConvert.Offset(0, 2)
    rng.Clear
    strHTML = rngToConvert.Value

    If InStr(strHTML, "<") = 0 Then Exit Sub


    ReDim arr(6, 0)

    inStrTemp = False
    strTemp = ""
    optBold = False
    optIta = False
    optUnd = False
    optCol = "0,0,0"
    optSize = "11"

    For i = 1 To Len(strHTML)

        If Not Mid(strHTML, i, 1) = "<" And Not Mid(strHTML, i, 4) = "[LF]" Then
            'All WANTED characters go here
            strTemp = strTemp & Mid(strHTML, i, 1)
            inStrTemp = True

            If Len(strTemp) > 200 Or i = Len(strHTML) Then
                'Cuts them shorter than 200 chars
                'In retrospect this isn't necessary but doesn't interfere
                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1
            End If
        ElseIf Mid(strHTML, i, 4) = "[LF]" Then
            '[LF] is what I used to indicate that there was a line change in the original text
                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1

            strTemp = vbLf
            inStrTemp = True
            i = i + 3

                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1
        Else
             If inStrTemp = True Then
                'Records the temporary string and the formats it used
                ReDim Preserve arr(6, j)
                arr(0, j) = strTemp
                arr(1, j) = optBold
                arr(2, j) = optIta
                arr(3, j) = optUnd
                arr(4, j) = optCol
                arr(5, j) = optSize
                arr(6, j) = Len(strTemp)
                strTemp = ""
                j = j + 1
                inStrTemp = False
            End If

            'If we get here we hit a HTML tag, so we read it and skip to after it
            If Mid(strHTML, i, 3) = "<b>" Then
                optBold = True
                i = i + 2 
            ElseIf Mid(strHTML, i, 4) = "</b>" Then
                optBold = False
                i = i + 3
            ElseIf Mid(strHTML, i, 3) = "<i>" Then
                optIta = True
                i = i + 2
            ElseIf Mid(strHTML, i, 4) = "</i>" Then
                optIta = False
                i = i + 3
            ElseIf Mid(strHTML, i, 3) = "<u>" Then
                optUnd = True
                i = i + 2
            ElseIf Mid(strHTML, i, 4) = "</u>" Then
                optUnd = False
                i = i + 3
            ElseIf Mid(strHTML, i, 11) Like "<c=???????>" Then
                       'optCol = RED, GREEN, BLUE
                optCol = CInt("&H" & Mid(strHTML, i + 4, 2)) & "," & _
                        CInt("&H" & Mid(strHTML, i + 6, 2)) & "," & _
                        CInt("&H" & Mid(strHTML, i + 8, 2))
                i = i + 10
            ElseIf Mid(strHTML, i, 6) Like "<s=??>" Then
                optSize = CInt(Mid(strHTML, i + 3, 2))
                i = i + 5
            End If
        End If
    Next

    'Filling the cell with unformatted text
    For i = 0 To UBound(arr, 2)
      'This debug.print shows the tempString that was recorded and the associated formats
        Debug.Print arr(0, i) & " Bold=" & arr(1, i) & " Italic=" & arr(2, i) & " Underline=" & arr(3, i) & " RGB=" & arr(4, i) & " Size =" & arr(5, i)
        rng.Value = rng.Value + arr(0, i)
    Next
    'Applying formats according to the arrays
    nbChars = 1
    For i = 0 To UBound(arr, 2)
        If arr(0, i) = vbLf Then
            nbChars = nbChars + 1
        Else
            rng.Characters(nbChars, arr(6, i)).Font.Bold = arr(1, i)
            rng.Characters(nbChars, arr(6, i)).Font.Italic = arr(2, i)
            rng.Characters(nbChars, arr(6, i)).Font.Underline = arr(3, i)
            rng.Characters(nbChars, arr(6, i)).Font.Color = RGB(Split(arr(4, i), ",")(0), Split(arr(4, i), ",")(1), Split(arr(4, i), ",")(2))
            rng.Characters(nbChars, arr(6, i)).Font.Size = CInt(arr(5, i))
            nbChars = nbChars + arr(6, i)
        End If
    Next
End Sub

我觉得这个子是复杂的,我想用它回答的原因是因为它可以帮助任何人试图达到类似的目标。当然,需要进行一些调整。这是我过去从格式化文本到类似HTML的文本的功能。这不是问题的一部分,但有助于理解标签。它基于我在网上发现的功能(虽然我不记得在哪里)。如果要按原样使用两个子,请务必删除此函数放置的HTML字符串开头和结尾的<html></html>标记。

Function fnConvert2HTML(myCell As Range) As String
    Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn, sizTagOn As Boolean
    Dim i, chrCount As Integer
    Dim chrCol, chrLastCol, chrSiz, chrLastSiz, htmlTxt As String

    bldTagOn = False
    itlTagOn = False
    ulnTagOn = False
    colTagOn = False
    sizTagOn = False
    chrCol = "NONE"
    htmlTxt = "<html>"
    chrCount = myCell.Characters.Count

    For i = 1 To chrCount
        With myCell.Characters(i, 1)
            'If (.Font.Color) Then
                chrCol = fnGetCol(.Font.Color)
                If chrCol <> chrLastCol Then
                    htmlTxt = htmlTxt & "<c=#" & chrCol & ">"
                    chrLastCol = chrCol
                End If
            'End If

            If (.Font.Size) Then
                chrSiz = .Font.Size
                If Len(chrSiz) = 1 Then chrSiz = "0" & chrSiz
                If Not chrLastSiz = chrSiz Then
                    htmlTxt = htmlTxt & "<s=" & chrSiz & ">"
                End If
                chrLastSiz = chrSiz
            End If

            If .Font.Bold = True Then
                If Not bldTagOn Then
                    htmlTxt = htmlTxt & "<b>"
                    bldTagOn = True
                End If
            Else
                If bldTagOn Then
                    htmlTxt = htmlTxt & "</b>"
                    bldTagOn = False
                End If
            End If

            If .Font.Italic = True Then
                If Not itlTagOn Then
                    htmlTxt = htmlTxt & "<i>"
                    itlTagOn = True
                End If
            Else
                If itlTagOn Then
                    htmlTxt = htmlTxt & "</i>"
                    itlTagOn = False
                End If
            End If

            If .Font.Underline > 0 Then
                If Not ulnTagOn Then
                    htmlTxt = htmlTxt & "<u>"
                    ulnTagOn = True
                End If
            Else
                If ulnTagOn Then
                    htmlTxt = htmlTxt & "</u>"
                    ulnTagOn = False
                End If
            End If

            If (Asc(.Text) = 10) Then
                htmlTxt = htmlTxt & "[LF]"
            Else
                htmlTxt = htmlTxt & .Text
            End If
        End With
    Next

    If bldTagOn Then
        htmlTxt = htmlTxt & "</b>"
        bldTagOn = False
    End If
    If itlTagOn Then
        htmlTxt = htmlTxt & "</i>"
        itlTagOn = False
    End If
    If ulnTagOn Then
        htmlTxt = htmlTxt & "</u>"
        ulnTagOn = False
    End If
    htmlTxt = htmlTxt & "</html>"
    fnConvert2HTML = htmlTxt
End Function

答案 1 :(得分:0)

您可以遍历单元格中的所有字符,检查您是否不在内部和html标记,并将文本添加到另一个单元格。以下代码将执行此操作(考虑从单元格A1输入并输出到单元格A2,两者都在活动表格上):

for (outerKey, innerDict) in dict {
    print("Key-value pairs for outer dict key \(outerKey) follows:")
    for (key, value) in innerDict {
        print("\tkey: \(key), value: \(value)")
    }
}
/*  Key-value pairs for outer dict key 2 follows:
        key: Key2, value: Value2
        key: Key1, value: Value1
    Key-value pairs for outer dict key 0 follows:
        key: Key2, value: Value2
        key: Key1, value: Value1
    Key-value pairs for outer dict key 1 follows:
        key: Key2, value: Value2
        key: Key1, value: Value1
    Key-value pairs for outer dict key 3 follows:
        key: Key2, value: Value2
        key: Key1, value: Value1                  */

我在Excel 2016 for Mac上进行了测试,效果很好。