如何在单个单元格中读取/解释Excel中的html标签?

时间:2017-07-03 15:28:47

标签: html excel vba excel-vba

所以这是我的问题,我在Excel colunm中获得了一公吨html代码。这样的事情:

Excel单元格中的HTML

<p>Alerte suite &#224; 1 vrt de EUR XXXXXX en provenance du YYYYY.</p>
<p>&#160;</p>
<p>Thing</p>
<p>b&#233;n&#233;ficiaire : Someone</p>
<p>&#160;</p>
<p>Flux inf&#233;rieur &#224; ZZZZZ EUR, ne n&#233;cessitant pas d'investigation.<br /> D&#233;cision du 5/7/2012</p>

所以我想对待它,以便信息将保留在一个单元格中。 在我的serch中,我看到了这篇文章Stack Question: Read html in Exel

但它已经过时了,@ BornToCode从2014年开始响应,建议使用宏来自动复制/粘贴,但它会将信息转换为多个单元格。

在我身边,我使用以下宏来“排序”修复我的问题

代码VBA

Sub suppHTML()
'Selection As Range For Each cell In Selection
cell.Select
Call supphtmlinCell
Next cell
End Sub

Sub supphtmlinCell()
Dim strPattern0 As String: strPattern0 = "</p>"
Dim strReplace0 As String: strReplace0 = vbNewLine
Dim regEx0 As New RegExp
Dim strInput0 As String

Dim strPattern As String: strPattern = "<.*?>"
Dim strReplace As String: strReplace = " "
Dim regEx As New RegExp
Dim strInput As String

Dim strPattern1 As String: strPattern1 = "&#160;"
Dim strReplace1 As String: strReplace1 = " "
Dim regEx1 As New RegExp
Dim strInput1 As String

Dim strPattern2 As String: strPattern2 = "&#233;"
Dim strReplace2 As String: strReplace2 = "é"
Dim regEx2 As New RegExp
Dim strInput2 As String

Dim strPattern3 As String: strPattern3 = "&#232;"
Dim strReplace3 As String: strReplace3 = "è"
Dim regEx3 As New RegExp
Dim strInput3 As String

Dim strPattern4 As String: strPattern4 = "&#231;"
Dim strReplace4 As String: strReplace4 = "ç"
Dim regEx4 As New RegExp
Dim strInput4 As String

Dim strPattern5 As String: strPattern5 = "&#235;"
Dim strReplace5 As String: strReplace5 = "ë"
Dim regEx5 As New RegExp
Dim strInput5 As String

Dim strPattern6 As String: strPattern6 = "&#224;"
Dim strReplace6 As String: strReplace6 = "à"
Dim regEx6 As New RegExp
Dim strInput6 As String

Dim strPattern7 As String: strPattern7 = "&gt;"
Dim strReplace7 As String: strReplace7 = ">"
Dim regEx7 As New RegExp
Dim strInput7 As String

Dim strPattern8 As String: strPattern8 = "&lt;"
Dim strReplace8 As String: strReplace8 = "<"
Dim regEx8 As New RegExp
Dim strInput8 As String

Dim strPattern9 As String: strPattern9 = "&amp;"
Dim strReplace9 As String: strReplace9 = "&"
Dim regEx9 As New RegExp
Dim strInput9 As String


    If strPattern0 <> "" Then
        strInput0 = ActiveCell.Offset(0, 0).Value


        With regEx0
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern0
        End With

        If regEx0.Test(strInput0) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx0.Replace(strInput0, strReplace0)
        Else
            'MsgBox ("Not matched")
        End If
    End If


'ActiveCell.Offset(0, 0).Value = 2
If strPattern <> "" Then
        strInput = ActiveCell.Offset(0, 0).Value
        strReplace = ""

        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With

        If regEx.Test(strInput) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx.Replace(strInput, strReplace)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern1 <> "" Then
        strInput1 = ActiveCell.Offset(0, 0).Value


        With regEx1
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern1
        End With

        If regEx1.Test(strInput1) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx1.Replace(strInput1, strReplace1)
        Else
            'MsgBox ("Not matched")
        End If
    End If





    If strPattern2 <> "" Then
        strInput2 = ActiveCell.Offset(0, 0).Value


        With regEx2
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern2
        End With

        If regEx2.Test(strInput2) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx2.Replace(strInput2, strReplace2)
        Else
            'MsgBox ("Not matched")
        End If
    End If



    If strPattern3 <> "" Then
        strInput3 = ActiveCell.Offset(0, 0).Value


        With regEx3
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern3
        End With

        If regEx3.Test(strInput3) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx3.Replace(strInput3, strReplace3)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern4 <> "" Then
        strInput4 = ActiveCell.Offset(0, 0).Value


        With regEx4
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern4
        End With

        If regEx4.Test(strInput4) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx4.Replace(strInput4, strReplace4)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern5 <> "" Then
        strInput5 = ActiveCell.Offset(0, 0).Value


        With regEx5
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern5
        End With

        If regEx5.Test(strInput5) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx5.Replace(strInput5, strReplace5)
        Else
            'MsgBox ("Not matched")
        End If
    End If


    If strPattern6 <> "" Then
        strInput6 = ActiveCell.Offset(0, 0).Value


        With regEx6
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern6
        End With

        If regEx6.Test(strInput6) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx6.Replace(strInput6, strReplace6)
        Else
            'MsgBox ("Not matched")
        End If
    End If

    If strPattern7 <> "" Then
        strInput7 = ActiveCell.Offset(0, 0).Value


        With regEx7
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern7
        End With

        If regEx7.Test(strInput7) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx7.Replace(strInput7, strReplace7)
        Else
            'MsgBox ("Not matched")
        End If
    End If

    If strPattern8 <> "" Then
        strInput8 = ActiveCell.Offset(0, 0).Value


        With regEx8
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern8
        End With

        If regEx8.Test(strInput8) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx8.Replace(strInput8, strReplace8)
        Else
            'MsgBox ("Not matched")
        End If
    End If

    If strPattern9 <> "" Then
        strInput9 = ActiveCell.Offset(0, 0).Value


        With regEx9
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern9
        End With

        If regEx9.Test(strInput9) Then
            'MsgBox (regEx.Replace(strInput, strReplace))
            ActiveCell.Offset(0, 0).Value = regEx9.Replace(strInput9, strReplace9)
        Else
            'MsgBox ("Not matched")
        End If
    End If     
End Sub

所以我的解决方案是找到我不喜欢的字符串,并用我喜欢的其他字符串替换它。但正如大家都看到的那样,这种解决方案不是最优的。 例如,现在我仍然错过了解释“à”,“â”或“ô”的方法。 我需要为我想要处理的每个新字符串编辑宏。

所以我在这里寻找比我更好的解决方案,而且我相信有一个更清晰的解决方案,因为Excel可以通过复制/粘贴来解释HTML。

2 个答案:

答案 0 :(得分:1)

如何使用HTML文件类?

让我们说你的样本字符串在Worksheets("Sheet1")的范围A1:A6中:

Sub test()
Dim i As Integer

For i = 1 To 6 'Rows 1 to 6 hold your HTML formatted values

    With CreateObject("htmlfile")
        .Open
        .write Worksheets("Sheet1").Range("A" & i).Value
        .Close
        Worksheets("Sheet1").Range("B" & i).Value = .body.outerText
    End With

Next i
End Sub

答案 1 :(得分:0)

这里是Rik Sportel代码的a,,以防有人到达该页面时出现与我相同的问题:

Sub supphtml()

u = Selection.Address
For Each cel In Range(u)

    With CreateObject("htmlfile")
        .Open
        .write cel.Value
        .Close
        cel.Value = .body.outerText
    End With

Next cel
End Sub

这将转换在运行宏

之前选择的所有单元格