所以这是我的问题,我在Excel colunm中获得了一公吨html代码。这样的事情:
<p>Alerte suite à 1 vrt de EUR XXXXXX en provenance du YYYYY.</p>
<p> </p>
<p>Thing</p>
<p>bénéficiaire : Someone</p>
<p> </p>
<p>Flux inférieur à ZZZZZ EUR, ne nécessitant pas d'investigation.<br /> Décision du 5/7/2012</p>
所以我想对待它,以便信息将保留在一个单元格中。 在我的serch中,我看到了这篇文章Stack Question: Read html in Exel
但它已经过时了,@ BornToCode从2014年开始响应,建议使用宏来自动复制/粘贴,但它会将信息转换为多个单元格。
在我身边,我使用以下宏来“排序”修复我的问题
Sub suppHTML()
'Selection As Range For Each cell In Selection
cell.Select
Call supphtmlinCell
Next cell
End Sub
Sub supphtmlinCell()
Dim strPattern0 As String: strPattern0 = "</p>"
Dim strReplace0 As String: strReplace0 = vbNewLine
Dim regEx0 As New RegExp
Dim strInput0 As String
Dim strPattern As String: strPattern = "<.*?>"
Dim strReplace As String: strReplace = " "
Dim regEx As New RegExp
Dim strInput As String
Dim strPattern1 As String: strPattern1 = " "
Dim strReplace1 As String: strReplace1 = " "
Dim regEx1 As New RegExp
Dim strInput1 As String
Dim strPattern2 As String: strPattern2 = "é"
Dim strReplace2 As String: strReplace2 = "é"
Dim regEx2 As New RegExp
Dim strInput2 As String
Dim strPattern3 As String: strPattern3 = "è"
Dim strReplace3 As String: strReplace3 = "è"
Dim regEx3 As New RegExp
Dim strInput3 As String
Dim strPattern4 As String: strPattern4 = "ç"
Dim strReplace4 As String: strReplace4 = "ç"
Dim regEx4 As New RegExp
Dim strInput4 As String
Dim strPattern5 As String: strPattern5 = "ë"
Dim strReplace5 As String: strReplace5 = "ë"
Dim regEx5 As New RegExp
Dim strInput5 As String
Dim strPattern6 As String: strPattern6 = "à"
Dim strReplace6 As String: strReplace6 = "à"
Dim regEx6 As New RegExp
Dim strInput6 As String
Dim strPattern7 As String: strPattern7 = ">"
Dim strReplace7 As String: strReplace7 = ">"
Dim regEx7 As New RegExp
Dim strInput7 As String
Dim strPattern8 As String: strPattern8 = "<"
Dim strReplace8 As String: strReplace8 = "<"
Dim regEx8 As New RegExp
Dim strInput8 As String
Dim strPattern9 As String: strPattern9 = "&"
Dim strReplace9 As String: strReplace9 = "&"
Dim regEx9 As New RegExp
Dim strInput9 As String
If strPattern0 <> "" Then
strInput0 = ActiveCell.Offset(0, 0).Value
With regEx0
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern0
End With
If regEx0.Test(strInput0) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx0.Replace(strInput0, strReplace0)
Else
'MsgBox ("Not matched")
End If
End If
'ActiveCell.Offset(0, 0).Value = 2
If strPattern <> "" Then
strInput = ActiveCell.Offset(0, 0).Value
strReplace = ""
With regEx
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern
End With
If regEx.Test(strInput) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx.Replace(strInput, strReplace)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern1 <> "" Then
strInput1 = ActiveCell.Offset(0, 0).Value
With regEx1
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern1
End With
If regEx1.Test(strInput1) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx1.Replace(strInput1, strReplace1)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern2 <> "" Then
strInput2 = ActiveCell.Offset(0, 0).Value
With regEx2
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern2
End With
If regEx2.Test(strInput2) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx2.Replace(strInput2, strReplace2)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern3 <> "" Then
strInput3 = ActiveCell.Offset(0, 0).Value
With regEx3
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern3
End With
If regEx3.Test(strInput3) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx3.Replace(strInput3, strReplace3)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern4 <> "" Then
strInput4 = ActiveCell.Offset(0, 0).Value
With regEx4
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern4
End With
If regEx4.Test(strInput4) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx4.Replace(strInput4, strReplace4)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern5 <> "" Then
strInput5 = ActiveCell.Offset(0, 0).Value
With regEx5
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern5
End With
If regEx5.Test(strInput5) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx5.Replace(strInput5, strReplace5)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern6 <> "" Then
strInput6 = ActiveCell.Offset(0, 0).Value
With regEx6
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern6
End With
If regEx6.Test(strInput6) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx6.Replace(strInput6, strReplace6)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern7 <> "" Then
strInput7 = ActiveCell.Offset(0, 0).Value
With regEx7
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern7
End With
If regEx7.Test(strInput7) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx7.Replace(strInput7, strReplace7)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern8 <> "" Then
strInput8 = ActiveCell.Offset(0, 0).Value
With regEx8
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern8
End With
If regEx8.Test(strInput8) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx8.Replace(strInput8, strReplace8)
Else
'MsgBox ("Not matched")
End If
End If
If strPattern9 <> "" Then
strInput9 = ActiveCell.Offset(0, 0).Value
With regEx9
.Global = True
.MultiLine = True
.IgnoreCase = False
.Pattern = strPattern9
End With
If regEx9.Test(strInput9) Then
'MsgBox (regEx.Replace(strInput, strReplace))
ActiveCell.Offset(0, 0).Value = regEx9.Replace(strInput9, strReplace9)
Else
'MsgBox ("Not matched")
End If
End If
End Sub
所以我的解决方案是找到我不喜欢的字符串,并用我喜欢的其他字符串替换它。但正如大家都看到的那样,这种解决方案不是最优的。 例如,现在我仍然错过了解释“à”,“â”或“ô”的方法。 我需要为我想要处理的每个新字符串编辑宏。
所以我在这里寻找比我更好的解决方案,而且我相信有一个更清晰的解决方案,因为Excel可以通过复制/粘贴来解释HTML。
答案 0 :(得分:1)
如何使用HTML文件类?
让我们说你的样本字符串在Worksheets("Sheet1")
的范围A1:A6中:
Sub test()
Dim i As Integer
For i = 1 To 6 'Rows 1 to 6 hold your HTML formatted values
With CreateObject("htmlfile")
.Open
.write Worksheets("Sheet1").Range("A" & i).Value
.Close
Worksheets("Sheet1").Range("B" & i).Value = .body.outerText
End With
Next i
End Sub
答案 1 :(得分:0)
这里是Rik Sportel代码的a,,以防有人到达该页面时出现与我相同的问题:
Sub supphtml()
u = Selection.Address
For Each cel In Range(u)
With CreateObject("htmlfile")
.Open
.write cel.Value
.Close
cel.Value = .body.outerText
End With
Next cel
End Sub
这将转换在运行宏
之前选择的所有单元格