我正在使用Excel列表并希望转向:
Quercus agrifolia var。 oxyadenia (Torr。)J.T。豪威尔
进入:
<i>Quercus agrifolia</i> var. <i>oxyadenia</i> (Torr.) J.T. Howell
我有Rich Text格式列表,其中应用了格式但我想将其发送到Access,并在相关文本周围明确包含格式化标记。
答案 0 :(得分:9)
我当时想做同样的事情,并在MSDN上找到答案: Convert contents of a formatted excel cell to HTML format
我希望这对你有所帮助,它使用了一个excel宏。
编辑: 使用时我需要修改嵌套标签的代码,请在下面找到我对宏的更新:
Function fnConvert2HTML(myCell As Range) As String
Dim bldTagOn, itlTagOn, ulnTagOn, colTagOn As Boolean
Dim i, chrCount As Integer
Dim chrCol, chrLastCol, htmlTxt, htmlEnd As String
bldTagOn = False
itlTagOn = False
ulnTagOn = False
colTagOn = False
chrCol = "NONE"
'htmlTxt = "<html>"
htmlTxt = ""
chrCount = myCell.Characters.Count
For i = 1 To chrCount
htmlEnd = ""
With myCell.Characters(i, 1)
If (.Font.Color) Then
chrCol = fnGetCol(.Font.Color)
If Not colTagOn Then
htmlTxt = htmlTxt & "<font color=#" & chrCol & ">"
colTagOn = True
Else
If chrCol <> chrLastCol Then htmlTxt = htmlTxt & "</font><font color=#" & chrCol & ">"
End If
Else
chrCol = "NONE"
If colTagOn Then
htmlEnd = "</font>" & htmlEnd
'htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
End If
chrLastCol = chrCol
If .Font.Bold = True Then
If Not bldTagOn Then
htmlTxt = htmlTxt & "<b>"
bldTagOn = True
End If
Else
If bldTagOn Then
'htmlTxt = htmlTxt & "</b>"
htmlEnd = "</b>" & htmlEnd
bldTagOn = False
End If
End If
If .Font.Italic = True Then
If Not itlTagOn Then
htmlTxt = htmlTxt & "<i>"
itlTagOn = True
End If
Else
If itlTagOn Then
'htmlTxt = htmlTxt & "</i>"
htmlEnd = "</i>" & htmlEnd
itlTagOn = False
End If
End If
If .Font.Underline > 0 Then
If Not ulnTagOn Then
htmlTxt = htmlTxt & "<u>"
ulnTagOn = True
End If
Else
If ulnTagOn Then
'htmlTxt = htmlTxt & "</u>"
htmlEnd = "</u>" & htmlEnd
ulnTagOn = False
End If
End If
If (Asc(.Text) = 10) Then
htmlTxt = htmlTxt & htmlEnd & "<br>"
Else
htmlTxt = htmlTxt & htmlEnd & .Text
End If
End With
Next
If colTagOn Then
htmlTxt = htmlTxt & "</font>"
colTagOn = False
End If
If bldTagOn Then
htmlTxt = htmlTxt & "</b>"
bldTagOn = False
End If
If itlTagOn Then
htmlTxt = htmlTxt & "</i>"
itlTagOn = False
End If
If ulnTagOn Then
htmlTxt = htmlTxt & "</u>"
ulnTagOn = False
End If
'htmlTxt = htmlTxt & "</html>"
fnConvert2HTML = htmlTxt
End Function
Function fnGetCol(strCol As String) As String
Dim rVal, gVal, bVal As String
strCol = Right("000000" & Hex(strCol), 6)
bVal = Left(strCol, 2)
gVal = Mid(strCol, 3, 2)
rVal = Right(strCol, 2)
fnGetCol = rVal & gVal & bVal
End Function
答案 1 :(得分:1)
这是一个更快的替代解决方案,但是产生的输出更混乱(因为它使用Word的HTML引擎)。您需要为VBA项目添加以下引用:
然后,通过运行eg调用以下代码。 convertToHtml(Range("A1:A100"))
在立即窗口中:
' Given a temporary file path, load the HTML in that file
' and return the first paragraph's inner HTML.
Function extractFirstParagraph(filePath As String) As String
Dim fs As New FileSystemObject, _
html As New MSHTML.HTMLDocument, _
par As MSHTML.HTMLGenericElement
html.body.innerHTML = fs.OpenTextFile(filePath).ReadAll
Set par = html.getElementsByTagName("P")(0)
extractFirstParagraph = par.innerHTML
End Function
Sub convertToHtml(rng As Range)
' Open a single Word instance.
Dim w As New Word.Application, d As Word.Document
Set d = w.Documents.Add
Dim cell As Range
Const tempFile As String = "c:\temp\msword.html"
' For each cell in the range ...
For Each cell In rng
If cell.Value <> "" Then
' ... copy it into the Word document ...
cell.Copy
d.Range.PasteSpecial xlPasteFormats
' ... save the Word document as HTML
' in a temporary file ...
d.SaveAs2 tempFile, wdFormatHTML
' ... and extract the first paragraph.
cell.Value = extractFirstParagraph(tempFile)
Debug.Print "Cell " & cell.Address & " done."
End If
Next cell
' Close Word once you're done. Note that if a bug
' is encountered, this cleanup won't occur and the
' Word process will need to be killed to release
' file locks, otherwise you get an unhelpful error.
w.Quit False
End Sub
您可以通过添加对Microsoft VBScript正则表达式5.5的引用并运行如下函数来使用正则表达式清理输出:
' Used to avoid duplication in cleanWordHtml.
Function eraseInPlace(ByRef r As RegExp, _
ByRef s As String, p As String) As String
r.Pattern = p
s = r.Replace(s, "")
End Function
' Eliminate junk tags from HTML generated by Word.
Function cleanWordHtml(inputString As String)
Dim r As New RegExp
r.Global = True
eraseInPlace r, inputString, "mso-[^;""]*(; )?"
eraseInPlace r, inputString, " style="""""
eraseInPlace r, inputString, "<\?xml[^>]*>"
eraseInPlace r, inputString, "<\/?o:[^>]*>"
eraseInPlace r, inputString, "<SPAN><\/SPAN>"
cleanWordHtml = inputString
End Function
如果您需要将<span>
标签转换为<font>
标签(我也需要这样做,因为我正在导入不支持CSS的Access RT文本字段),请尝试调用此方法函数并传入在extractFirstParagraph
函数中构造的MSHTML对象:
' Given a <p> DOM node, replace any children of the
' form <span style="color: foo"> with <font color="foo">.
Function convertSpanToFont(ByRef par As MSHTML.HTMLGenericElement, _
ByRef doc As MSHTML.HTMLDocument)
Dim span As MSHTML.HTMLSpanElement, _
font As MSHTML.HTMLFontElement
For Each span In par.getElementsByTagName("span")
Set font = doc.createElement("font")
If IsNull(span.Style.Color) _
Or span.Style.Color <> "" Then
font.Color = span.Style.Color
font.innerHTML = span.innerHTML
span.insertAdjacentElement "afterEnd", font
span.removeNode True
End If
Next span
End Function
我还考虑过将整个电子表格从Excel中保存为HTML,然后使用另一种工具将其保存为Access可以处理的格式,但是Excel的HTML导出会生成CSS类,而不是内联样式。如果仅需要将电子表格的一部分转换为HTML,则此方法也很有用。
答案 2 :(得分:0)
实际上有一个使用 Excel 的内部 XML 表示并使用 XSL 样式表对其进行转换的更简洁、更快速的解决方案。
您可以在 https://github.com/HeimMatthias/Excel-HTML-Tools-Public
上找到 VBA 代码和所需的 XSLT免责声明: 在使用速度缓慢的 VBA 脚本多年后,我自己编写了这个工具。示例输出可以在此 fiddle 中看到。
有关技术细节:
Range.Value
-Property 可以接受一个参数 xlRangeValueXMLSpreadsheet
,它返回一个完全格式化的 xml-String,其中包括其内容的 html-object。
在包含
ActiveSheet.Range("A1").Value(xlRangeValueXMLSpreadsheet)
<块引用>
Quercus agrifolia 变种。 oxyadenia (Torr.) J.T.豪尔
返回以下字符串:
<?xml version="1.0"?>
<?mso-application progid="Excel.Sheet"?>
<Workbook xmlns="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:x="urn:schemas-microsoft-com:office:excel"
xmlns:ss="urn:schemas-microsoft-com:office:spreadsheet"
xmlns:html="http://www.w3.org/TR/REC-html40">
<Styles>
<Style ss:ID="Default" ss:Name="Normal">
<Alignment ss:Vertical="Bottom"/>
<Borders/>
<Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"/>
<Interior/>
<NumberFormat/>
<Protection/>
</Style>
<Style ss:ID="s62">
<Font ss:FontName="Calibri" ss:Size="11" ss:Color="#000000"
ss:Italic="1"/>
</Style>
</Styles>
<Worksheet ss:Name="Tabelle1">
<Table ss:ExpandedColumnCount="1" ss:ExpandedRowCount="1"
ss:DefaultColumnWidth="61.714285714285708"
ss:DefaultRowHeight="14.571428571428571">
<Row>
<Cell ss:StyleID="s62"><ss:Data ss:Type="String"
xmlns="http://www.w3.org/TR/REC-html40"><I><Font html:Color="#000000">Quercus agrifolia</Font></I><Font
html:Color="#000000"> var. </Font><I><Font html:Color="#000000">oxyadenia</Font></I><Font
html:Color="#000000"> (Torr.) J.T. Howell</Font></ss:Data></Cell>
</Row>
</Table>
</Worksheet>
</Workbook>
Cell
-Tag 围绕着一个 ss:Data
-Object,其中包含 - 或多或少 - 干净的 html-Code。在任何情况下,清理这些数据以获得干净的 html 比解析每个字母并访问其样式要简单得多,速度也快得多。 (快得多,我的意思是快 100 倍)。
注意事项: 单元格样式(来自模板和单个单元格)不表示为 html。这并不总是特别明显。在上面的示例中,单元格已自动接收斜体样式,因为第一个单词已被斜体化。由于字符串中还有罗马(直立)词,斜体段落的代码在 html 中表示。但是如果整个单元格都用斜体表示,那么 <i>
标签就会丢失。这尤其令人困惑,因为如果相应的样式具有 <i>
作为属性,您不能只用 ss:Italic="1"
-Tag 包围整个 html,您还需要检查它是否在 html 中包含 -Tags -部分。这当然适用于所有单元格样式。