Question

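Here is my situation: people have been creating documents in Word and pasting the text into a FreeTextBox control. The text is then saved to a SQL table and displayed on screen later. In the desktop version of this process, the data is displayed with all of the formatting the user created (tables, styles, etc.). The process has now moved to a web application, and although the data is still saved and displayed on screen, things like table cell borders no longer appear. I noticed that the properties Word uses on these cells are prefixed with mso-, in the form mso-some-style-prop: some-property;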

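None of these mso-* properties are valid CSS. How can I get the CSS to apply correctly to a user control in a web project? I looked for an extension without any luck, and I tried writing a method that replaces these properties, but it is very tedious and time-consuming. So before I go down that road, I wanted to check with the SO community whether anything exists that could ease the pain. Thanks in advance.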
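The targeted replacement method the question mentions could look roughly like the minimal sketch below. It assumes Word writes its formatting into inline style='...' or style="..." attributes and removes only the mso-* declarations, so standard properties such as border survive. The module name MsoStyleStripper and the function StripMsoDeclarations are hypothetical, and the regular expressions are not a full CSS parser.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical helper (not from the original post): strips Word's mso-*
' declarations from inline style attributes and keeps standard CSS.
Public Module MsoStyleStripper

    ' One "mso-name: value" declaration, its optional ";" and trailing spaces.
    ' The lookbehind avoids matching "mso-" in the middle of another name.
    Private ReadOnly MsoDeclaration As New Regex(
        "(?<![\w-])mso-[\w-]+\s*:[^;]*;?\s*", RegexOptions.IgnoreCase)

    Public Function StripMsoDeclarations(html As String) As String
        ' Group 1 = "style=", group 2 = the quote character, group 3 = the CSS text.
        ' Only group 3 is rewritten; the attribute and its quotes are kept.
        Return Regex.Replace(html,
            "(style\s*=\s*)(['""])(.*?)\2",
            Function(m As Match) m.Groups(1).Value & m.Groups(2).Value &
                MsoDeclaration.Replace(m.Groups(3).Value, "") & m.Groups(2).Value,
            RegexOptions.IgnoreCase Or RegexOptions.Singleline)
    End Function

End Module
```

For example, a cell written as style='border:solid 1.0pt;mso-border-alt:solid .5pt' would come back as style='border:solid 1.0pt;', so the border the browser understands is preserved.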

Example of the Microsoft styles:


Answer 1

I wanted to share the solution with everyone. I eventually found this link: http://blog.codinghorror.com/cleaning-words-nasty-html/

It handled the job nicely. I had trouble with one of the regular expressions, but it was a great starting point. Hopefully this helps someone.

Here is what I used:

Imports System.Collections.Specialized
Imports System.Text.RegularExpressions

Public Module Parser

    Public Function Main(text As String) As String

        Dim html As String = ""
        'Console.WriteLine("input html is " & text.Length & " chars")
        html = CleanWordHtml(text)
        ' Pass the cleaned HTML along; passing text here would discard CleanWordHtml's work
        html = FixEntities(html)
        Return html
    End Function

    Private Function CleanWordHtml(html As String) As String

        ' VB string literals have no escape sequences, so \w, \s and \d reach Regex unchanged
        Dim sc As New StringCollection()
        ' get rid of unnecessary tag spans (comments and title)
        sc.Add("<!--(\w|\W)+?-->")
        sc.Add("<title>(\w|\W)+?</title>")
        ' Get rid of classes and styles
        sc.Add("\s?class=\w+")
        sc.Add("\s+style='[^']+'")
        ' Get rid of unnecessary tags; the "[" after "<!" must be escaped as "\[",
        ' otherwise the group never closes and Regex throws an ArgumentException
        sc.Add("<(meta|link|/?o:|/?style|/?div|/?st\d|/?head|/?html|body|/?body|/?span|!\[)[^>]*?>")
        ' Get rid of empty paragraph tags
        sc.Add("(<[^>]+>)+&nbsp;(</\w+>)+")
        ' remove bizarre v: element attached to <img> tag
        sc.Add("\s+v:\w+=""[^""]+""")
        ' remove extra lines
        sc.Add("(\n\r){2,}")
        For Each s As String In sc
            html = Regex.Replace(html, s, "", RegexOptions.IgnoreCase)
        Next
        Return html
    End Function

    Private Function FixEntities(html As String) As String
        ' Word's typographic characters, written as code points so the source
        ' file's encoding cannot garble them. Each key must be distinct:
        ' NameValueCollection.Add merges values that share a key.
        Dim nvc As New NameValueCollection()
        nvc.Add(ChrW(&H201C), "&ldquo;")   ' left double quotation mark
        nvc.Add(ChrW(&H201D), "&rdquo;")   ' right double quotation mark
        nvc.Add(ChrW(&H2013), "&ndash;")   ' en dash
        For Each key As String In nvc.Keys
            html = html.Replace(key, nvc(key))
        Next
        Return html
    End Function
End Module
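The tag-stripping pattern that caused trouble (the meta|link|/?o:|... expression) fails as soon as Regex parses it: an unescaped "[" after "<!" opens a character class that swallows the closing ")", so the capture group is never closed. The small check below is a sketch for illustration; the module RegexEscapeDemo and its constants are hypothetical, and the two patterns differ only in the escaped bracket.

```vbnet
Imports System
Imports System.Text.RegularExpressions

' Hypothetical demo module: checks whether the .NET regex engine accepts a pattern.
Public Module RegexEscapeDemo

    ' "[" after "<!" is unescaped: "[)[^>]" becomes a character class,
    ' so the "(" opened after "<" has no matching ")".
    Public Const BrokenPattern As String =
        "<(meta|link|/?o:|/?style|/?div|/?st\d|/?head|/?html|body|/?body|/?span|![)[^>]*?>"

    ' Same pattern with the bracket escaped as "\[".
    Public Const FixedPattern As String =
        "<(meta|link|/?o:|/?style|/?div|/?st\d|/?head|/?html|body|/?body|/?span|!\[)[^>]*?>"

    ' Returns True when constructing a Regex from the pattern succeeds.
    Public Function IsValidPattern(pattern As String) As Boolean
        Try
            Dim parsed As New Regex(pattern)
            Return True
        Catch ex As ArgumentException
            ' Regex reports parse errors as ArgumentException (or a subclass of it).
            Return False
        End Try
    End Function

End Module
```

IsValidPattern(BrokenPattern) returns False and IsValidPattern(FixedPattern) returns True, which is why escaping the bracket is enough to make the expression usable.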

CSS for mso classes in an ASP.NET web project
