列出Word文档使用的字体(更快的方法)

时间:2011-03-10 14:31:42

标签: vb.net ms-word office-interop

我正在制定验证文件的流程,以确保它们符合公司标准。其中一个步骤是确保Word文档不使用未经批准的字体。

我有以下代码存根,它可以工作:

    Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
    Dim wordDocument As Word.Document = Nothing

    Dim fontList As New List(Of String)()

    Try
        wordDocument = wordApplication.Documents.Open(FileName:="document Path")
        'I've also tried using a for loop with an integer counter, no change in speed'
        For Each c As Word.Range In wordDocument.Characters
            If Not fontList.Contains(c.Font.Name) Then
                fontList.Add(c.Font.Name)
            End If
        Next

但这是令人难以置信的慢!令人难以置信的慢= 2500字符/分钟(我用StopWatch计时)。我的大多数文件大约是6000字/ 30,000个字符(约25页)。但是有一些文件在100页中......

有更快的方法吗?我必须支持Office 2003格式文件,因此不能选择Open XML SDK。

- UPDATE -

我尝试将其作为Word宏运行(使用代码@ http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html)并且运行速度更快(在一分钟之内)。不幸的是,出于我的目的,我不相信宏会起作用。

- 更新#2 -

我接受了Chris的建议并将文档转换为Open XML格式。然后我使用以下代码查找所有RunFonts对象并读取字体名称:

    Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
        Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
                            Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()

        fontList.AddRange(runFonts)
    End Using

7 个答案:

答案 0 :(得分:6)

您可能必须支持Office 2003,但这并不意味着您必须以该格式解析它。获取Office 2003文档,暂时将其转换为DOCX文件,将其打开为ZIP文件,解析/word/fontTable.xml文件,然后删除DOCX。

答案 1 :(得分:3)

我发现没有编码的另一种方式是: *导出文档为PDF *在adobe reader中打开它 *在adobe reader goto:文件菜单\属性,然后是字体选项卡,其中列出了文档中使用的系列字体和子字体。

答案 2 :(得分:2)

我认为这是错误的方式。我们正在寻找字体的 fact ,而不是该字体的 location 。这是一个存在主义而非位置问题。

迭代字体的速度要快得多。唯一的伎俩是,Word有时候会对空间等问题挑剔。这对我很有用

Sub FindAllFonts()
    Dim lWhichFont As Long, sTempName As String, sBuffer As String
    For lWhichFont = 1 To FontNames.Count
       sTempName = FontNames(lWhichFont)
       If FindThisFont(sTempName) Then
          sBuffer = sBuffer & "Found " & sTempName & vbCrLf
        Else
           If FindThisFont(Replace(sTempName, " ", "")) Then
              sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
           End If
        End If
   Next
   Documents.Add
   Selection.TypeText Text:=sBuffer
End Sub

Function FindThisFont(sName As String) As Boolean
   Selection.HomeKey Unit:=wdStory
   Selection.Find.ClearFormatting
   With Selection.Find
       .Font.Name = sName
       .Forward = True
       .Format = True
       .Execute
       If .Found() Then
          FindThisFont = True
       Else
          FindThisFont = False
       End If
   End With
End Function

它工作得非常快(唯一缓慢的组件是字体迭代)

(显然,它不会找到不在您系统上的字体,但是如果您正在尝试为您编写的传输内容做准备,并且某些助手程序已将Helvetica或MS Minchin放入其中,您可以找到它)

好吧,人们告诉我这不是每个人都想要的,人们想找到不在他们机器上的字体。但另一种方式仍然太慢,并且需要寻找很多不存在的东西。所以这里有一个替代方案,可以保存为rtf,并处理rtf头。

Sub FIndAllFonts2()
    Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
    Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
    Dim objPic As InlineShape, objShp As Shape
    ' rememer old name for reloading
    sOldName = ActiveDocument.Path
    'delete image to make file out small
    For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
    For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
    ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
    sTempFile = ActiveDocument.Path
    ActiveDocument.Close
    lPos2 = 1
    ' we only want the header, but we don't know how long the file is
    'I am using a Mac, so filesystemobject not available
    ' if you end up having a huge header, make 2500 bigger
    lStopAt = 2500
    Open sTempFile For Input As #1
    Do While Not EOF(1) And lPos2 < lStopAt
        sBit = Input(1, #1)
        sBuffer = sBuffer & sBit
        lPos2 = lPos2 + 1
    Loop
    Close #1
    'delete temp file
    Kill sTempFile
    ' loop through header, fonts identified in the table as {\f1\
    ' if you have more than 100 fonts, make this bigger
    ' not all numbers are used
    lStopAt = 100
    For lCounter = 1 To lStopAt
        lPos = InStr(sBuffer, "{\f" & lCounter & "\")
        If lPos > 0 Then
            sBuffer = Mid(sBuffer, lPos)
            lPos = InStr(sBuffer, ";")
            sBuffer2 = Left(sBuffer, lPos - 1)
            'this is where you would look for the alternate name if you want it
            lPos = InStr(sBuffer2, "{\*\falt")
            If lPos > 0 Then
                sBuffer2 = Left(sBuffer2, lPos - 1)
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
            Else
                sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
            End If
            sOut = sOut & sBuffer2 & vbCrLf
        End If
    Next
    'reopen old file
    Documents.Open sOldName
    Set newdoc = Documents.Add
    sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
    Selection.TypeText Text:=sOut
End Sub

这可以在MacBook Pro上在20秒内完成我的350页论文稿。所以它足够快有用。

答案 3 :(得分:2)

通过迭代段落,你可以加快速度。只有当段落包含混合字体时,才需要逐字符进行检查。 Name,Bold,Italic等属性有一个特殊的“indeterminate”值,它是Name的空字符串或样式属性的9999999。

因此,例如,如果Bold = 9999999,则表示该段落包含一些粗体和一些非粗体字符。

我包含以下片段以显示一般概念:

For Each P as Paragraph in doc.Paragraphs
    Dim R as Range = P.Range
    If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
        Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
        ' This paragraph uses mixed fonts, so we need to analyse character by character
        AnalyseCharacterByCharacter(R)
    Else
        ' R.Font is used throughout this paragraph
        FontHasBeenUsed(R.Font)
    End If
 Next

答案 4 :(得分:1)

如果您想获取文档中使用的所有字体。你可以使用OPEN XML简单地通过一行获取所有这些:

 using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
 {
     var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
 }

每个Font元素都有其“Name”属性,该属性在文本运行属性的元素中引用。

提示:你必须考虑每个单词doc。 有多于2个字体表部分,一个在主要部分,另一个在词汇表部分,所以如果需要,不要忘记考虑一个词汇表。

您可以从here

下载OPEN XML SDK

答案 5 :(得分:0)

试试这个:

Sub Word_Get_Document_Fonts()
  Dim report As String
  Dim J As Integer
  Dim font_name As String
  report = "Fonts in use in this document:" & vbCr & vbCr
  For J = 1 To FontNames.Count
    font_name = FontNames(J)
    Set myrange = ActiveDocument.Range
    myrange.Find.ClearFormatting
    myrange.Find.Font.Name = font_name
    With myrange.Find
      .text = "^?"
      .Replacement.text = ""
      .Forward = True
      .Wrap = wdFindStop
      .Format = True
    End With
    myrange.Find.Execute
    If myrange.Find.Found Then
      report = report & font_name & vbCr
    End If
  Next J
  MsgBox (report)
End Sub

答案 6 :(得分:0)

这可能比在使用OpenXml处理文档之前将文档转换为.docx更快(对于记录,您也可以使用属性document.Content.WordOpenXML而不是document.Content.XML):

using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Word = NetOffice.WordApi;

namespace _5261108
{
    class Program
    {
        static void Main(string[] args)
        {
            using (var app = new Word.Application())
            {
                var doc = app.Documents.Open(Path.GetFullPath("test.docx"));

                foreach (var font in GetFontNames(doc))
                {
                    Console.WriteLine(font);
                }

                app.Quit(false);
            }

            Console.ReadLine();
        }

        private static IEnumerable<string> GetFontNames(Word.Document document)
        {
            var xml = document.Content.XML;
            var doc = XDocument.Parse(xml);
            var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
            var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
            return fontNames.Distinct();
        }
    }
}

为了您的方便而转换:

Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Xml.Linq
Imports Word = NetOffice.WordApi

Namespace _5261108
    Class Program
        Private Shared Sub Main(args As String())
            Using app = New Word.Application()
                Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))

                For Each font As var In GetFontNames(doc)
                    Console.WriteLine(font)
                Next

                app.Quit(False)
            End Using

            Console.ReadLine()
        End Sub

        Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
            Dim xml = document.Content.XML
            Dim doc = XDocument.Parse(xml)
            Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
            Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
            Return fontNames.Distinct()
        End Function
    End Class
End Namespace

'=======================================================
'Service provided by Telerik (www.telerik.com)
'Conversion powered by NRefactory.
'Twitter: @telerik
'Facebook: facebook.com/telerik
'=======================================================