我正在制定验证文件的流程,以确保它们符合公司标准。其中一个步骤是确保Word文档不使用未经批准的字体。
我有以下代码存根,它可以工作:
Dim wordApplication As Word.ApplicationClass = New Word.ApplicationClass()
Dim wordDocument As Word.Document = Nothing
Dim fontList As New List(Of String)()
Try
wordDocument = wordApplication.Documents.Open(FileName:="document Path")
'I've also tried using a for loop with an integer counter, no change in speed'
For Each c As Word.Range In wordDocument.Characters
If Not fontList.Contains(c.Font.Name) Then
fontList.Add(c.Font.Name)
End If
Next
但这是令人难以置信的慢!令人难以置信的慢= 2500字符/分钟(我用StopWatch计时)。我的大多数文件大约是6000字/ 30,000个字符(约25页)。但是有一些文件在100页中......
有更快的方法吗?我必须支持Office 2003格式文件,因此不能选择Open XML SDK。
- UPDATE -
我尝试将其作为Word宏运行(使用代码@ http://word.tips.net/Pages/T001522_Creating_a_Document_Font_List.html)并且运行速度更快(在一分钟之内)。不幸的是,出于我的目的,我不相信宏会起作用。
- 更新#2 -
我接受了Chris的建议并将文档转换为Open XML格式。然后我使用以下代码查找所有RunFonts对象并读取字体名称:
Using docP As WordprocessingDocument = WordprocessingDocument.Open(tmpPath, False)
Dim runFonts = docP.MainDocumentPart.Document.Descendants(Of RunFonts)().Select(
Function(c) If(c.Ascii.HasValue, c.Ascii.InnerText, String.Empty)).Distinct().ToList()
fontList.AddRange(runFonts)
End Using
答案 0 :(得分:6)
您可能必须支持Office 2003,但这并不意味着您必须以该格式解析它。获取Office 2003文档,暂时将其转换为DOCX文件,将其打开为ZIP文件,解析/word/fontTable.xml
文件,然后删除DOCX。
答案 1 :(得分:3)
我发现没有编码的另一种方式是: *导出文档为PDF *在adobe reader中打开它 *在adobe reader goto:文件菜单\属性,然后是字体选项卡,其中列出了文档中使用的系列字体和子字体。
答案 2 :(得分:2)
我认为这是错误的方式。我们正在寻找字体的 fact ,而不是该字体的 location 。这是一个存在主义而非位置问题。
迭代字体的速度要快得多。唯一的伎俩是,Word有时候会对空间等问题挑剔。这对我很有用
Sub FindAllFonts()
Dim lWhichFont As Long, sTempName As String, sBuffer As String
For lWhichFont = 1 To FontNames.Count
sTempName = FontNames(lWhichFont)
If FindThisFont(sTempName) Then
sBuffer = sBuffer & "Found " & sTempName & vbCrLf
Else
If FindThisFont(Replace(sTempName, " ", "")) Then
sBuffer = sBuffer & "Found " & sTempName & " *" & vbCrLf
End If
End If
Next
Documents.Add
Selection.TypeText Text:=sBuffer
End Sub
Function FindThisFont(sName As String) As Boolean
Selection.HomeKey Unit:=wdStory
Selection.Find.ClearFormatting
With Selection.Find
.Font.Name = sName
.Forward = True
.Format = True
.Execute
If .Found() Then
FindThisFont = True
Else
FindThisFont = False
End If
End With
End Function
它工作得非常快(唯一缓慢的组件是字体迭代)
(显然,它不会找到不在您系统上的字体,但是如果您正在尝试为您编写的传输内容做准备,并且某些助手程序已将Helvetica或MS Minchin放入其中,您可以找到它)
好吧,人们告诉我这不是每个人都想要的,人们想找到不在他们机器上的字体。但另一种方式仍然太慢,并且需要寻找很多不存在的东西。所以这里有一个替代方案,可以保存为rtf,并处理rtf头。Sub FIndAllFonts2()
Dim sOldName As String, sOldPath As String, sBuffer As String, sBuffer2 As String, sOut As String, sBit
Dim lCounter As Long, lPos As Long, lPos2 As Long, lStopAt As Long
Dim objPic As InlineShape, objShp As Shape
' rememer old name for reloading
sOldName = ActiveDocument.Path
'delete image to make file out small
For Each objPic In ActiveDocument.InlineShapes: objPic.Delete: Next
For Each objShp In ActiveDocument.Shapes: objShp.Delete: Next
ActiveDocument.SaveAs "tempout.rtf", wdFormatRTF
sTempFile = ActiveDocument.Path
ActiveDocument.Close
lPos2 = 1
' we only want the header, but we don't know how long the file is
'I am using a Mac, so filesystemobject not available
' if you end up having a huge header, make 2500 bigger
lStopAt = 2500
Open sTempFile For Input As #1
Do While Not EOF(1) And lPos2 < lStopAt
sBit = Input(1, #1)
sBuffer = sBuffer & sBit
lPos2 = lPos2 + 1
Loop
Close #1
'delete temp file
Kill sTempFile
' loop through header, fonts identified in the table as {\f1\
' if you have more than 100 fonts, make this bigger
' not all numbers are used
lStopAt = 100
For lCounter = 1 To lStopAt
lPos = InStr(sBuffer, "{\f" & lCounter & "\")
If lPos > 0 Then
sBuffer = Mid(sBuffer, lPos)
lPos = InStr(sBuffer, ";")
sBuffer2 = Left(sBuffer, lPos - 1)
'this is where you would look for the alternate name if you want it
lPos = InStr(sBuffer2, "{\*\falt")
If lPos > 0 Then
sBuffer2 = Left(sBuffer2, lPos - 1)
sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, " ") + 1) & " *" 'indicate it is the shorter ascii name
Else
sBuffer2 = Mid(sBuffer2, InStrRev(sBuffer2, "}") + 1)
End If
sOut = sOut & sBuffer2 & vbCrLf
End If
Next
'reopen old file
Documents.Open sOldName
Set newdoc = Documents.Add
sOut = "Fonts in use in document " & sOldName & vbCrLf & sOut
Selection.TypeText Text:=sOut
End Sub
这可以在MacBook Pro上在20秒内完成我的350页论文稿。所以它足够快有用。
答案 3 :(得分:2)
通过迭代段落,你可以加快速度。只有当段落包含混合字体时,才需要逐字符进行检查。 Name,Bold,Italic等属性有一个特殊的“indeterminate”值,它是Name的空字符串或样式属性的9999999。
因此,例如,如果Bold = 9999999,则表示该段落包含一些粗体和一些非粗体字符。
我包含以下片段以显示一般概念:
For Each P as Paragraph in doc.Paragraphs
Dim R as Range = P.Range
If R.Font.Name = "" Or R.Font.Bold = 9999999 Or R.Font.Italic = 9999999
Or R.Font.Underline = 9999999 Or R.Font.Strikethrough = 9999999 Then
' This paragraph uses mixed fonts, so we need to analyse character by character
AnalyseCharacterByCharacter(R)
Else
' R.Font is used throughout this paragraph
FontHasBeenUsed(R.Font)
End If
Next
答案 4 :(得分:1)
如果您想获取文档中使用的所有字体。你可以使用OPEN XML简单地通过一行获取所有这些:
using (WordprocessingDocument doc = WordprocessingDocument.Open(filePath, true))
{
var fontlst = doc.MainDocumentPart.FontTablePart.Fonts.Elements<Font>();
}
每个Font元素都有其“Name”属性,该属性在文本运行属性的元素中引用。
提示:你必须考虑每个单词doc。 不有多于2个字体表部分,一个在主要部分,另一个在词汇表部分,所以如果需要,不要忘记考虑一个词汇表。
您可以从here
下载OPEN XML SDK答案 5 :(得分:0)
试试这个:
Sub Word_Get_Document_Fonts()
Dim report As String
Dim J As Integer
Dim font_name As String
report = "Fonts in use in this document:" & vbCr & vbCr
For J = 1 To FontNames.Count
font_name = FontNames(J)
Set myrange = ActiveDocument.Range
myrange.Find.ClearFormatting
myrange.Find.Font.Name = font_name
With myrange.Find
.text = "^?"
.Replacement.text = ""
.Forward = True
.Wrap = wdFindStop
.Format = True
End With
myrange.Find.Execute
If myrange.Find.Found Then
report = report & font_name & vbCr
End If
Next J
MsgBox (report)
End Sub
答案 6 :(得分:0)
这可能比在使用OpenXml处理文档之前将文档转换为.docx更快(对于记录,您也可以使用属性document.Content.WordOpenXML而不是document.Content.XML):
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Xml.Linq;
using Word = NetOffice.WordApi;
namespace _5261108
{
class Program
{
static void Main(string[] args)
{
using (var app = new Word.Application())
{
var doc = app.Documents.Open(Path.GetFullPath("test.docx"));
foreach (var font in GetFontNames(doc))
{
Console.WriteLine(font);
}
app.Quit(false);
}
Console.ReadLine();
}
private static IEnumerable<string> GetFontNames(Word.Document document)
{
var xml = document.Content.XML;
var doc = XDocument.Parse(xml);
var fonts = doc.Descendants().First(n => n.Name.LocalName == "fonts").Elements().Where(n => n.Name.LocalName == "font");
var fontNames = fonts.Select(f => f.Attributes().First(a => a.Name.LocalName == "name").Value);
return fontNames.Distinct();
}
}
}
为了您的方便而转换:
Imports System.Collections.Generic
Imports System.IO
Imports System.Linq
Imports System.Xml.Linq
Imports Word = NetOffice.WordApi
Namespace _5261108
Class Program
Private Shared Sub Main(args As String())
Using app = New Word.Application()
Dim doc = app.Documents.Open(Path.GetFullPath("test.docx"))
For Each font As var In GetFontNames(doc)
Console.WriteLine(font)
Next
app.Quit(False)
End Using
Console.ReadLine()
End Sub
Private Shared Function GetFontNames(document As Word.Document) As IEnumerable(Of String)
Dim xml = document.Content.XML
Dim doc = XDocument.Parse(xml)
Dim fonts = doc.Descendants().First(Function(n) n.Name.LocalName = "fonts").Elements().Where(Function(n) n.Name.LocalName = "font")
Dim fontNames = fonts.[Select](Function(f) f.Attributes().First(Function(a) a.Name.LocalName = "name").Value)
Return fontNames.Distinct()
End Function
End Class
End Namespace
'=======================================================
'Service provided by Telerik (www.telerik.com)
'Conversion powered by NRefactory.
'Twitter: @telerik
'Facebook: facebook.com/telerik
'=======================================================