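I am trying to write an Excel VBA script that extracts some information (version and revision date) from binary FrameMaker files (*.fm).
After opening the *.fm file, the following sub writes the first 25 lines into a variable (the information I need is within the first 25 lines).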
Sub fetchDate()
    Dim fso As Object
    Dim fmFile As Object
    Dim fileString As String
    Dim fileName As String
    Dim matchPattern As String
    Dim result As String
    Dim i As Integer
    Dim bufferString As String
    Set fso = CreateObject("Scripting.FileSystemObject")
    fileName = "C:\FrameMaker-file.fm"
    Set fmFile = fso.OpenTextFile(fileName, ForReading, False, TristateFalse)
    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    fileString = ""
    i = 1
    Do While i <= 25
        bufferString = fmFile.ReadLine
        fileString = fileString & bufferString & vbNewLine
        i = i + 1
    Loop
    fmFile.Close
    'fileString = Replace(fileString, matchPattern, "")
    result = regExSearch(fileString, matchPattern)
    MsgBox result
    Set fso = Nothing
    Set fmFile = Nothing
End Sub
The regex function looks like this:
Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
    Dim regEx As New RegExp
    Dim strReplace As String
    Dim result As String
    Dim match As Variant
    Dim matches As Variant
    Dim subMatch As Variant
    Set regEx = CreateObject("VBScript.RegExp")
    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        If regEx.test(strInput) Then
            Set matches = regEx.Execute(strPattern)
            For Each match In matches
                If match.SubMatches.Count > 0 Then
                    For Each subMatch In match.SubMatches
                        Debug.Print "match:" & subMatch
                    Next subMatch
                End If
            Next match
            regExSearch = result
        Else
            regExSearch = "no match"
        End If
    End If
    Set regEx = Nothing
End Function
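Problem 1:
The content of the binary *.fm file stored in the variable "fileString" differs on every run, although the *.fm file itself stays the same.
Here are some examples of the first three lines stored in "fileString", taken from different runs:
Example 1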
<MakerFile 12.0>
Aaÿ No.009.xxx ???? /tEXt ??????
Example 2
<MakerFile 12.0>
Aaÿ ` ? ???? /tEXt ? c ? E ? ????a A ? ? ? ? ? d??????? ? Heading ????????????A???????A
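As you can see, example 1 differs from example 2, although both come from exactly the same VBA code and exactly the same *.fm file.
Problem 2:
It is also a big problem that the regex search string from "matchPattern" randomly gets written into my "fileString". Here is a screenshot of the debug console (screenshot omitted):
How is this possible? Any suggestions or ideas on how to fix this?
I am using:
MS Office Professional Plus 2010
VBA reference for regular expressions: Microsoft VBScript Regular Expressions 5.5
Thank you very much!
Regards, Andy
/Edit, March 12, 2018:
Here is a sample *.fm file: sample file. If you open it with Notepad, you can see some information in plain text, such as "Version - DateVersion 4 - 2018/Feb/07" and "Rev02 - 2018/Feb/21". This is the information I want to extract with the regex.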
Answer 0 (score: 1)
I found a solution using ADODB.Stream. This works fine:
Sub test_binary()
    Dim buffer As String
    Dim filename As String
    Dim matchPattern As String
    Dim result As String

    filename = "C:\test.fm"

    ' Read the start of the file as UTF-8 text instead of raw binary.
    With CreateObject("ADODB.Stream")
        .Open
        .Type = 2                   ' 2 = adTypeText
        .Charset = "utf-8"
        .LoadFromFile filename
        buffer = .ReadText(10000)   ' first 10000 characters
        .Close
    End With

    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    result = regExSearch(buffer, matchPattern)
    MsgBox result
End Sub
The regex function:
Function regExSearch(ByVal strInput As String, ByVal strPattern As String) As String
    Dim regEx As Object   ' late-bound, so no library reference is required
    Dim result As String
    Dim match As Variant
    Dim matches As Variant
    Dim subMatch As Variant

    Set regEx = CreateObject("VBScript.RegExp")
    If strPattern <> "" Then
        With regEx
            .Global = True
            .MultiLine = True
            .IgnoreCase = False
            .Pattern = strPattern
        End With
        If regEx.Test(strInput) Then
            ' Execute must run on the input string, not on the pattern.
            Set matches = regEx.Execute(strInput)
            result = ""
            ' Join all captured groups into one "||"-separated string.
            For Each match In matches
                If match.SubMatches.Count > 0 Then
                    For Each subMatch In match.SubMatches
                        If Len(result) > 0 Then
                            result = result & "||"
                        End If
                        result = result & subMatch
                    Next subMatch
                End If
            Next match
            regExSearch = result
        Else
            regExSearch = "err_nomatch"
        End If
    End If
    Set regEx = Nothing
End Function
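It is very important to open the *.fm file as a text file (.Type = 2) and to set the charset to "utf-8". Otherwise the regex has no plain text to read.
Thank you very much for pointing me in the right direction!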
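Since regExSearch returns all captured groups joined by "||", the caller still has to take that string apart. The following is only a minimal sketch of how that could be done; the helper name CapturedValue is hypothetical and not part of the original answer, and it assumes the "||" separator and the "err_nomatch" marker used above.

```vba
' Hypothetical helper (not from the original answer):
' returns the captured value at the given 0-based index, or "" if there is none.
Function CapturedValue(ByVal joined As String, ByVal index As Long) As String
    Dim parts() As String

    ' Nothing to split if the search failed or returned nothing.
    If Len(joined) = 0 Or joined = "err_nomatch" Then Exit Function

    parts = Split(joined, "||")
    If index >= 0 And index <= UBound(parts) Then
        CapturedValue = parts(index)
    End If
End Function
```

In test_binary, the MsgBox call could then be replaced by something like Debug.Print "Version: " & CapturedValue(result, 0) and Debug.Print "Revision: " & CapturedValue(result, 1).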
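Answer 1 (score: 0)
Just save the FM file as MIF. It is a text encoding of the FM file, and the two formats can be converted back and forth without losing any information.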
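Because a MIF file is plain text, it can be read with the FileSystemObject and passed to the same regExSearch function from the other answer. The sketch below is only an illustration under assumptions: the path C:\test.mif and the sub name fetchDateFromMif are made up, and the pattern may need adjusting, since the way MIF wraps the text in its own markup may differ from how the strings appear in the binary file.

```vba
' Hypothetical sketch: read a MIF export as plain text and search it.
' Assumes regExSearch from the other answer is available in the same project.
Sub fetchDateFromMif()
    Const ForReading As Long = 1   ' FileSystemObject iomode for reading
    Dim fso As Object
    Dim mifFile As Object
    Dim mifText As String
    Dim matchPattern As String

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set mifFile = fso.OpenTextFile("C:\test.mif", ForReading)   ' assumed path
    mifText = mifFile.ReadAll
    mifFile.Close

    ' Same pattern as in the question; it may need adapting to the MIF markup.
    matchPattern = "Version - Date.+?(\d{1,2})[\s\S]*Rev.+?(\d{1,2})"
    MsgBox regExSearch(mifText, matchPattern)
End Sub
```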