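I'm exporting Microsoft Excel data with an Excel macro (VBA). Because the output file is a Lua script, I export it as UTF-8. The only way I've found to produce UTF-8 from Excel is to use adodb.stream, like this: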
set fileLua = CreateObject("adodb.stream")
fileLua.Type = 2
fileLua.Mode = 3
fileLua.Charset = "UTF-8"
fileLua.Open
fileLua.WriteText("test")
fileLua.SaveToFile("Test.lua")
fileLua.flush
fileLua.Close
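I want to remove the BOM from Test.lua, but I don't know how. (Test.lua contains some Unicode text, so I have to use UTF-8.)
Do you know how to produce a UTF-8 file without a BOM from Excel? Thanks in advance.
Answer 0 (score: 32)
I had the same problem: I had to export data from Excel (Office 2003, VBA 6.5) to a UTF-8 encoded file, and I found the answer in your question! In my example below, I also strip the BOM using trick #2 from boost's answer (thanks!). I didn't get #1 to work, and I never tried #3.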
Sub WriteUTF8WithoutBOM()
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText "This is an unicode/UTF-8 test.", adWriteLine
UTFStream.WriteText "First set of special characters: öäåñüûú€", adWriteLine
UTFStream.WriteText "Second set of special characters: qwertzuiopõúasdfghjkléáûyxcvbnm\|Ä€Í÷×äðÐ[]í³£;?¤>#&@{}<;>*~¡^¢°²`ÿ´½¨¸0", adWriteLine
UTFStream.Position = 3 'skip BOM
Dim BinaryStream As Object
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
'Strips BOM (first 3 bytes)
UTFStream.CopyTo BinaryStream
'UTFStream.SaveToFile "d:\adodb-stream1.txt", adSaveCreateOverWrite
UTFStream.Flush
UTFStream.Close
BinaryStream.SaveToFile "d:\adodb-stream2.txt", adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
End Sub
Answer 1 (score: 9)
If anyone else is struggling with the adTypeText constant: you need to add "Microsoft ActiveX Data Objects 2.5 Library" (or a later version) under Tools > References.
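If you would rather not add the reference at all, a common alternative is late binding: keep CreateObject("ADODB.Stream") and declare the few constants yourself. This is a sketch, assuming the standard ADO enumeration values (adTypeBinary = 1, adTypeText = 2, adModeReadWrite = 3, adSaveCreateOverWrite = 2); the procedure name WriteUtf8NoBomLateBound is hypothetical. The body is the same BOM-skipping technique as Answer 0.

```vba
' Late-bound variant: no reference to the ADO library is required.
' These constants mirror ADO's StreamTypeEnum, ConnectModeEnum and
' SaveOptionsEnum values, which late binding cannot see by name.
Private Const adTypeBinary As Long = 1
Private Const adTypeText As Long = 2
Private Const adModeReadWrite As Long = 3
Private Const adSaveCreateOverWrite As Long = 2

' Hypothetical helper: writes Text to PathFileName as UTF-8 without a BOM.
' Assumes Text is not empty (an empty stream has no BOM to skip).
Public Sub WriteUtf8NoBomLateBound(ByVal PathFileName As String, ByVal Text As String)
    Dim UTFStream As Object
    Dim BinaryStream As Object

    Set UTFStream = CreateObject("ADODB.Stream")
    UTFStream.Type = adTypeText
    UTFStream.Mode = adModeReadWrite
    UTFStream.Charset = "UTF-8"
    UTFStream.Open
    UTFStream.WriteText Text          ' no adWriteLine, so no trailing line separator
    UTFStream.Position = 3            ' skip the 3-byte UTF-8 BOM (EF BB BF)

    Set BinaryStream = CreateObject("ADODB.Stream")
    BinaryStream.Type = adTypeBinary
    BinaryStream.Mode = adModeReadWrite
    BinaryStream.Open
    UTFStream.CopyTo BinaryStream     ' copies every byte after the BOM
    UTFStream.Close

    BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
    BinaryStream.Close
End Sub
```

Because the constants are declared in the module, the same code compiles with or without the ADO reference.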
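Answer 2 (score: 7)
Some possibilities:
1. Put the text into the buffer as UTF-8 with Type = 2, but then set Type = 1 (binary) and write it out. This might convince ADODB.Stream to skip adding the BOM.
2. Create another buffer of type binary, and use CopyTo to copy the data into it starting from a point after the BOM.
3. Use Scripting.FileSystemObject to read the file back in, trim off the BOM, and write it out again.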
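Taken literally, possibility 1 still saves the BOM, because the marker is already in the buffer by the time Type changes. A variant that combines it with the skip from possibility 2 is sketched below: rewind to position 0 (ADO only lets Type change there), switch to binary, move past the three BOM bytes, and Read the rest into a Byte array that VBA's own binary file I/O can write. Utf8BytesNoBom and SaveBytes are hypothetical names; the numeric literals stand in for adTypeText (2), adTypeBinary (1) and adModeReadWrite (3) so no ADO reference is needed, and Text is assumed to be non-empty.

```vba
' Hypothetical helper: returns the UTF-8 bytes of Text without the BOM.
Public Function Utf8BytesNoBom(ByVal Text As String) As Byte()
    Dim Stream As Object
    Set Stream = CreateObject("ADODB.Stream")
    Stream.Type = 2                   ' adTypeText
    Stream.Mode = 3                   ' adModeReadWrite
    Stream.Charset = "UTF-8"
    Stream.Open
    Stream.WriteText Text             ' buffer now holds EF BB BF + encoded text
    Stream.Position = 0               ' Type may only be changed at position 0
    Stream.Type = 1                   ' adTypeBinary
    Stream.Position = 3               ' step over the BOM
    Utf8BytesNoBom = Stream.Read      ' remaining bytes, as a 0-based Byte array
    Stream.Close
End Function

' Hypothetical helper: writes Bytes to PathFileName, replacing any old file.
Public Sub SaveBytes(ByVal PathFileName As String, ByRef Bytes() As Byte)
    Dim hndFile As Integer
    ' Put does not truncate an existing file, so delete it first.
    If Len(Dir$(PathFileName)) > 0 Then Kill PathFileName
    hndFile = FreeFile
    Open PathFileName For Binary Access Write As #hndFile
    Put #hndFile, , Bytes             ' raw bytes only, no length descriptor
    Close #hndFile
End Sub
```

Usage: assign the function result to a Byte() variable first (for example `b = Utf8BytesNoBom("-- lua")`), then call `SaveBytes "C:\temp\Test.lua", b`.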
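Answer 3 (score: 1)
Edit
A comment from rellampec alerted me to a better way to get rid of the LF that, as I had found, user272735's method adds to the end of the file. I have added a new version of the routine at the end.
Original post
I had been using user272735's method successfully for a year when I discovered that it adds an LF to the end of the file. I didn't notice this extra LF until I did some very detailed testing, so it is not a serious bug. However, my latest version discards the LF in case it ever becomes important.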
Public Sub PutTextFileUtf8(ByVal PathFileName As String, ByVal FileBody As String)
' Outputs FileBody as a text file (UTF-8 encoding without leading BOM)
' named PathFileName
' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
' Addition to original code says version 2.5. Tested with version 6.1.
' 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
' but replaced literals with parameters.
' 15Aug17 Discovered routine was adding an LF to the end of the file.
' Added code to discard that LF.
' References: http://stackoverflow.com/a/4461250/973283
' https://www.w3schools.com/asp/ado_ref_stream.asp
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
' The LineSeparator will be added to the end of FileBody. It is possible
' to select a different value for LineSeparator but I can find nothing to
' suggest it is possible to not add anything to the end of FileBody
UTFStream.LineSeparator = adLF
UTFStream.Open
UTFStream.WriteText FileBody, adWriteLine
UTFStream.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
UTFStream.CopyTo BinaryStream
' Originally I planned to use "CopyTo Dest, NumChars" to not copy the last
' byte. However, NumChars is described as an integer whereas Position is
' described as Long. I was concerned that by "integer" they meant 16 bits.
'Debug.Print BinaryStream.Position
BinaryStream.Position = BinaryStream.Position - 1
BinaryStream.SetEOS
'Debug.Print BinaryStream.Position
UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing
End Sub
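New version of the routine
This version omits the code that discarded the unwanted trailing LF, because it avoids adding the LF in the first place. I have kept the original version in case anyone is interested in the technique for removing trailing characters.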
Public Sub PutTextFileUtf8NoBOM(ByVal PathFileName As String, ByVal FileBody As String)
' Outputs FileBody as a text file named PathFileName using
' UTF-8 encoding without leading BOM
' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
' Addition to original code says version 2.5. Tested with version 6.1.
' 1Nov16 Copied from http://stackoverflow.com/a/4461250/973283
' but replaced literals with parameters.
' 15Aug17 Discovered routine was adding an LF to the end of the file.
' Added code to discard that LF.
' 11Oct17 Posted to StackOverflow
' 9Aug18 Comment from rellampec suggested removal of adWriteLine from
' WriteText statement would avoid adding LF.
' 30Sep18 Amended routine to remove adWriteLine from WriteText statement
' and code to remove LF from file. Successfully tested new version.
' References: http://stackoverflow.com/a/4461250/973283
' https://www.w3schools.com/asp/ado_ref_stream.asp
Dim BinaryStream As Object
Dim UTFStream As Object
Set UTFStream = CreateObject("adodb.stream")
UTFStream.Type = adTypeText
UTFStream.Mode = adModeReadWrite
UTFStream.Charset = "UTF-8"
UTFStream.Open
UTFStream.WriteText FileBody
UTFStream.Position = 3 'skip BOM
Set BinaryStream = CreateObject("adodb.stream")
BinaryStream.Type = adTypeBinary
BinaryStream.Mode = adModeReadWrite
BinaryStream.Open
UTFStream.CopyTo BinaryStream
UTFStream.Flush
UTFStream.Close
Set UTFStream = Nothing
BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
BinaryStream.Flush
BinaryStream.Close
Set BinaryStream = Nothing
End Sub
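Answer 4 (score: 0)
If you prefer native T-SQL to external code (this uses SQL Server's OLE Automation procedures such as sp_OACreate, so the 'Ole Automation Procedures' server option must be enabled):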
DECLARE @FILE_NAME VARCHAR(255) = 'd:\utils\test.xml' --drive:\path\filename
DECLARE @FILE_DATA NVARCHAR(MAX) = N'<?xml version="1.0" encoding="UTF-8"?><xml>test</xml>' --text to write; NVARCHAR keeps non-ANSI characters
DECLARE @FSO_ID_TXTSTRM INT --Text Stream
DECLARE @FSO_ID_BINSTRM INT --Binary Stream
DECLARE @RC INT
EXEC @RC = sp_OACreate 'ADODB.Stream', @FSO_ID_TXTSTRM OUTPUT
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Type', 2 --1 = binary, 2 = text
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Mode', 3 --0 = not set, 1 read, 2 write, 3 read/write
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Charset', 'UTF-8' --'ISO-8859-1'
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'LineSeparator', 10 --10 = adLF, 13 = adCR, -1 = adCRLF (only used with adWriteLine)
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Open'
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'WriteText', NULL, @FILE_DATA --text method
--Create binary stream
EXEC @RC = sp_OACreate 'ADODB.Stream', @FSO_ID_BINSTRM OUTPUT
EXEC @RC = sp_OASetProperty @FSO_ID_BINSTRM, 'Type', 1 --1 = binary, 2 = text
EXEC @RC = sp_OASetProperty @FSO_ID_BINSTRM, 'Mode', 3 --0 = not set, 1 read, 2 write, 3 read/write (must be set before Open)
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'Open'
--Move 3 positions forward in text stream (BOM is first 3 positions)
EXEC @RC = sp_OASetProperty @FSO_ID_TXTSTRM, 'Position', 3
--Copy text stream to binary stream
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'CopyTo', NULL, @FSO_ID_BINSTRM
--Commit data and close text stream
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Flush'
EXEC @RC = sp_OAMethod @FSO_ID_TXTSTRM, 'Close'
EXEC @RC = sp_OADestroy @FSO_ID_TXTSTRM
--Save binary stream to file and close
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'SaveToFile', NULL, @FILE_NAME, 2 --1 = adSaveCreateNotExist, 2 = adSaveCreateOverWrite
EXEC @RC = sp_OAMethod @FSO_ID_BINSTRM, 'Close'
EXEC @RC = sp_OADestroy @FSO_ID_BINSTRM
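Answer 5 (score: 0)
Here is another BOM-handling hack, from an answer that overlaps with your question.
Apologies for the late answer; this is more for other people who run into byte order marks. The page views on this question tell me that it is relevant to several related questions: writing a BOM-free file in VBA is surprisingly difficult, and even some common stream libraries put a BOM in your output whether you asked for it or not.
I say my answer 'overlaps' because the code below solves a slightly different problem: its main purpose is to write a Schema file for a folder containing a heterogeneous collection of files. But it is a working example of BOM removal and BOM-free file writing in use, and the relevant segments are clearly marked.
The key feature is that we iterate over all the '.csv' files in a folder and test each one with a quick nibble of its first four bytes: we only take on the heavy work of stripping a file if we see a marker.
We're working with low-level file-handling code that goes back to primordial C. We have to use byte arrays all the way through, because everything else you do in VBA will deposit the byte order mark that is embedded in the structure of a String variable.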
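The check-first idea is easy to isolate for the case in the original question, a single UTF-8 file. This sketch (StripUtf8Bom is a hypothetical name) reads only the first three bytes and rewrites the file only if they are EF BB BF. It is also one way to carry out possibility 3 from Answer 2, with VBA's native binary file I/O in place of Scripting.FileSystemObject, whose text streams handle ANSI and UTF-16 but not UTF-8. It assumes the file fits in memory.

```vba
' Hypothetical helper: removes a leading UTF-8 BOM (EF BB BF) from a file.
' Returns True if a BOM was found and removed, False otherwise.
Public Function StripUtf8Bom(ByVal PathFileName As String) As Boolean
    Dim hndFile As Integer
    Dim lngLen As Long
    Dim arrHead(0 To 2) As Byte
    Dim arrBody() As Byte

    ' Quick nibble: look at the first three bytes only.
    hndFile = FreeFile
    Open PathFileName For Binary Access Read As #hndFile
    lngLen = LOF(hndFile)
    If lngLen >= 3 Then Get #hndFile, 1, arrHead
    Close #hndFile

    If lngLen < 3 Then Exit Function
    If arrHead(0) <> &HEF Or arrHead(1) <> &HBB Or arrHead(2) <> &HBF Then Exit Function

    ' Marker found: load everything after it (byte positions are 1-based).
    If lngLen > 3 Then
        ReDim arrBody(0 To lngLen - 4)
        hndFile = FreeFile
        Open PathFileName For Binary Access Read As #hndFile
        Get #hndFile, 4, arrBody
        Close #hndFile
    End If

    ' Put does not truncate, so start again from an empty file.
    Kill PathFileName
    hndFile = FreeFile
    Open PathFileName For Binary Access Write As #hndFile
    If lngLen > 3 Then Put #hndFile, , arrBody
    Close #hndFile

    StripUtf8Bom = True
End Function
```

Called on the Test.lua written by the question's code, this would leave the UTF-8 text intact and drop only the marker.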
So, without further adodb, here's the code:
Public Sub SetSchema(strFolder As String)
    On Error Resume Next

    ' Write a Schema.ini file to the data folder.
    ' This is necessary if we do not have the registry privileges to set the
    ' correct 'ImportMixedTypes=Text' registry value, which overrides IMEX=1
    ' The code also checks for ANSI or UTF-8 and UTF-16 files, and applies a
    ' usable setting for CharacterSet ( UNICODE|ANSI ) with a horrible hack.
    ' OEM codepage-defined text is not supported: further coding is required
    ' ...And we strip out Byte Order Markers, if we see them - the OLEDB SQL
    ' provider for textfiles can't deal with a BOM in a UTF-16 or UTF-8 file
    ' Not implemented: handling tab-delimited files or other delimiters. The
    ' code assumes a header row with columns, specifies 'scan all rows', and
    ' imposes 'read the column as text' if the data types are mixed.

    Dim strSchema As String
    Dim strFile As String
    Dim strBody As String
    Dim hndFile As Long
    Dim lngBOM As Long
    Dim arrFile() As Byte
    Dim arrBytes(0 To 3) As Byte      ' the first four bytes of each file

    If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"

    ' Dir() is an iterator function when you call it with a wildcard:
    strFile = VBA.FileSystem.Dir(strFolder & "*.csv")

    Do While Len(strFile) > 0

        Erase arrBytes                ' reset, in case this file is under 4 bytes
        hndFile = FreeFile
        Open strFolder & strFile For Binary As #hndFile
        Get #hndFile, , arrBytes
        Close #hndFile

        strSchema = strSchema & "[" & strFile & "]" & vbCrLf
        strSchema = strSchema & "Format=CSVDelimited" & vbCrLf
        strSchema = strSchema & "ImportMixedTypes=Text" & vbCrLf
        strSchema = strSchema & "MaxScanRows=0" & vbCrLf

        If arrBytes(2) = 0 Or arrBytes(3) = 0 Then ' this is a hack
            strSchema = strSchema & "CharacterSet=UNICODE" & vbCrLf
        Else
            strSchema = strSchema & "CharacterSet=ANSI" & vbCrLf
        End If

        strSchema = strSchema & "ColNameHeader = True" & vbCrLf
        strSchema = strSchema & vbCrLf

        ' ***********************************************************
        ' BOM disposal - Byte order marks break the Access OLEDB text provider:

        lngBOM = 0
        If (arrBytes(0) = &HFE And arrBytes(1) = &HFF) _
        Or (arrBytes(0) = &HFF And arrBytes(1) = &HFE) Then
            lngBOM = 2
        ElseIf arrBytes(0) = &HEF And arrBytes(1) = &HBB And arrBytes(2) = &HBF Then
            lngBOM = 3
        End If

        If lngBOM > 0 Then

            hndFile = FreeFile
            Open strFolder & strFile For Binary As #hndFile
            ReDim arrFile(0 To LOF(hndFile) - 1)
            Get #hndFile, , arrFile
            Close #hndFile

            If lngBOM = 2 Then
                ' Viewed as a String, the 2-byte UTF-16 BOM is one character
                ' (U+FEFF little-endian, U+FFFE big-endian):
                BigReplace arrFile, ChrW$(arrBytes(1) * 256& + arrBytes(0)), ""
            Else
                ' The 3-byte UTF-8 BOM is not character-aligned in a String,
                ' so drop the first three bytes with MidB$ instead:
                strBody = arrFile
                arrFile = MidB$(strBody, lngBOM + 1)
                strBody = vbNullString
            End If

            ' Opening For Output truncates the file; Put alone would leave the
            ' old trailing bytes in place, because the new content is shorter.
            hndFile = FreeFile
            Open strFolder & strFile For Output As #hndFile
            Close #hndFile

            hndFile = FreeFile
            Open strFolder & strFile For Binary As #hndFile
            Put #hndFile, , arrFile
            Close #hndFile
            Erase arrFile

        End If
        ' ***********************************************************

        strFile = ""
        strFile = Dir

    Loop

    If Len(strSchema) > 0 Then
        strFile = strFolder & "Schema.ini"
        Kill strFile                  ' error ignored if there is no old Schema.ini
        hndFile = FreeFile
        Open strFile For Binary As #hndFile
        Put #hndFile, , strSchema
        Close #hndFile
    End If

End Sub
Public Sub BigReplace(ByRef arrBytes() As Byte, _
                      ByRef SearchFor As String, _
                      ByRef ReplaceWith As String)
    On Error Resume Next

    Dim varSplit As Variant

    varSplit = Split(arrBytes, SearchFor)
    arrBytes = Join$(varSplit, ReplaceWith)

    Erase varSplit

End Sub
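The code is easier to follow once you know that you can assign a byte array to a VBA String, and vice versa. The BigReplace() function is a hack that sidesteps some of VBA's inefficient string handling, especially memory allocation: if you do it any other way, you will find that large files cause serious memory and performance problems.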
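The byte-array/String assignment described above can be seen in isolation. In this sketch (DropLeadingBytes is a hypothetical name), the array is copied into a String without any character conversion, and MidB$, which counts bytes rather than characters, cuts off the first n bytes, which is all that stripping a BOM needs.

```vba
' Hypothetical helper: returns arr without its first n bytes.
Public Function DropLeadingBytes(ByRef arr() As Byte, ByVal n As Long) As Byte()
    Dim s As String
    s = arr                               ' Byte() -> String: raw bytes, no conversion
    DropLeadingBytes = MidB$(s, n + 1)    ' MidB$ is 1-based and counts bytes
End Function
```

For example, `b = "AB"` fills b with the UTF-16LE bytes 65, 0, 66, 0, and `DropLeadingBytes(b, 2)` returns 66, 0.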