Can I export Excel data as UTF-8 without a BOM?

Posted: 2010-11-10 10:36:34

Tags: excel vbscript utf-8

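I export Microsoft Excel data with an Excel macro (VBScript). Because the file is a Lua script, I export it as UTF-8. The only way I could find to produce UTF-8 from Excel is to use adodb.stream, like this: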

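set fileLua = CreateObject("adodb.stream")
fileLua.Type = 2                   ' adTypeText
fileLua.Mode = 3                   ' adModeReadWrite
fileLua.Charset = "UTF-8"
fileLua.Open
fileLua.WriteText("test")
fileLua.SaveToFile "Test.lua", 2   ' 2 = adSaveCreateOverWrite; the saved file still starts with a BOM
fileLua.flush
fileLua.Close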

I want to remove the BOM from Test.lua, but I don't know how. (Test.lua contains some Unicode text, so I have to use UTF-8.)
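For readers who want to see the bytes involved, here is a minimal Python sketch (Python is used purely to illustrate the byte layout; it is not part of the Excel workflow, and `encode_lua` is a hypothetical helper). Python's `utf-8-sig` codec prepends the same three-byte UTF-8 BOM (`EF BB BF`) that ADODB.Stream writes, while the plain `utf-8` codec does not:

```python
# The UTF-8 byte order mark is U+FEFF encoded as UTF-8: EF BB BF.
UTF8_BOM = "\ufeff".encode("utf-8")


def encode_lua(text: str, with_bom: bool) -> bytes:
    """Encode text as UTF-8, optionally prefixed with a BOM (hypothetical helper)."""
    return text.encode("utf-8-sig" if with_bom else "utf-8")


with_bom = encode_lua("test", with_bom=True)
without_bom = encode_lua("test", with_bom=False)
print(with_bom.hex(" "))     # ef bb bf 74 65 73 74
print(without_bom.hex(" "))  # 74 65 73 74
```

Everything after the first three bytes is identical, which is why most of the answers below work by skipping or deleting exactly those three bytes.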

Do you know how to produce a UTF-8 file without a BOM from Excel? Thanks in advance.

6 answers:

Answer 0 (score: 32)

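I had the same problem: I had to export data from Excel (Office 2003, VBA 6.5) to a UTF-8 encoded file. I found the answer in your question! In my example below, I also strip the BOM using trick #2 from boost's answer (thanks!). I didn't get #1 to work, and I never tried #3.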

Sub WriteUTF8WithoutBOM()
    ' Needs a reference to "Microsoft ActiveX Data Objects 2.5 Library" (or later) for adTypeText etc.
    Dim UTFStream As Object
    Set UTFStream = CreateObject("adodb.stream")
    UTFStream.Type = adTypeText
    UTFStream.Mode = adModeReadWrite
    UTFStream.Charset = "UTF-8"
    UTFStream.LineSeparator = adLF
    UTFStream.Open
    UTFStream.WriteText "This is an unicode/UTF-8 test.", adWriteLine
    UTFStream.WriteText "First set of special characters: öäåñüûú€", adWriteLine
    UTFStream.WriteText "Second set of special characters: qwertzuiopõúasdfghjkléáûyxcvbnm\|Ä€Í÷×äðÐ[]í³£;?¤>#&@{}<;>*~¡^¢°²`ÿ´½¨¸0", adWriteLine

    UTFStream.Position = 3 'skip BOM

    Dim BinaryStream As Object
    Set BinaryStream = CreateObject("adodb.stream")
    BinaryStream.Type = adTypeBinary
    BinaryStream.Mode = adModeReadWrite
    BinaryStream.Open

    'Strips BOM (first 3 bytes)
    UTFStream.CopyTo BinaryStream

    'UTFStream.SaveToFile "d:\adodb-stream1.txt", adSaveCreateOverWrite
    UTFStream.Flush
    UTFStream.Close

    BinaryStream.SaveToFile "d:\adodb-stream2.txt", adSaveCreateOverWrite
    BinaryStream.Flush
    BinaryStream.Close
End Sub
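The routine above works because the text stream's buffer holds the BOM in its first three bytes, so `Position = 3` followed by `CopyTo` copies only what comes after it. A Python sketch that models that buffer, assuming (as `LineSeparator = adLF` with `adWriteLine` implies) that each line is followed by a single LF; the function names are hypothetical:

```python
def text_stream_buffer(lines):
    """Model the adTypeText stream: UTF-8 BOM, then every line followed by LF."""
    body = "".join(line + "\n" for line in lines)
    return "\ufeff".encode("utf-8") + body.encode("utf-8")


def copy_after(buffer, position=3):
    """Model UTFStream.Position = position followed by UTFStream.CopyTo."""
    return buffer[position:]


lines = ["This is a UTF-8 test.", "Special characters: öäåñüûú€"]
binary_stream = copy_after(text_stream_buffer(lines))
```

Note the LF after the last line as well; Answer 3 below deals with that trailing byte.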

The ADO Stream Object reference I used.

Answer 1 (score: 9)

If anyone else is struggling with the adTypeText constant, you need to include "Microsoft ActiveX Data Objects 2.5 Object Library" under Tools -> References.

Answer 2 (score: 7)

Some possibilities:

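  1. Put the text into the buffer as UTF-8 with Type = 2, but then set Type = 1 (binary) and write it out. This might convince ADODB.Stream to skip adding the BOM.

  2. Create another buffer of type binary, and use CopyTo to copy the data into it, starting from a point after the BOM.

  3. Use Scripting.FileSystemObject to read the file back in, trim off the BOM, and write it out again.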
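Option 3 amounts to a read, trim, rewrite cycle on the raw bytes. In VBA this is usually done with byte arrays rather than FileSystemObject text streams, which, as far as I know, only read and write ANSI or UTF-16. A minimal Python sketch of the idea, with a hypothetical `strip_utf8_bom` helper and a temporary file standing in for Test.lua:

```python
import tempfile
from pathlib import Path

UTF8_BOM = b"\xef\xbb\xbf"


def strip_utf8_bom(path: Path) -> bool:
    """Rewrite path without a leading UTF-8 BOM; return True if one was removed."""
    data = path.read_bytes()
    if not data.startswith(UTF8_BOM):
        return False
    # write_bytes opens the file in "wb" mode, which truncates it first,
    # so no stale bytes are left beyond the new end of the file.
    path.write_bytes(data[len(UTF8_BOM):])
    return True


with tempfile.TemporaryDirectory() as tmp:
    lua = Path(tmp) / "Test.lua"
    lua.write_bytes(UTF8_BOM + "print('ä')".encode("utf-8"))
    removed = strip_utf8_bom(lua)        # True: the BOM was there
    content = lua.read_bytes()
    removed_again = strip_utf8_bom(lua)  # False: nothing left to strip
```

Whatever language does the rewrite, the important detail is that the file is truncated before the shorter content is written back.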

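Answer 3 (score: 1)

Edit

A comment from rellampec alerted me to a better way of dropping the LF that I had discovered user272735's method adds to the end of the file. I have added a new version of the routine at the end.

Original post

I had been using user272735's method successfully for a year when I discovered that it added an LF to the end of the file. I didn't notice this extra LF until I did some very detailed testing, so it is not an important bug. However, my latest version discards the LF in case it ever becomes important.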

Public Sub PutTextFileUtf8(ByVal PathFileName As String, ByVal FileBody As String)

  ' Outputs FileBody as a text file (UTF-8 encoding without leading BOM)
  ' named PathFileName

  ' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
  ' Addition to original code says version 2.5. Tested with version 6.1.

  '  1Nov16  Copied from http://stackoverflow.com/a/4461250/973283
  '          but replaced literals with parameters.
  ' 15Aug17  Discovered routine was adding an LF to the end of the file.
  '          Added code to discard that LF.

  ' References: http://stackoverflow.com/a/4461250/973283
  '             https://www.w3schools.com/asp/ado_ref_stream.asp

  Dim BinaryStream As Object
  Dim UTFStream As Object

  Set UTFStream = CreateObject("adodb.stream")

  UTFStream.Type = adTypeText
  UTFStream.Mode = adModeReadWrite
  UTFStream.Charset = "UTF-8"
  ' The LineSeparator will be added to the end of FileBody. It is possible
  ' to select a different value for LineSeparator but I can find nothing to
  ' suggest it is possible to not add anything to the end of FileBody
  UTFStream.LineSeparator = adLF
  UTFStream.Open
  UTFStream.WriteText FileBody, adWriteLine

  UTFStream.Position = 3 'skip BOM

  Set BinaryStream = CreateObject("adodb.stream")
  BinaryStream.Type = adTypeBinary
  BinaryStream.Mode = adModeReadWrite
  BinaryStream.Open

  UTFStream.CopyTo BinaryStream

  ' Originally I planned to use "CopyTo Dest, NumChars" to not copy the last
  ' byte. However, NumChars is described as an integer whereas Position is
  ' described as Long. I was concerned that by "integer" they mean 16 bits.
  'Debug.Print BinaryStream.Position
  BinaryStream.Position = BinaryStream.Position - 1
  BinaryStream.SetEOS
  'Debug.Print BinaryStream.Position

  UTFStream.Flush
  UTFStream.Close
  Set UTFStream = Nothing

  BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
  BinaryStream.Flush
  BinaryStream.Close
  Set BinaryStream = Nothing

End Sub
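`BinaryStream.Position = BinaryStream.Position - 1` followed by `SetEOS` simply truncates the stream by one byte; since LF is the single byte 0x0A in UTF-8, that removes exactly the separator appended by `adWriteLine`. A Python sketch of the same truncation (the helper name is hypothetical, and unlike the VBA above it checks that the last byte really is an LF before dropping it):

```python
LF = b"\n"  # adLF: a single 0x0A byte in UTF-8


def drop_trailing_lf(data: bytes) -> bytes:
    """Like Position = Position - 1 plus SetEOS, but only if the last byte is LF."""
    return data[:-1] if data.endswith(LF) else data


file_body = "line one\nline two".encode("utf-8")
after_bom = file_body + LF  # what WriteText FileBody, adWriteLine leaves after the BOM
trimmed = drop_trailing_lf(after_bom)
```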

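New version of the routine

This version omits the code that discards the unwanted trailing LF, because it avoids adding the LF in the first place. I have kept the original version in case anyone is interested in the technique for deleting a trailing character.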

Public Sub PutTextFileUtf8NoBOM(ByVal PathFileName As String, ByVal FileBody As String)

  ' Outputs FileBody as a text file named PathFileName using
  ' UTF-8 encoding without leading BOM

  ' Needs reference to "Microsoft ActiveX Data Objects n.n Library"
  ' Addition to original code says version 2.5. Tested with version 6.1.

  '  1Nov16  Copied from http://stackoverflow.com/a/4461250/973283
  '          but replaced literals with parameters.
  ' 15Aug17  Discovered routine was adding an LF to the end of the file.
  '          Added code to discard that LF.
  ' 11Oct17  Posted to StackOverflow
  '  9Aug18  Comment from rellampec suggested removal of adWriteLine from
  '          WriteText statement would avoid adding LF.
  ' 30Sep18  Amended routine to remove adWriteLine from WriteText statement
  '          and code to remove LF from file. Successfully tested new version.

  ' References: http://stackoverflow.com/a/4461250/973283
  '             https://www.w3schools.com/asp/ado_ref_stream.asp

  Dim BinaryStream As Object
  Dim UTFStream As Object

  Set UTFStream = CreateObject("adodb.stream")

  UTFStream.Type = adTypeText
  UTFStream.Mode = adModeReadWrite
  UTFStream.Charset = "UTF-8"
  UTFStream.Open
  UTFStream.WriteText FileBody

  UTFStream.Position = 3 'skip BOM

  Set BinaryStream = CreateObject("adodb.stream")
  BinaryStream.Type = adTypeBinary
  BinaryStream.Mode = adModeReadWrite
  BinaryStream.Open

  UTFStream.CopyTo BinaryStream

  UTFStream.Flush
  UTFStream.Close
  Set UTFStream = Nothing

  BinaryStream.SaveToFile PathFileName, adSaveCreateOverWrite
  BinaryStream.Flush
  BinaryStream.Close
  Set BinaryStream = Nothing

End Sub

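Answer 4 (score: 0)

If you prefer native T-SQL to external code (this relies on SQL Server's OLE Automation procedures, sp_OACreate, sp_OASetProperty and sp_OAMethod, which must first be enabled with the 'Ole Automation Procedures' server configuration option):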

DECLARE @FILE_NAME              VARCHAR(255)    = 'd:\utils\test.xml'       --drive:\path\filename\
DECLARE @FILE_DATA              VARCHAR(MAX)    = '<?xml version="1.0" encoding="UTF-8"?><xml>test</xml>'            --text to write

DECLARE @FILE_NAME_TO           VARCHAR(255)                        --Temp name for text stream
DECLARE @FSO_ID_TXTSTRM         INT                                 --Text Stream
DECLARE @FSO_ID_BINSTRM         INT                                 --Binary Stream
DECLARE @RC                     INT 

EXEC @RC = sp_OACreate 'ADODB.Stream',  @FSO_ID_TXTSTRM OUTPUT
EXEC @RC = sp_OASetProperty             @FSO_ID_TXTSTRM,    'Type',             2                           --1 = binary, 2 = text
EXEC @RC = sp_OASetProperty             @FSO_ID_TXTSTRM,    'Mode',             3                           --0 = not set, 1 read, 2 write, 3 read/write
EXEC @RC = sp_OASetProperty             @FSO_ID_TXTSTRM,    'Charset',          'UTF-8'                     --'ISO-8859-1'
EXEC @RC = sp_OASetProperty             @FSO_ID_TXTSTRM,    'LineSeparator',    10                          --10 = adLF, 13 = adCR, -1 = adCRLF
EXEC @RC = sp_OAMethod                  @FSO_ID_TXTSTRM,    'Open'  
EXEC @RC = sp_OAMethod                  @FSO_ID_TXTSTRM,    'WriteText',        NULL,       @FILE_DATA      --text method

--Create binary stream
EXEC @RC = sp_OACreate 'ADODB.Stream',  @FSO_ID_BINSTRM OUTPUT
EXEC @RC = sp_OASetProperty             @FSO_ID_BINSTRM,    'Type',             1                           --1 = binary, 2 = text
EXEC @RC = sp_OASetProperty             @FSO_ID_BINSTRM,    'Mode',             3                           --0 = not set, 1 read, 2 write, 3 read/write (set before Open)
EXEC @RC = sp_OAMethod                  @FSO_ID_BINSTRM,    'Open'

--Move 3 positions forward in text stream (BOM is first 3 positions)
EXEC @RC = sp_OASetProperty             @FSO_ID_TXTSTRM,    'Position',         3

--Copy text stream to binary stream
EXEC @RC = sp_OAMethod                  @FSO_ID_TXTSTRM,    'CopyTo',           NULL,       @FSO_ID_BINSTRM

--Commit data and close text stream
EXEC @RC = sp_OAMethod                  @FSO_ID_TXTSTRM,    'Flush'
EXEC @RC = sp_OAMethod                  @FSO_ID_TXTSTRM,    'Close'
EXEC @RC = sp_OADestroy                 @FSO_ID_TXTSTRM

--Save binary stream to file and close
EXEC @RC = sp_OAMethod                  @FSO_ID_BINSTRM,    'SaveToFile',       NULL,       @FILE_NAME, 2   --1 = notexist 2 = overwrite
EXEC @RC = sp_OAMethod                  @FSO_ID_BINSTRM,    'Close'
EXEC @RC = sp_OADestroy                 @FSO_ID_BINSTRM

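Answer 5 (score: 0)

Here's another BOM-disposal hack, from an answer that overlaps your question.

Apologies for the late answer; this is more for other people who run into Byte Order Marks. The page views on this question tell me that it is relevant to several related problems: it is surprisingly difficult to write a BOM-free file in VBA, and even some of the common stream libraries deposit a BOM in your output whether you asked for one or not.

I say my answer 'overlaps' because the code below solves a slightly different problem (its main purpose is writing a Schema.ini file for a folder containing a heterogeneous collection of files), but it is a working example of BOM removal and BOM-free file writing in use, and the relevant segment is clearly marked.

The key functionality is that we iterate through all the '.csv' files in the folder and test each one with a quick nibble at its first four bytes: we only take on the onerous task of stripping out a marker if we see one.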
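The first-bytes test can be sketched in Python as follows (illustration only: the VBA reads the bytes into a Byte array with `Get #`, and `sniff_bom` is a hypothetical name). It recognises the same three marks the VBA checks for:

```python
from typing import Optional, Tuple

# The byte order marks tested by the VBA: UTF-8, UTF-16 LE and UTF-16 BE.
BOMS = [
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16 LE"),
    (b"\xfe\xff", "UTF-16 BE"),
]


def sniff_bom(head: bytes) -> Tuple[Optional[str], int]:
    """Return (encoding name, BOM length) for a file's first bytes, or (None, 0)."""
    for bom, name in BOMS:
        if head.startswith(bom):
            return name, len(bom)
    return None, 0


name, length = sniff_bom("a,b".encode("utf-8-sig")[:4])
```

Like the VBA, this would report a UTF-32 LE file (which starts FF FE 00 00) as UTF-16 LE.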

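We're working with low-level file-handling code inherited from primordial C, and we have to use byte arrays all the way through, because everything else you do in VBA will deposit the Byte Order Marker embedded in the structure of a string variable.

So, without further adodb, here's the code:

BOM-disposal code for the text files in a Schema.ini file: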

Public Sub SetSchema(strFolder As String)
On Error Resume Next
' Write a Schema.ini file to the data folder.
' This is necessary if we do not have the registry privileges to set the
' correct 'ImportMixedTypes=Text' registry value, which overrides IMEX=1.
' The code also checks for ANSI or UTF-8 and UTF-16 files, and applies a
' usable setting for CharacterSet (UNICODE|ANSI) with a horrible hack.
' OEM codepage-defined text is not supported: further coding is required.
' ...And we strip out Byte Order Markers, if we see them: the OLEDB SQL
' provider for text files can't deal with a BOM in a UTF-16 or UTF-8 file.
' Not implemented: handling tab-delimited files or other delimiters. The
' code assumes a header row with columns, specifies 'scan all rows', and
' imposes 'read the column as text' if the data types are mixed.
Dim strSchema As String
Dim strFile As String
Dim hndFile As Long
Dim lngBomLength As Long
Dim arrBytes(0 To 4) As Byte
If Right$(strFolder, 1) <> "\" Then strFolder = strFolder & "\"
' Dir() is an iterator function when you call it with a wildcard:
strFile = VBA.FileSystem.Dir(strFolder & "*.csv")
Do While Len(strFile) > 0
    Erase arrBytes ' zero the buffer in case this file is shorter than 5 bytes
    hndFile = FreeFile
    Open strFolder & strFile For Binary As #hndFile
    Get #hndFile, , arrBytes
    Close #hndFile
    strSchema = strSchema & "[" & strFile & "]" & vbCrLf
    strSchema = strSchema & "Format=CSVDelimited" & vbCrLf
    strSchema = strSchema & "ImportMixedTypes=Text" & vbCrLf
    strSchema = strSchema & "MaxScanRows=0" & vbCrLf
    If arrBytes(2) = 0 Or arrBytes(3) = 0 Then ' this is a hack
        strSchema = strSchema & "CharacterSet=UNICODE" & vbCrLf
    Else
        strSchema = strSchema & "CharacterSet=ANSI" & vbCrLf
    End If
    strSchema = strSchema & "ColNameHeader = True" & vbCrLf
    strSchema = strSchema & vbCrLf
    ' ***********************************************************
    ' BOM disposal - Byte order marks break the Access OLEDB text provider:
    lngBomLength = 0
    If (arrBytes(0) = &HFE And arrBytes(1) = &HFF) _
       Or (arrBytes(0) = &HFF And arrBytes(1) = &HFE) Then
        lngBomLength = 2 ' UTF-16 BOM, big- or little-endian
    ElseIf arrBytes(0) = &HEF And arrBytes(1) = &HBB And arrBytes(2) = &HBF Then
        lngBomLength = 3 ' UTF-8 BOM
    End If
    If lngBomLength > 0 Then StripLeadingBytes strFolder & strFile, lngBomLength
    ' ***********************************************************
    strFile = Dir
Loop
If Len(strSchema) > 0 Then
    ' For Output truncates any old Schema.ini; the ; stops Print # adding a CrLf
    hndFile = FreeFile
    Open strFolder & "Schema.ini" For Output As #hndFile
    Print #hndFile, strSchema;
    Close #hndFile
End If
End Sub

Public Sub StripLeadingBytes(ByVal strPath As String, ByVal lngCount As Long)
' Rewrite strPath without its first lngCount bytes (here: the BOM).
Dim hndFile As Long
Dim arrFile() As Byte
Dim strBody As String
hndFile = FreeFile
Open strPath For Binary As #hndFile
ReDim arrFile(0 To LOF(hndFile) - 1)
Get #hndFile, , arrFile
Close #hndFile
' A Byte array can be assigned to a String (and back) with no character
' conversion, so MidB$ drops the leading bytes in a single operation.
strBody = arrFile
strBody = MidB$(strBody, lngCount + 1)
' Opening For Output truncates the file, so no stale bytes remain at its end.
hndFile = FreeFile
Open strPath For Output As #hndFile
Close #hndFile
If LenB(strBody) > 0 Then
    arrFile = strBody
    hndFile = FreeFile
    Open strPath For Binary As #hndFile
    Put #hndFile, , arrFile
    Close #hndFile
End If
End Sub

The code is easier to understand if you know that you can assign a byte array to a VBA String, and vice versa. StripLeadingBytes() uses that to sidestep VBA's inefficient string handling, especially allocation: if you process the bytes some other way, you may find that large files cause serious memory and performance problems.