Escaping non-ASCII characters (or how to remove the BOM?)

Asked: 2010-02-23 11:21:36

Tags: ms-access vba encoding

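I need to create an ANSI text file from an Access recordset, which is output as JSON and YAML. I can write the file, but the output contains the raw characters and I need to escape them. For example, an o-umlaut (ö) should be written as "\u00f6".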

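I thought that encoding the file as UTF-8 would work, but it doesn't. However, looking at the file encoding again, if you write it as "UTF-8 without BOM" then everything works.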

Does anyone know how to

a) write the text as UTF-8 without a BOM, or
b) write it as ANSI but escape the non-ASCII characters?

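Public Sub testoutput()

    Dim db As Object
    Dim str_filename As String
    Dim MyFile As String
    Dim str_temp As String
    Dim fnum As Integer

    Set db = CurrentDb()

    str_filename = "anothertest.json"
    MyFile = CurrentProject.Path & "\" & str_filename
    str_temp = "Hello world here is an ö"

    fnum = FreeFile

    ' Print # writes the string as-is, so the ö ends up unescaped in the file
    Open MyFile For Output As #fnum
    Print #fnum, str_temp
    Close #fnum

End Sub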
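For option (b), a minimal sketch of the escaping step might look like the following. EscapeNonAscii is a hypothetical helper name, not something from the original post. It assumes that everything above code point 127 should become a \uXXXX escape, and it does not handle quotes, backslashes or control characters, which JSON would also need escaped.

```vba
' Hypothetical helper: replaces every character above ASCII 127 with \uXXXX.
Public Function EscapeNonAscii(ByVal strText As String) As String
    Dim i As Long
    Dim lngCode As Long
    Dim strOut As String

    For i = 1 To Len(strText)
        ' AscW returns a signed Integer; mask it to get the 0..65535 code unit
        lngCode = AscW(Mid$(strText, i, 1)) And &HFFFF&
        If lngCode > 127 Then
            ' Four lower-case hex digits, zero-padded on the left
            strOut = strOut & "\u" & LCase$(Right$("0000" & Hex$(lngCode), 4))
        Else
            strOut = strOut & Mid$(strText, i, 1)
        End If
    Next i

    EscapeNonAscii = strOut
End Function
```

With this in place, Print #fnum, EscapeNonAscii(str_temp) would write "Hello world here is an \u00f6", and the resulting file is plain ASCII, so the ANSI/UTF-8 question no longer matters for it.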

2 Answers:

Answer 0 (score: 6)

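...OK... I found some sample code that shows how to remove the BOM. I would have thought this could be done more elegantly when actually writing the text. Never mind. The following code removes the BOM.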

(This was originally posted by Simon Pedersen at http://www.imagemagick.org/discourse-server/viewtopic.php?f=8&t=12705)

' Removes the Byte Order Mark - BOM from a text file with UTF-8 encoding
' The BOM defines that the file was stored with a UTF-8 encoding.
' Assumes the file really does start with a UTF-8 BOM: there is no check.

Public Function RemoveBOM(filePath)

    ' Create a reader and a writer
    Dim writer, reader
    Set writer = CreateObject("ADODB.Stream")
    Set reader = CreateObject("ADODB.Stream")

    ' Load the text file we just wrote as raw bytes (Type 1 = adTypeBinary),
    ' so that Position counts the bytes of the file itself
    reader.Type = 1
    reader.Open
    reader.LoadFromFile filePath

    ' Copy all data from reader to writer, except the 3-byte BOM (EF BB BF)
    writer.Mode = 3     ' adModeReadWrite
    writer.Type = 1     ' adTypeBinary
    writer.Open
    reader.Position = 3
    reader.CopyTo writer, -1

    ' Overwrite file (2 = adSaveCreateOverWrite)
    writer.SaveToFile filePath, 2

    ' Return file name
    RemoveBOM = filePath

    ' Close and release objects
    writer.Close
    reader.Close
    Set writer = Nothing
    Set reader = Nothing
End Function

Might be useful to others.
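For what it's worth, the BOM can also be skipped at the moment the text is written, which avoids the second pass over the file. The following is only a sketch of that idea: WriteUtf8NoBom is a hypothetical name, and it assumes late-bound ADODB.Stream is available and that the text is not empty (so that the stream actually contains a BOM to skip).

```vba
' Hypothetical helper: writes strText to strPath as UTF-8 without a BOM.
' Assumes strText is not empty.
Public Sub WriteUtf8NoBom(ByVal strPath As String, ByVal strText As String)
    Dim stmText As Object
    Dim stmBinary As Object

    ' Encode the text as UTF-8; the text stream starts with a BOM
    Set stmText = CreateObject("ADODB.Stream")
    stmText.Type = 2                  ' adTypeText
    stmText.Charset = "utf-8"
    stmText.Open
    stmText.WriteText strText

    ' Type can only be changed at Position 0; then skip the 3-byte BOM
    stmText.Position = 0
    stmText.Type = 1                  ' adTypeBinary
    stmText.Position = 3

    ' Copy the remaining bytes to a second stream and save that
    Set stmBinary = CreateObject("ADODB.Stream")
    stmBinary.Type = 1                ' adTypeBinary
    stmBinary.Open
    stmText.CopyTo stmBinary
    stmBinary.SaveToFile strPath, 2   ' adSaveCreateOverWrite

    stmBinary.Close
    stmText.Close
End Sub
```

In the question's code this would replace the Open / Print # / Close lines with a single call such as WriteUtf8NoBom MyFile, str_temp.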

Answer 1 (score: 1)

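Late to the game here, but I can't be the only coder who is fed up with my SQL imports being broken by text files that carry a Byte Order Marker. There are very few Stack questions on this problem - this one is the closest - so I'm posting an overlapping answer here.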

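I say 'overlapping' because the code below solves a slightly different problem - its main purpose is to write a Schema file for a folder containing a heterogeneous collection of files - but the BOM-handling segment is clearly marked.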

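The key functionality is that we iterate through all the '.csv' files in a folder, and we test each file with a quick nibble of its first few bytes: we only strip out the Byte Order Marker if we actually see one.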
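The 'quick nibble' test on its own can be sketched roughly like this. DetectBomLength is a hypothetical name used for illustration only; the sketch assumes the file exists and only looks for the UTF-8 and UTF-16 markers discussed in this thread.

```vba
' Hypothetical helper: returns the length in bytes of the BOM at the start
' of the file, or 0 if no UTF-8 / UTF-16 BOM is found.
Public Function DetectBomLength(ByVal strPath As String) As Long
    Dim hndFile As Integer
    Dim arrHead(0 To 2) As Byte
    Dim lngSize As Long

    ' Read at most the first three bytes of the file
    hndFile = FreeFile
    Open strPath For Binary Access Read As #hndFile
    lngSize = LOF(hndFile)
    Get #hndFile, , arrHead
    Close #hndFile

    If lngSize >= 3 And arrHead(0) = &HEF And arrHead(1) = &HBB _
                    And arrHead(2) = &HBF Then
        DetectBomLength = 3      ' UTF-8
    ElseIf lngSize >= 2 And ((arrHead(0) = &HFF And arrHead(1) = &HFE) _
                          Or (arrHead(0) = &HFE And arrHead(1) = &HFF)) Then
        DetectBomLength = 2      ' UTF-16, little- or big-endian
    Else
        DetectBomLength = 0
    End If
End Function
```

Checking the file size as well as the bytes keeps very short files from being misread as having a BOM.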

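After that, we're working in low-level file-handling code from primordial C. We have to, right down to using byte arrays, because anything else you do in VBA will leave the Byte Order Marker embedded in the structure of a string variable.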

So, without further ADODB, here's the code:

BOM-disposing code for the text files listed in a schema.ini file:

Public Sub SetSchema(strFolder As String)
On Error Resume Next

' Write a Schema.ini file to the data folder.

' This is necessary if we do not have the registry privileges to set the
' correct 'ImportMixedTypes=Text' registry value, which overrides IMEX=1

' The code also checks for ANSI or UTF-8 and UTF-16 files, and applies a
' usable setting for CharacterSet ( UNICODE|ANSI ) with a horrible hack.

' OEM codepage-defined text is not supported: further coding is required

' ...And we strip out Byte Order Markers, if we see them - the OLEDB SQL
' provider for textfiles can't deal with a BOM in a UTF-16 or UTF-8 file

' Not implemented: handling tab-delimited files or other delimiters. The
' code assumes a header row with columns, specifies 'scan all rows', and
' imposes 'read the column as text' if the data types are mixed.

Dim strSchema As String
Dim strFile As String
Dim hndFile As Long
Dim arrFile() As Byte
Dim arrBytes(0 To 4) As Byte

If Right(strFolder, 1) <> "\" Then strFolder = strFolder & "\"

' Dir() is an iterator function when you call it with a wildcard:

strFile = VBA.FileSystem.Dir(strFolder & "*.csv")

Do While Len(strFile) > 0

    ' Read the first few bytes; clear the buffer so that a short file
    ' does not inherit bytes from the previous one
    Erase arrBytes
    hndFile = FreeFile
    Open strFolder & strFile For Binary As #hndFile
    Get #hndFile, , arrBytes
    Close #hndFile

    strSchema = strSchema & "[" & strFile & "]" & vbCrLf
    strSchema = strSchema & "Format=CSVDelimited" & vbCrLf
    strSchema = strSchema & "ImportMixedTypes=Text" & vbCrLf
    strSchema = strSchema & "MaxScanRows=0" & vbCrLf

    If arrBytes(2) = 0 Or arrBytes(3) = 0 Then ' this is a hack
        strSchema = strSchema & "CharacterSet=UNICODE" & vbCrLf
    Else
        strSchema = strSchema & "CharacterSet=ANSI" & vbCrLf
    End If

    strSchema = strSchema & "ColNameHeader = True" & vbCrLf
    strSchema = strSchema & vbCrLf

    ' BOM disposal - Byte order marks confuse OLEDB text drivers:

    If arrBytes(0) = &HFE And arrBytes(1) = &HFF _
    Or arrBytes(0) = &HFF And arrBytes(1) = &HFE Then

        hndFile = FreeFile
        Open strFolder & strFile For Binary As #hndFile
        ReDim arrFile(0 To LOF(hndFile) - 1)
        Get #hndFile, , arrFile
        Close #hndFile

        ' ChrB builds a byte string, so we search for the raw BOM bytes
        BigReplace arrFile, ChrB(arrBytes(0)) & ChrB(arrBytes(1)), ""

        ' Binary mode does not truncate: delete the file before rewriting
        Kill strFolder & strFile
        hndFile = FreeFile
        Open strFolder & strFile For Binary As #hndFile
        Put #hndFile, , arrFile
        Close #hndFile
        Erase arrFile

    ElseIf arrBytes(0) = &HEF And arrBytes(1) = &HBB And arrBytes(2) = &HBF Then

        hndFile = FreeFile
        Open strFolder & strFile For Binary As #hndFile
        ReDim arrFile(0 To LOF(hndFile) - 1)
        Get #hndFile, , arrFile
        Close #hndFile

        BigReplace arrFile, ChrB(arrBytes(0)) & ChrB(arrBytes(1)) & ChrB(arrBytes(2)), ""

        Kill strFolder & strFile
        hndFile = FreeFile
        Open strFolder & strFile For Binary As #hndFile
        Put #hndFile, , arrFile
        Close #hndFile
        Erase arrFile

    End If

    strFile = ""
    strFile = Dir

Loop

If Len(strSchema) > 0 Then

    strFile = strFolder & "Schema.ini"

    hndFile = FreeFile
    Open strFile For Binary As #hndFile
    Put #hndFile, , strSchema
    Close #hndFile

End If

End Sub

Public Sub BigReplace(ByRef arrBytes() As Byte, ByRef SearchFor As String, ByRef ReplaceWith As String)
On Error Resume Next

Dim strData As String

' A byte array can be assigned to a String, and back again, as-is
strData = arrBytes

' Work on bytes rather than characters: a 3-byte UTF-8 BOM is not a whole
' number of 2-byte VBA characters. Only a match at the very start of the
' data is replaced.
If InStrB(1, strData, SearchFor, vbBinaryCompare) = 1 Then
    arrBytes = ReplaceWith & MidB$(strData, LenB(SearchFor) + 1)
End If

End Sub

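The code is easier to understand if you know that you can assign a byte array to a VBA.String, and vice versa. The BigReplace() function is a hack that sidesteps some of VBA's inefficient string handling, especially allocation: you'll find that large files cause serious memory and performance problems if you do it any other way.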
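To make the byte array / String point concrete, here is a small illustration. ByteArrayStringDemo is a made-up name; the code only demonstrates the assignment behaviour and is not part of the schema code above.

```vba
Public Sub ByteArrayStringDemo()
    Dim arrBytes() As Byte
    Dim strText As String

    ' String -> byte array: the string's internal UTF-16 bytes are copied,
    ' so every character becomes two bytes
    arrBytes = "AB"
    Debug.Print UBound(arrBytes) + 1    ' 4 (bytes 41 00 42 00)

    ' Byte array -> String: the bytes are taken as-is, with no conversion
    strText = arrBytes
    Debug.Print strText                 ' AB

    ' StrConv converts to one byte per character (system ANSI code page)
    arrBytes = StrConv("AB", vbFromUnicode)
    Debug.Print UBound(arrBytes) + 1    ' 2 (bytes 41 42)
End Sub
```

Because the assignment copies the bytes untouched, any BOM bytes read from a file end up inside the String as well, which is why the code above works at the byte level.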