I am trying to load a simple XML file (encoded as UTF-8):
<?xml version="1.0" encoding="UTF-8"?>
<Test/>
and save it again with MSXML from VBScript:
Set xmlDoc = CreateObject("MSXML2.DOMDocument.6.0")
xmlDoc.Load("C:\test.xml")
xmlDoc.Save "C:\test.xml"
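The problem is that MSXML saves the file as ANSI rather than UTF-8, even though the original file is UTF-8 encoded.
The MSDN docs for MSXML say that save() writes the file in whatever encoding the XML declares:
"Character encoding is based on the encoding attribute in the XML declaration. When no encoding attribute is specified, the default setting is UTF-8."
But that is clearly not what happens on my machine.
How can I get MSXML to save as UTF-8?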
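Answer 0: (score: 3)
The XML file contains no non-ASCII text, so its bytes are identical whether it is encoded as UTF-8, ASCII, or an ANSI code page. In my tests, once I added non-ASCII text to test.xml, MSXML always saved it as UTF-8, and it also wrote a BOM if the file had one to begin with.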
http://en.wikipedia.org/wiki/UTF-8
http://en.wikipedia.org/wiki/Byte_order_mark
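To make the point above concrete, here is a minimal sketch in Python (not MSXML, and not part of the original test) that compares the bytes of the sample document under several encodings. The choice of `cp1252` as the "ANSI" code page and the helper name `has_utf8_bom` are assumptions for illustration.

```python
import codecs

# The sample document from the question: pure ASCII characters only.
xml_text = '<?xml version="1.0" encoding="UTF-8"?>\n<Test/>\n'

ascii_bytes = xml_text.encode("ascii")
utf8_bytes = xml_text.encode("utf-8")
ansi_bytes = xml_text.encode("cp1252")  # assumed ANSI code page

# All three encodings produce the same bytes, so an editor cannot tell
# them apart and may simply report the file as "ANSI".
same_for_ascii = ascii_bytes == utf8_bytes == ansi_bytes
print(same_for_ascii)  # True

# As soon as a non-ASCII character appears, the encodings diverge.
non_ascii_text = "<Test>\u00e9</Test>"  # contains e-acute
same_for_non_ascii = non_ascii_text.encode("utf-8") == non_ascii_text.encode("cp1252")
print(same_for_non_ascii)  # False


def has_utf8_bom(data: bytes) -> bool:
    """Hypothetical helper: True if the data starts with the UTF-8 BOM."""
    return data.startswith(codecs.BOM_UTF8)
```

Running `has_utf8_bom` on the raw bytes of the saved file is one way to see whether a BOM was written, independent of what a text editor reports.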
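Answer 1: (score: 3)
You can use two other MSXML classes (the SAX reader and the MXXMLWriter) to write correctly encoded XML to an output stream.
Here is my helper method that writes to a generic IStream: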
class procedure TXMLHelper.WriteDocumentToStream(const Document60: IXMLDOMDocument2; const stream: IStream; Encoding: string = 'UTF-8');
var
writer: IMXWriter;
reader: IVBSAXXMLReader;
begin
{
From http://support.microsoft.com/kb/275883
INFO: XML Encoding and DOM Interface Methods
MSXML has native support for the following encodings:
UTF-8
UTF-16
UCS-2
UCS-4
ISO-10646-UCS-2
UNICODE-1-1-UTF-8
UNICODE-2-0-UTF-16
UNICODE-2-0-UTF-8
It also recognizes (internally using the WideCharToMultibyte API function for mappings) the following encodings:
US-ASCII
ISO-8859-1
ISO-8859-2
ISO-8859-3
ISO-8859-4
ISO-8859-5
ISO-8859-6
ISO-8859-7
ISO-8859-8
ISO-8859-9
WINDOWS-1250
WINDOWS-1251
WINDOWS-1252
WINDOWS-1253
WINDOWS-1254
WINDOWS-1255
WINDOWS-1256
WINDOWS-1257
WINDOWS-1258
}
if Document60 = nil then
raise Exception.Create('TXMLHelper.WriteDocument: Document60 cannot be nil');
if stream = nil then
raise Exception.Create('TXMLHelper.WriteDocument: stream cannot be nil');
// Set properties on the XML writer - including BOM, XML declaration and encoding
writer := CoMXXMLWriter60.Create;
writer.byteOrderMark := True; //Determines whether to write the Byte Order Mark (BOM). The byteOrderMark property has no effect for BSTR or DOM output. (Default True)
writer.omitXMLDeclaration := False; //Forces the IMXWriter to skip the XML declaration. Useful for creating document fragments. (Default False)
writer.encoding := Encoding; //Sets and gets encoding for the output. (Default "UTF-16")
writer.indent := True; //Sets whether to indent output. (Default False)
writer.standalone := True;
// Set the XML writer to the SAX content handler.
reader := CoSAXXMLReader60.Create;
reader.contentHandler := writer as IVBSAXContentHandler;
reader.dtdHandler := writer as IVBSAXDTDHandler;
reader.errorHandler := writer as IVBSAXErrorHandler;
reader.putProperty('http://xml.org/sax/properties/lexical-handler', writer);
reader.putProperty('http://xml.org/sax/properties/declaration-handler', writer);
writer.output := stream; //The resulting document will be written into the provided IStream
// Now pass the DOM through the SAX handler, and it will call the writer
reader.parse(Document60);
writer.flush;
end;
To save to a file, I call the stream version with a FileStream:
class procedure TXMLHelper.WriteDocumentToFile(const Document60: IXMLDOMDocument2; const filename: string; Encoding: string='UTF-8');
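var
fs: TFileStream;
adapter: IStream;
begin
fs := TFileStream.Create(filename, fmCreate or fmShareDenyWrite);
try
// TFileStream is not an IStream, so wrap it in a TStreamAdapter (Classes unit).
// This assumes there is no separate TStream overload of WriteDocumentToStream.
// soReference leaves ownership of fs with this procedure.
adapter := TStreamAdapter.Create(fs, soReference);
TXMLHelper.WriteDocumentToStream(Document60, adapter, Encoding);
adapter := nil; // release the adapter before the file stream is freed
finally
fs.Free;
end;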
end;
You can translate these functions into whatever language you prefer. These are Delphi.
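Answer 2: (score: 1)
When load is executed, MSXML does not copy the encoding from the XML processing instruction into the document it creates. The document therefore carries no encoding, and MSXML seems to pick whatever it likes when saving. In my environment that is UTF-16, which I do not want.
The solution is to supply a processing instruction yourself and specify the encoding there. If you know the document has no processing instruction, the code is simple: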
Set pi = xmlDoc.createProcessingInstruction("xml", _
"version=""1.0"" encoding=""windows-1250""")
If xmlDoc.childNodes.Length > 0 Then
Call xmlDoc.insertBefore(pi, xmlDoc.childNodes.Item(0))
End If
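If the document may already contain an XML processing instruction, it has to be removed first (so the code below must run before the code above). I don't know how to do that with selectNodes, so I simply iterate over all top-level nodes: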
For ich=xmlDoc.childNodes.Length-1 to 0 step -1
Set ch = xmlDoc.childNodes.Item(ich)
If ch.NodeTypeString = "processinginstruction" and ch.NodeName = "xml" Then
xmlDoc.removeChild(ch)
End If
Next
Sorry if the code does not run as-is: I adapted it from a working version that was written in a custom dialect rather than VBScript.
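Whichever approach you use, it helps to check what actually ended up on disk rather than trusting an editor's guess. The following is a rough Python sketch (not MSXML code) that reports the BOM and the encoding named in the XML declaration. The function name `sniff_xml_encoding` and the 200-byte window are assumptions, and it only recognizes UTF-8 and UTF-16 BOMs.

```python
import codecs
import re

# Matches an XML declaration at the start of the text and captures its encoding name.
_DECL = re.compile(r'<\?xml[^>]*?encoding\s*=\s*["\']([A-Za-z][A-Za-z0-9._-]*)["\']')

# (label, BOM bytes, codec used to read the declaration after that BOM)
_BOMS = [
    ("UTF-8", codecs.BOM_UTF8, "utf-8"),
    ("UTF-16LE", codecs.BOM_UTF16_LE, "utf-16-le"),
    ("UTF-16BE", codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_xml_encoding(data: bytes):
    """Return (bom_label_or_None, declared_encoding_or_None) for raw file bytes."""
    bom = None
    codec = "latin-1"  # maps every byte to a character; enough to read the declaration
    for label, mark, bom_codec in _BOMS:
        if data.startswith(mark):
            bom, data, codec = label, data[len(mark):], bom_codec
            break
    # Only the first bytes matter; tolerate a cut-off multi-byte character.
    head = data[:200].decode(codec, errors="replace")
    match = _DECL.match(head)
    return bom, (match.group(1) if match else None)


if __name__ == "__main__":
    sample = b'<?xml version="1.0" encoding="UTF-8"?>\n<Test/>'
    print(sniff_xml_encoding(sample))
```

To inspect a real file, read it in binary mode (for example `open(path, "rb").read()`) and pass the bytes to the function.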