XML Outputter adds extra non-ASCII characters

Time: 2016-04-04 10:06:38

Tags: c# azure-data-lake u-sql

I am using the following XML outputter to write XML files based on CSV data.

public override void Output(IRow input, IUnstructuredWriter output)
    {
        IColumn badColumn = input.Schema.FirstOrDefault(col => col.Type != typeof(string));
        if (badColumn != null)
        {
            throw new ArgumentException(string.Format("Column '{0}' must be of type 'string', not '{1}'", badColumn.Name, badColumn.Type.Name));
        }

        using (var writer = XmlWriter.Create(output.BaseStream, this.fragmentSettings))
        {
            writer.WriteStartElement(this.rowPath);
            foreach (IColumn col in input.Schema)
            {
                var value = input.Get<string>(col.Name);
                if (value != null)
                {
                    // Skip null values in order to distinguish them from empty strings
                    writer.WriteElementString(this.columnPaths[col.Name] ?? col.Name, value);
                }
            }
        }
    }

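It works very well and the job completes without any errors. However, when I preview or download the file, there is an extra character that causes reading the XML file to fail. I have tried both Fragment and Auto as the conformance level.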

The sample output I get is:

[Image: sample output]

The extra character between the two tags causes problems when reading the file.

1 Answer:

Answer 0 (score: 0)

I solved the problem by explicitly providing the encoding setting and by writing the closing tag, using the following code:

private XmlWriterSettings fragmentSettings = new XmlWriterSettings
    {
        ConformanceLevel = ConformanceLevel.Auto,
        Encoding = Encoding.UTF8
    };

public override void Output(IRow input, IUnstructuredWriter output)
    {
        IColumn badColumn = input.Schema.FirstOrDefault(col => col.Type != typeof(string));
        if (badColumn != null)
        {
            throw new ArgumentException(string.Format("Column '{0}' must be of type 'string', not '{1}'", badColumn.Name, badColumn.Type.Name));
        }
        using (var writer = XmlWriter.Create(output.BaseStream, this.fragmentSettings))
        {
            writer.WriteStartElement(this.rowPath);
            foreach (IColumn col in input.Schema)
            {
                var value = input.Get<string>(col.Name);
                if (value != null)
                {
                    // Skip null values in order to distinguish them from empty strings
                    writer.WriteElementString(this.columnPaths[col.Name] ?? col.Name, value);
                }
            }
            writer.WriteEndElement(); // explicitly close the row element
        }
    }

This outputs well-formed XML that can easily be read with any XML reader.
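
To check the result outside a U-SQL job, the same writing logic can be exercised against a `MemoryStream` and read back with an `XmlReader`. The following is only a minimal sketch under stated assumptions: `RowXmlSketch`, `WriteRow`, and `ReadElementNames` are hypothetical names, and a plain list of key/value pairs stands in for `IRow`, which is only available inside the U-SQL runtime.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;
using System.Xml;

// Hypothetical stand-in for the outputter: key/value pairs replace IRow.
public static class RowXmlSketch
{
    // Same settings as in the answer above.
    private static readonly XmlWriterSettings FragmentSettings = new XmlWriterSettings
    {
        ConformanceLevel = ConformanceLevel.Auto,
        Encoding = Encoding.UTF8
    };

    // Writes one row as <rowName><column>value</column>...</rowName>.
    public static void WriteRow(Stream stream, string rowName,
        IEnumerable<KeyValuePair<string, string>> row)
    {
        using (var writer = XmlWriter.Create(stream, FragmentSettings))
        {
            writer.WriteStartElement(rowName);
            foreach (var pair in row)
            {
                // Skip null values in order to distinguish them from empty strings
                if (pair.Value != null)
                {
                    writer.WriteElementString(pair.Key, pair.Value);
                }
            }
            writer.WriteEndElement(); // explicitly close the row element
        }
    }

    // Parses the bytes as an XML fragment and returns the element names in order.
    // An XmlException here means the output is not well-formed.
    public static List<string> ReadElementNames(byte[] xmlBytes)
    {
        var names = new List<string>();
        var settings = new XmlReaderSettings { ConformanceLevel = ConformanceLevel.Fragment };
        using (var stream = new MemoryStream(xmlBytes))
        using (var reader = XmlReader.Create(stream, settings))
        {
            while (reader.Read())
            {
                if (reader.NodeType == XmlNodeType.Element)
                {
                    names.Add(reader.Name);
                }
            }
        }
        return names;
    }
}

public static class Program
{
    public static void Main()
    {
        var row = new[]
        {
            new KeyValuePair<string, string>("name", "Ada"),
            new KeyValuePair<string, string>("city", null), // skipped: null value
            new KeyValuePair<string, string>("note", "")    // kept: empty string
        };

        using (var stream = new MemoryStream())
        {
            RowXmlSketch.WriteRow(stream, "row", row);
            List<string> names = RowXmlSketch.ReadElementNames(stream.ToArray());
            Console.WriteLine(string.Join(", ", names));
        }
    }
}
```

One thing to keep in mind if stray bytes still appear: `Encoding.UTF8` is configured to emit a byte order mark, whereas `new UTF8Encoding(false)` is not. Swapping it in is an untested suggestion here, not something the original post verified.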