Can't write an Excel file with the OpenXML library because XmlConvert.IsXmlChar doesn't catch an invalid character

Time: 2019-05-22 19:15:21

Tags: c# .net excel openxml openxml-sdk

I'm trying to write an Excel file using the OpenXML library.

I tried the following:

    try
    {
        excel.AddRow(toWrite.JobTitle.StripOutInvalidXmlCharacters(),
            toWrite.LegalEntity.StripOutInvalidXmlCharacters(),
            toWrite.WorkingLocation.StripOutInvalidXmlCharacters(),
            toWrite.WorkingCountry.StripOutInvalidXmlCharacters(),
            toWrite.JobDescription.StripOutInvalidXmlCharacters(),
            toWrite.Qualifications.StripOutInvalidXmlCharacters(),
            toWrite.StatusOfJob.ToString().StripOutInvalidXmlCharacters(),
            toWrite.PositionOpenDate.StripOutInvalidXmlCharacters(),
            toWrite.InternalOnly.StripOutInvalidXmlCharacters(),
            toWrite.StatusOfJob.In(JobStatus.CREATED, JobStatus.INTERVIEW, JobStatus.OFFER, JobStatus.SOURCING) ? "Active" : "Inactive",
            toWrite.Language?.StripOutInvalidXmlCharacters() ?? "",
            postingStatus.StripOutInvalidXmlCharacters());
    }
    catch (ArgumentException e)
    {
        Logger.Log($"Exception while writing to Excel file. Details:{Environment.NewLine}{e.ToString()}");
    }

Here is AddRow, along with the relevant property and method from the class:

    public SheetData Data { get; private set; }

    public void AddRow(params string[] values)
    {
        if (disposed)
        {
            throw new ObjectDisposedException("FastExcelUtility");
        }

        Row r = new Row();

        foreach (string value in values)
        {
            r.Append(ConstructCell(value, CellValues.String));
        }

        Data.AppendChild(r);
    }

    private Cell ConstructCell(string value, CellValues type)
    {
        return new Cell()
        {
            CellValue = new CellValue(value),
            DataType = new EnumValue<CellValues>(type)
        };
    }

However, for some rows I get an ArgumentException because of an invalid character, specifically 0x0C. Here is the full stack trace:

Exception terminated the running report at 5/21/2019 7:55:42 PM. Details:
System.AggregateException: One or more errors occurred. ---> System.ArgumentException: '', hexadecimal value 0x0C, is an invalid character.
   at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
   at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
   at System.Xml.XmlWellFormedWriter.WriteString(String text)
   at DocumentFormat.OpenXml.OpenXmlLeafTextElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.Save(Stream stream)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.SaveToPart(OpenXmlPart openXmlPart)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.Save()
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.TrySavePartContent(OpenXmlPart part)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.SavePartContents(Boolean save)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.Dispose(Boolean disposing)
   at Reports.General_Classes.FastExcelUtility.Dispose(Boolean disposing)
   at Reports.General_Classes.FastExcelUtility.Dispose()
   at Reports.Reports.QualificationReport.<GetCSV>d__1.MoveNext()
   --- End of inner exception stack trace ---
   at System.Threading.Tasks.Task.ThrowIfExceptional(Boolean includeTaskCanceledExceptions)
   at System.Threading.Tasks.Task.Wait(Int32 millisecondsTimeout, CancellationToken cancellationToken)
   at Report_Runner.Program.Main(String[] args)
---> (Inner Exception #0) System.ArgumentException: '', hexadecimal value 0x0C, is an invalid character.
   at System.Xml.XmlUtf8RawTextWriter.InvalidXmlChar(Int32 ch, Byte* pDst, Boolean entitize)
   at System.Xml.XmlUtf8RawTextWriter.WriteElementTextBlock(Char* pSrc, Char* pSrcEnd)
   at System.Xml.XmlUtf8RawTextWriter.WriteString(String text)
   at System.Xml.XmlWellFormedWriter.WriteString(String text)
   at DocumentFormat.OpenXml.OpenXmlLeafTextElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlCompositeElement.WriteContentTo(XmlWriter w)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.WriteTo(XmlWriter xmlWriter)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.Save(Stream stream)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.SaveToPart(OpenXmlPart openXmlPart)
   at DocumentFormat.OpenXml.OpenXmlPartRootElement.Save()
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.TrySavePartContent(OpenXmlPart part)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.SavePartContents(Boolean save)
   at DocumentFormat.OpenXml.Packaging.OpenXmlPackage.Dispose(Boolean disposing)
   at Reports.General_Classes.FastExcelUtility.Dispose(Boolean disposing)
   at Reports.General_Classes.FastExcelUtility.Dispose()
   at Reports.Reports.QualificationReport.<GetCSV>d__1.MoveNext()<---
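
The trace shows the exception coming out of FastExcelUtility.Dispose() via OpenXmlPackage.SavePartContents, that is, while the package is being saved, not from AddRow, so a catch placed around AddRow cannot see it. One way to surface the problem at the row that causes it is to validate every value eagerly. The sketch below uses XmlConvert.VerifyXmlChars from System.Xml; the class and method names (XmlTextGuard, EnsureValidXmlText) are hypothetical and not part of the original code.

```csharp
using System;
using System.Xml;

// Hypothetical helper, not part of the original FastExcelUtility class.
public static class XmlTextGuard
{
    // Throws immediately if any value contains a character that XML 1.0 does not allow,
    // naming the zero-based position of the offending value in the row.
    public static void EnsureValidXmlText(params string[] values)
    {
        for (int column = 0; column < values.Length; column++)
        {
            string value = values[column];
            if (value == null)
            {
                continue; // VerifyXmlChars rejects null, so null cells are skipped here
            }

            try
            {
                // Returns the string unchanged when every character (including
                // valid surrogate pairs) is legal; otherwise throws an XmlException.
                XmlConvert.VerifyXmlChars(value);
            }
            catch (XmlException ex)
            {
                throw new ArgumentException(
                    $"Value at column {column} contains a character that is not valid in XML: {ex.Message}",
                    nameof(values),
                    ex);
            }
        }
    }
}
```

Called as the first statement of AddRow, a check like this would throw inside the existing try block and report which column held the bad character, instead of failing much later during Dispose.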

I tried to fix this with the following extension method on string:

    public static string StripOutInvalidXmlCharacters(this string str)
    {
        var sb = new StringBuilder();

        for (int i = 0; i < str.Length; i++)
        {
            if (XmlConvert.IsXmlChar(str[i]))
            {
                sb.Append(str[i]);
            }
        }

        return sb.ToString();
    }
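
Separately from the 0x0C problem, this loop tests one UTF-16 code unit at a time, and XmlConvert.IsXmlChar(char) returns false for each half of a surrogate pair, so characters outside the Basic Multilingual Plane (emoji, for example) are silently dropped even though XML allows them. It also throws NullReferenceException on a null string. A sketch of a variant that keeps valid pairs via XmlConvert.IsXmlSurrogatePair and passes null through; the name StripInvalidXmlChars is hypothetical:

```csharp
using System.Text;
using System.Xml;

public static class XmlTextSanitizer
{
    // Removes every character that the XML 1.0 "Char" production forbids,
    // while keeping valid surrogate pairs, which IsXmlChar(char) rejects
    // when each UTF-16 unit is tested on its own.
    public static string StripInvalidXmlChars(this string str)
    {
        if (string.IsNullOrEmpty(str))
        {
            return str; // nothing to clean; null stays null
        }

        var sb = new StringBuilder(str.Length);
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (XmlConvert.IsXmlChar(c))
            {
                sb.Append(c);
            }
            else if (char.IsHighSurrogate(c) && i + 1 < str.Length
                     && XmlConvert.IsXmlSurrogatePair(str[i + 1], c))
            {
                // IsXmlSurrogatePair takes (lowChar, highChar), in that order.
                sb.Append(c).Append(str[i + 1]);
                i++; // skip the low surrogate that was just copied
            }
            // Anything else (control characters such as 0x03 or 0x0C,
            // lone surrogates, U+FFFE, U+FFFF) is dropped.
        }
        return sb.ToString();
    }
}
```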

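But it doesn't completely solve the problem.

Before I implemented this method, the invalid character was 0x03; now it's 0x0C, and the file ends up larger than it used to before the crash. The method also passes the following unit test: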

    [TestMethod]
    public void StripOutInvalidChars()
    {
        var str = new string(new[] { 'a', 'b', (char)0x03, 'c' });

        // Make sure that I didn't screw up setting up the test
        Assert.AreNotEqual("abc", str);

        // str is invalid - make sure that it strips out the invalid char
        Assert.AreEqual("abc", str.StripOutInvalidXmlCharacters());

        // Should have no effect whatever on a valid string
        str = new string(new[] { 'a', 'b', 'c' });

        Assert.AreEqual("abc", str.StripOutInvalidXmlCharacters());
    }
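
A quick check that may be worth adding: XML 1.0 only permits the control characters 0x09, 0x0A and 0x0D, so XmlConvert.IsXmlChar should reject 0x0C (form feed) just as it rejects 0x03. The sketch below (hypothetical class ControlCharProbe) lists the C0 control characters that IsXmlChar rejects. If 0x0C is among them, StripOutInvalidXmlCharacters would remove it, which would suggest (though the question does not confirm it) that the 0x0C reaching the writer comes from a string that never passes through that method, such as a value added by another AddRow call not shown here.

```csharp
using System.Collections.Generic;
using System.Xml;

public static class ControlCharProbe
{
    // Returns the C0 control code points (0x00-0x1F) that XmlConvert.IsXmlChar rejects.
    public static List<int> RejectedControlChars()
    {
        var rejected = new List<int>();
        for (int cp = 0x00; cp <= 0x1F; cp++)
        {
            if (!XmlConvert.IsXmlChar((char)cp))
            {
                rejected.Add(cp);
            }
        }
        return rejected;
    }
}
```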

That said, I think it is actually stripping out some invalid characters, just clearly not all of them.

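The try-catch block doesn't help either. When this exception occurs, Excel says the file is corrupt and can't be opened or repaired. When the program runs without an exception, I can open the report just fine.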
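
Because the save happens inside Dispose, an exception there most likely leaves a partially written package on disk, which would be consistent with Excel refusing to open it. A common mitigation is to write to a temporary file and only move it to the final path once writing and disposing have both succeeded. A minimal sketch, assuming the workbook code can be wrapped in a callback that receives the path to write to; SafeFileWriter and WriteViaTempFile are hypothetical names:

```csharp
using System;
using System.IO;

// Hypothetical helper: lets `write` produce the file at a temporary path and only
// puts it at `finalPath` once `write` has returned without throwing.
public static class SafeFileWriter
{
    public static void WriteViaTempFile(string finalPath, Action<string> write)
    {
        string fullPath = Path.GetFullPath(finalPath);
        string tempPath = Path.Combine(Path.GetDirectoryName(fullPath), Path.GetRandomFileName());
        try
        {
            write(tempPath);

            if (File.Exists(fullPath))
            {
                // Swap the new file in; null means no backup copy is kept.
                File.Replace(tempPath, fullPath, null);
            }
            else
            {
                File.Move(tempPath, fullPath);
            }
        }
        finally
        {
            // The temp file is still present here only if something failed:
            // the half-written file is discarded and the exception keeps propagating.
            if (File.Exists(tempPath))
            {
                File.Delete(tempPath);
            }
        }
    }
}
```

Inside the callback, the FastExcelUtility instance would need to be created for the temporary path (its constructor is not shown in the question) and disposed before the callback returns, so that the save performed by Dispose is covered. This does not remove the bad character; it only keeps a failed run from leaving a corrupt file at the final path.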

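The dumb solution would be to check for 0x0C manually and strip it out, but I'm not sure how well that would hold up. In particular, I'm worried that this approach could miss other characters, in which case my process could still be unreliable.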

0 answers:

No answers