Reading an XLS with a protected workbook and sheet via HSSF.EventUserModel

Time: 2013-04-03 18:01:47

Tags: c# java excel apache-poi npoi

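End goal: efficiently (in a single pass) read all CellRecords on a huge (30,000+ rows), protected Worksheet.

Question: using HSSF.EventUserModel, how do I read all Records (including CellRecords) of an XLS file that has both workbook and worksheet protection?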

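Creating the input spreadsheet (in Excel 2010):

  1. Create a new blank workbook.
  2. Set the value of A1 to the number: 50
  3. Set the value of A2 to the string: 50
  4. Set the value of A3 to the formula: =25*2
  5. Review (ribbon) -> Protect Sheet -> Password: pass1
  6. Review (ribbon) -> Protect Workbook -> Password: pass1
  7. File (ribbon) -> Save As... -> Save as type: Excel 97-2003 Workbook

Progress so far: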

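  • The XLS file opens in Excel without a password, so you should not need a password to open it in POI.
  • The XLS file opens successfully with new HSSFWorkbook(Stream fs). However, for my actual spreadsheet I need the efficiency of the EventUserModel.
  • Setting NPOI.HSSF.Record.Crypto.Biff8EncryptionKey.CurrentUserPassword = "pass1"; has no effect.
  • The ProcessRecord() function catches a PasswordRecord, but I cannot find any documentation on how to handle it properly.
  • Perhaps the EncryptionInfoDecryptor class could be of use.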
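
On the PasswordRecord mentioned above: in the BIFF format this record holds only a 16-bit verifier of the protection password, not a key that decrypts anything, so inspecting it is unlikely to make the remaining records readable. The following is a minimal sketch of that legacy verifier, written as a standalone helper. The class and method names are hypothetical and not part of NPOI, and treating an empty password as 0 is an assumption.

```csharp
using System;
using System.Text;

// Hypothetical helper, not part of NPOI: computes the 16-bit verifier that
// legacy XLS files store for sheet/workbook protection passwords.
static class LegacyXlsPassword
{
    public static ushort Hash(string password)
    {
        byte[] chars = Encoding.ASCII.GetBytes(password);

        // Assumption: an empty password means "no password" and hashes to 0.
        if (chars.Length == 0) return 0;

        int hash = 0;
        // Walk the password from the last character to the first.
        for (int i = chars.Length - 1; i >= 0; i--)
        {
            hash = RotateLeft15(hash);
            hash ^= chars[i];
        }

        // Mix in the password length and the fixed constant 0xCE4B.
        hash = RotateLeft15(hash);
        hash ^= chars.Length;
        hash ^= 0xCE4B;
        return (ushort)hash;
    }

    // Rotates the low 15 bits of value left by one position.
    private static int RotateLeft15(int value)
    {
        return ((value >> 14) & 0x01) | ((value << 1) & 0x7FFF);
    }
}
```

For example, LegacyXlsPassword.Hash("abcdefghij") evaluates to 0xFEF1. Comparing such a value with the one carried by the captured PasswordRecord would only confirm which password protects the sheet; it does not decrypt the stream.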

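Note:
I am using NPOI. However, I can translate any Java example to C#.

Code:
I use the following code to capture Record events. My Book1-unprotected.xls (no protection) shows all Record events (including cell values). My Book1-protected.xls shows some records and then throws an exception.

I just inspect processedRecords in the debugger.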

    using System;
    using System.Collections.Generic;
    using System.IO;
    
    using NPOI.HSSF.Record;
    using NPOI.HSSF.Model;
    using NPOI.HSSF.UserModel;
    using NPOI.HSSF.EventUserModel;
    using NPOI.POIFS;
    using NPOI.POIFS.FileSystem;
    
    namespace NPOI_small {
        class myListener : IHSSFListener {
            List<Record> processedRecords;
    
            private Stream fs;
    
            public myListener(Stream fs) {
                processedRecords = new List<Record>();
                this.fs = fs;
    
                HSSFEventFactory factory = new HSSFEventFactory();
                HSSFRequest request = new HSSFRequest();
    
                MissingRecordAwareHSSFListener mraListener;
                FormatTrackingHSSFListener fmtListener;
                EventWorkbookBuilder.SheetRecordCollectingListener recListener;
                mraListener = new MissingRecordAwareHSSFListener(this);
                fmtListener = new FormatTrackingHSSFListener(mraListener);
                recListener = new EventWorkbookBuilder.SheetRecordCollectingListener(fmtListener);
                request.AddListenerForAllRecords(recListener);
    
                POIFSFileSystem poifs = new POIFSFileSystem(this.fs);
    
                factory.ProcessWorkbookEvents(request, poifs);
            }
    
            public void ProcessRecord(Record record) {
                processedRecords.Add(record);
            }
        }
        class Program {
            static void Main(string[] args) {
                Stream fs = File.OpenRead(@"c:\users\me\desktop\xx\Book1-protected.xls");
    
                myListener testListener = new myListener(fs); // Use EventModel 
                //HSSFWorkbook book = new HSSFWorkbook(fs); // Use UserModel
    
                Console.Read();
            }
        }
    }
    

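Update (for Juan Mellado): the exception is below. My best guess right now (per Victor Petrykin's answer) is that HSSFEventFactory uses RecordInputStream, which cannot natively decrypt protected records. After the exception is thrown, processedRecords contains 22 records, including the following potentially significant ones:

  • processedRecords[5] is a WriteAccessRecord whose name value is garbled (possibly encrypted)
  • processedRecords[22] is a RefreshAllRecord and is the last Record in the list

Exception: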

    NPOI.Util.RecordFormatException was unhandled
      HResult=-2146233088
      Message=Unable to construct record instance
      Source=NPOI
      StackTrace:
           at NPOI.HSSF.Record.RecordFactory.ReflectionConstructorRecordCreator.Create(RecordInputStream in1)
           at NPOI.HSSF.Record.RecordFactory.CreateSingleRecord(RecordInputStream in1)
           at NPOI.HSSF.Record.RecordFactory.CreateRecord(RecordInputStream in1)
           at NPOI.HSSF.EventUserModel.HSSFRecordStream.GetNextRecord()
           at NPOI.HSSF.EventUserModel.HSSFRecordStream.NextRecord()
           at NPOI.HSSF.EventUserModel.HSSFEventFactory.GenericProcessEvents(HSSFRequest req, RecordInputStream in1)
           at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessEvents(HSSFRequest req, Stream in1)
           at NPOI.HSSF.EventUserModel.HSSFEventFactory.ProcessWorkbookEvents(HSSFRequest req, POIFSFileSystem fs)
           at NPOI_small.myListener..ctor(Stream fs) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 35
           at NPOI_small.Program.Main(String[] args) in c:\Users\me\Documents\Visual Studio 2012\Projects\myTest\NPOI_small\Program.cs:line 80
           at System.AppDomain._nExecuteAssembly(RuntimeAssembly assembly, String[] args)
           at Microsoft.VisualStudio.HostingProcess.HostProc.RunUsersAssembly()
           at System.Threading.ExecutionContext.RunInternal(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state, Boolean preserveSyncCtx)
           at System.Threading.ExecutionContext.Run(ExecutionContext executionContext, ContextCallback callback, Object state)
           at System.Threading.ThreadHelper.ThreadStart()
      InnerException: NPOI.Util.RecordFormatException
           HResult=-2146233088
           Message=Expected to find a ContinueRecord in order to read remaining 137 of 144 chars
           Source=NPOI
           StackTrace:
                at NPOI.HSSF.Record.RecordInputStream.ReadStringCommon(Int32 requestedLength, Boolean pIsCompressedEncoding)
                at NPOI.HSSF.Record.RecordInputStream.ReadUnicodeLEString(Int32 requestedLength)
                at NPOI.HSSF.Record.FontRecord..ctor(RecordInputStream in1)
    

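1 answer:

Answer 0 (score: 3)

I think this is a bug in the NPOI library code. As far as I can tell, HSSFEventFactory uses the wrong stream type: it reads through RecordInputStream instead of RecordFactoryInputStream, which has the decryption support. The original POI library and the UserModel use the latter, which is why HSSFWorkbook works.

This code also works, but it is not event-based: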

    // Open the OLE2 container and locate the "Workbook" stream.
    POIFSFileSystem poifs = new POIFSFileSystem(fs);
    Entry document = poifs.Root.GetEntry("Workbook");
    DocumentInputStream docStream = new DocumentInputStream((DocumentEntry)document);
    //RecordFactory factory = new RecordFactory();
    //List<Record> records = RecordFactory.CreateRecords(docStream);
    // RecordFactoryInputStream handles decryption while reading;
    // the second argument asks it to return ContinueRecords as well.
    RecordFactoryInputStream recFacStream = new RecordFactoryInputStream(docStream, true);
    Record currRecord;
    while ((currRecord = recFacStream.NextRecord()) != null)
    {
        ProcessRecord(currRecord);
    }
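
The loop above can be pushed a little closer to the event style by forwarding each record to an IHSSFListener. The sketch below uses only the NPOI calls that already appear in this thread. DecryptingEventReader and its Process method are hypothetical names, and this is an untested workaround under those assumptions, not a replacement for HSSFEventFactory.

```csharp
using System.IO;
using NPOI.HSSF.EventUserModel;
using NPOI.HSSF.Record;
using NPOI.POIFS.FileSystem;

// Hypothetical helper, not part of NPOI: reads records one at a time through
// RecordFactoryInputStream and hands each one to a listener.
static class DecryptingEventReader
{
    public static void Process(Stream xls, IHSSFListener listener)
    {
        // Open the OLE2 container and locate the "Workbook" stream.
        POIFSFileSystem poifs = new POIFSFileSystem(xls);
        DocumentEntry workbook = (DocumentEntry)poifs.Root.GetEntry("Workbook");
        DocumentInputStream docStream = new DocumentInputStream(workbook);

        // Same stream type as in the answer; true keeps ContinueRecords.
        RecordFactoryInputStream records = new RecordFactoryInputStream(docStream, true);

        Record record;
        while ((record = records.NextRecord()) != null)
        {
            // Dispatch the record immediately instead of collecting a list.
            listener.ProcessRecord(record);
        }
    }
}
```

With the question's listener chain, the outermost listener (recListener) would presumably be passed as the second argument. Note that this bypasses HSSFRequest, so any per-record-type registration is lost, and the sequence of records delivered may differ from what HSSFEventFactory would produce.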