How can I check whether an IWorkbook object can be constructed from a file stream?

Time: 2016-03-05 19:45:24

Tags: c# .net excel npoi

I use the NPOI library to read xlsx and xls files.

I have this code:

IWorkbook workBook = null;
string fileExtension = Path.GetExtension(path);
using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    if (fileExtension == ".xls")
        workBook = new HSSFWorkbook(fs);
    else if (fileExtension == ".xlsx")
        workBook = new XSSFWorkbook(fs);
}

This works perfectly. The problem is that the Excel file at path does not always have an extension (.xls or .xlsx) in its name.

So I need to check whether fs is suitable for HSSFWorkbook() or for XSSFWorkbook().

Any idea how to check this without the file extension?

2 answers:

Answer 0 (score: 2)

IWorkbook workBook = null;

using (var fs = new FileStream(path, FileMode.Open, FileAccess.Read))
{
    // WorkbookFactory inspects the stream contents, so no extension is needed.
    workBook = WorkbookFactory.Create(fs);
}

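The WorkbookFactory.Create() method constructs an IWorkbook from the stream passed to it, whether that stream comes from an xls or an xlsx file.

Answer 1 (score: 0)

Using the file header information from https://en.wikipedia.org/wiki/List_of_file_signatures, we can use something like the following: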

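using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text;

public static class FormatRecognizer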
{
    public static Boolean IsZipFile(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException(paramName: nameof(stream));

        var zipHeader = new Byte[]
        {
            0x50, 0x4B, 0x03, 0x04
        };

        var streamBytes = GetBytesAndRestore(stream, zipHeader.Length);
        return streamBytes.SequenceEqual(zipHeader);
    }


    public static Boolean IsOffice2003File(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException(paramName: nameof(stream));

        var officeHeader = new Byte[]
        {
            0xD0, 0xCF, 0x11, 0xE0, 0xA1, 0xB1, 0x1A, 0xE1,
        };

        var streamBytes = GetBytesAndRestore(stream, officeHeader.Length);
        return streamBytes.SequenceEqual(officeHeader);
    }


    private static IEnumerable<Byte> GetBytesAndRestore(Stream stream, Int32 bytesCount)
    {
        if (stream == null)
            throw new ArgumentNullException(paramName: nameof(stream));

        var position = stream.Position;
        try
        {
            using (var reader = new BinaryReader(stream, Encoding.Default, leaveOpen: true))
            {
                return reader.ReadBytes(bytesCount);
            }
        }
        finally
        {
            stream.Position = position;
        }
    }
}

...

private static void PrintFormatInfo(String path)
{
    Console.WriteLine("File at '{0}'", path);
    using (var stream = File.OpenRead(path))
    {
        PrintFormatInfo(stream);
    }
}

private static void PrintFormatInfo(Stream stream)
{
    Console.WriteLine("Is office 2003 = {0}", FormatRecognizer.IsOffice2003File(stream));
    Console.WriteLine("Is zip file (possibly xlsx) = {0}", FormatRecognizer.IsZipFile(stream));
}

...

PrintFormatInfo("1.txt");
PrintFormatInfo("1.xls");
PrintFormatInfo("1.xlsx");

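This is not completely reliable, because IsZipFile returns true for any plain zip archive, and IsOffice2003File also succeeds for doc, ppt, and so on.

But it is the simplest solution I can think of. Anything more correct requires deeper knowledge of the file formats, which may or may not be what you need.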
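One possible step toward a stricter xlsx check is to open the stream as a zip archive and look for the workbook part. The sketch below uses the standard System.IO.Compression.ZipArchive class; the XlsxProbe class and LooksLikeXlsx method are hypothetical names, and it assumes the workbook part sits at the conventional xl/workbook.xml path, which Excel writes but the package format does not strictly require. It also assumes a seekable stream.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class XlsxProbe
{
    // Hypothetical helper: returns true when the stream is a zip archive
    // containing an entry at the conventional workbook path of an xlsx file.
    public static bool LooksLikeXlsx(Stream stream)
    {
        if (stream == null)
            throw new ArgumentNullException(nameof(stream));

        // Assumption: the stream is seekable, so the position can be restored.
        var position = stream.Position;
        try
        {
            // leaveOpen: true keeps the caller's stream usable afterwards.
            using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Assumption: the workbook part is stored at xl/workbook.xml.
                return archive.GetEntry("xl/workbook.xml") != null;
            }
        }
        catch (InvalidDataException)
        {
            // The contents could not be interpreted as a zip archive.
            return false;
        }
        finally
        {
            stream.Position = position;
        }
    }
}
```

This still only tells you that the archive looks like a workbook package, not that it will load; a plain zip archive without that entry is rejected, which is the case IsZipFile cannot distinguish.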