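I'm currently using the SharpZip API to handle my zip file entries. It works well for zipping and unzipping. However, I can't determine whether a file is a zip file or not. I need to know if there is a way to detect whether a file stream can be decompressed. Originally I used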
FileStream lFileStreamIn = File.OpenRead(mSourceFile);
lZipFile = new ZipFile(lFileStreamIn);
ZipInputStream lZipStreamTester = new ZipInputStream(lFileStreamIn, mBufferSize);// not working
lZipStreamTester.Read(lBuffer, 0, 0);
if (lZipStreamTester.CanDecompressEntry)
{
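Every time, lZipStreamTester becomes null and the if statement fails. I tried it with and without a buffer. Can anybody give any insight as to why? I am aware that I can check the file extension. I need something more definitive than that. I am also aware that zip has a magic # (PK something), but there is no guarantee it will always be there, because it is not a requirement of the format.
I also read that .NET 4.5 has native zip support, so my project may migrate to that instead of SharpZip, but I still did not see a method/parameter similar to CanDecompressEntry here: http://msdn.microsoft.com/en-us/library/3z72378a%28v=vs.110%29
My last resort will be to use a try/catch and attempt to unzip the file.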
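A minimal sketch of that last-resort approach, assuming the project can reference System.IO.Compression (.NET 4.5 or later) and that the stream is seekable. `ZipProbe` and `CanOpenAsZip` are hypothetical names used only for illustration. The idea is to let `ZipArchive` try to read the archive's directory and treat an `InvalidDataException` as "not a zip":

```csharp
using System.IO;
using System.IO.Compression;

public static class ZipProbe
{
    // Hypothetical helper: returns true when the stream can be opened as a ZIP archive.
    // Assumes a seekable stream so the original position can be restored afterwards.
    public static bool CanOpenAsZip(Stream stream)
    {
        long start = stream.Position;
        try
        {
            // leaveOpen: true so that disposing the archive does not close the caller's stream.
            using (var archive = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                // Touching Entries forces the central directory to be parsed.
                return archive.Entries.Count >= 0;
            }
        }
        catch (InvalidDataException)
        {
            // Thrown when the contents cannot be interpreted as a ZIP archive.
            return false;
        }
        finally
        {
            stream.Position = start;
        }
    }
}
```

Because this reads the directory at the end of the file rather than the leading bytes, it does not depend on the PK signature being at offset 0.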
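Answer 0 (score: 11)
See https://stackoverflow.com/a/16587134/206730 for reference.
Check the links below:
icsharpcode-sharpziplib-validate-zip-file
How-to-check-if-a-file-is-compressed-in-c#
ZIP files normally start with 0x04034b50 (4 bytes, stored on disk as 50 4B 03 04); empty or spanned archives begin with a different PK signature. See more: http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers
Sample usage: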
bool isPKZip = IOHelper.CheckSignature(pkg, 4, IOHelper.SignatureZip);
Assert.IsTrue(isPKZip, "Not ZIP the package : " + pkg);
// http://blog.somecreativity.com/2008/04/08/how-to-check-if-a-file-is-compressed-in-c/
public static partial class IOHelper
{
    // Signatures are written the way BitConverter.ToString formats bytes: uppercase hex joined by '-'.
    public const string SignatureGzip = "1F-8B-08";
    public const string SignatureZip = "50-4B-03-04";

    public static bool CheckSignature(string filepath, int signatureSize, string expectedSignature)
    {
        if (String.IsNullOrEmpty(filepath)) throw new ArgumentException("Must specify a filepath");
        if (String.IsNullOrEmpty(expectedSignature)) throw new ArgumentException("Must specify a value for the expected file signature");
        using (FileStream fs = new FileStream(filepath, FileMode.Open, FileAccess.Read, FileShare.ReadWrite))
        {
            // A file shorter than the signature cannot match it.
            if (fs.Length < signatureSize)
                return false;
            byte[] signature = new byte[signatureSize];
            int bytesRequired = signatureSize;
            int index = 0;
            // Read may return fewer bytes than requested, so loop until the buffer is full.
            while (bytesRequired > 0)
            {
                int bytesRead = fs.Read(signature, index, bytesRequired);
                if (bytesRead == 0) return false; // unexpected end of file
                bytesRequired -= bytesRead;
                index += bytesRead;
            }
            string actualSignature = BitConverter.ToString(signature);
            return actualSignature == expectedSignature;
        }
    }
}
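Answer 1 (score: 10)
This is a base class for a component that needs to handle data that is either uncompressed, PKZIP compressed (SharpZipLib) or GZip compressed (built-in .NET). Perhaps a bit more than you need, but it should get you going. It is an example of using @PhonicUK's suggestion to parse the header of the data stream. The derived classes you see in the little factory method handle the specifics of PKZip and GZip decompression.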
abstract class Expander
{
    // Both constants are compared against little-endian reads of the leading bytes.
    private const int ZIP_LEAD_BYTES = 0x04034b50;
    private const ushort GZIP_LEAD_BYTES = 0x8b1f;

    public abstract MemoryStream Expand(Stream stream);

    internal static bool IsPkZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 4);
        // if the first 4 bytes of the array are the ZIP signature then it is compressed data
        return (BitConverter.ToInt32(data, 0) == ZIP_LEAD_BYTES);
    }

    internal static bool IsGZipCompressedData(byte[] data)
    {
        Debug.Assert(data != null && data.Length >= 2);
        // if the first 2 bytes of the array are the GZIP signature then it is compressed data
        return (BitConverter.ToUInt16(data, 0) == GZIP_LEAD_BYTES);
    }

    public static bool IsCompressedData(byte[] data)
    {
        return IsPkZipCompressedData(data) || IsGZipCompressedData(data);
    }

    public static Expander GetExpander(Stream stream)
    {
        Debug.Assert(stream != null);
        Debug.Assert(stream.CanSeek);
        stream.Seek(0, SeekOrigin.Begin);
        try
        {
            byte[] bytes = new byte[4];
            stream.Read(bytes, 0, 4);
            if (IsGZipCompressedData(bytes))
                return new GZipExpander();
            if (IsPkZipCompressedData(bytes))
                return new ZipExpander();
            return new NullExpander();
        }
        finally
        {
            stream.Seek(0, SeekOrigin.Begin); // set the stream back to the beginning
        }
    }
}
Answer 2 (score: 8)
You can:
A ZIP file normally starts with 0x04034b50 as its first 4 bytes (http://en.wikipedia.org/wiki/Zip_(file_format)#File_headers)
Answer 3 (score: 2)
If you are programming for the web, you can check the file's content type: application/zip
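As a rough sketch of that check, assuming the upload's content type is already available as a string (for example from the web framework's posted-file object): `UploadChecks` and `LooksLikeZipContentType` are hypothetical names, and `application/x-zip-compressed` is included on the assumption that some browsers send it instead of `application/zip`.

```csharp
using System;

public static class UploadChecks
{
    // Hypothetical helper: compares a Content-Type value against common ZIP media types.
    public static bool LooksLikeZipContentType(string contentType)
    {
        if (string.IsNullOrWhiteSpace(contentType)) return false;

        // Drop any parameters such as "; charset=..." before comparing.
        string mediaType = contentType.Split(';')[0].Trim();

        return mediaType.Equals("application/zip", StringComparison.OrdinalIgnoreCase)
            || mediaType.Equals("application/x-zip-compressed", StringComparison.OrdinalIgnoreCase);
    }
}
```

The content type is supplied by the client, so this is best treated as a hint rather than proof that the payload is a ZIP archive.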
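Answer 4 (score: 1)
Thanks to dkackman and Kiquenet for the answers above. For completeness, the code below uses the signature to identify compressed (zip) files. You then have the added complexity that the newer MS Office file formats (your .docx and .xlsx files, etc.) will also match this signature lookup. As noted elsewhere, these really are compressed archives: you can rename the files with a .zip extension and have a look at the XML inside.
In the code below, we first check for ZIP (compressed) using the signatures used above, and then do a subsequent check for MS Office packages. Note that to use System.IO.Packaging.Package you need a project reference to "WindowsBase" (that is a .NET assembly reference).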
private const string SignatureZip = "50-4B-03-04";
private const string SignatureGzip = "1F-8B-08";

public static bool IsZip(this Stream stream)
{
    if (stream.Position > 0)
    {
        stream.Seek(0, SeekOrigin.Begin);
    }
    bool isZip = CheckSignature(stream, 4, SignatureZip);
    // Rewind so the GZip check also reads from the first byte.
    stream.Seek(0, SeekOrigin.Begin);
    bool isGzip = CheckSignature(stream, 3, SignatureGzip);
    bool isSomeKindOfZip = isZip || isGzip;
    // Only a ZIP container can be an Office package, so the package test is skipped for GZip data.
    stream.Seek(0, SeekOrigin.Begin);
    if (isZip && stream.IsPackage()) // Signature matches ZIP, but it's package format (docx etc).
    {
        return false;
    }
    return isSomeKindOfZip;
}

/// <summary>
/// MS .docx, .xlsx and other extensions are (correctly) identified as zip files using signature lookup.
/// This tests if System.IO.Packaging is able to open, and if package has parts, this is not a zip file.
/// </summary>
/// <param name="stream"></param>
/// <returns></returns>
private static bool IsPackage(this Stream stream)
{
    Package package = Package.Open(stream, FileMode.Open, FileAccess.Read);
    return package.GetParts().Any();
}
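The snippet above calls a stream-based `CheckSignature` that the answer does not show. One possible shape for it, offered as an assumption rather than the author's actual code, mirrors the file-path version from the earlier answer but reads from a seekable stream. `StreamSignature` is a hypothetical class name.

```csharp
using System;
using System.IO;

public static class StreamSignature
{
    // Hypothetical stream-based counterpart of IOHelper.CheckSignature.
    // expectedSignature uses the BitConverter.ToString format, e.g. "50-4B-03-04".
    public static bool CheckSignature(Stream stream, int signatureSize, string expectedSignature)
    {
        if (stream == null) throw new ArgumentNullException(nameof(stream));
        if (!stream.CanSeek) throw new ArgumentException("Stream must be seekable", nameof(stream));

        long original = stream.Position;
        try
        {
            // Always compare against the first bytes of the stream.
            stream.Seek(0, SeekOrigin.Begin);
            byte[] signature = new byte[signatureSize];
            int total = 0;
            while (total < signatureSize)
            {
                int read = stream.Read(signature, total, signatureSize - total);
                if (read == 0) return false; // stream is shorter than the signature
                total += read;
            }
            return BitConverter.ToString(signature) == expectedSignature;
        }
        finally
        {
            // Leave the stream where the caller had it.
            stream.Position = original;
        }
    }
}
```

Seeking to the start inside the helper means that consecutive checks with different signature lengths do not interfere with each other.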
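Answer 5 (score: 0)
I used https://en.wikipedia.org/wiki/List_of_file_signatures and just added an extra byte for zip files, to differentiate between zip files and Word documents (these share the first four bytes). Note that the fifth byte belongs to the "version needed to extract" field, so 0x0a only matches archives whose first entry declares version 1.0; ZIP files whose first entry is Deflate-compressed typically carry 0x14 there and will not match.
Here is my code: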
public class ZipFileUtilities
{
    // Local file header signature plus the low byte of "version needed to extract" (0x0a = 1.0).
    private static readonly byte[] ZipBytes1 = { 0x50, 0x4b, 0x03, 0x04, 0x0a };
    private static readonly byte[] GzipBytes = { 0x1f, 0x8b };
    // 0x1f 0x9d: compress (LZW) output, often a .tar.Z
    private static readonly byte[] TarBytes = { 0x1f, 0x9d };
    // 0x1f 0xa0: LZH-compressed file, often a tar archive
    private static readonly byte[] LzhBytes = { 0x1f, 0xa0 };
    private static readonly byte[] Bzip2Bytes = { 0x42, 0x5a, 0x68 };
    private static readonly byte[] LzipBytes = { 0x4c, 0x5a, 0x49, 0x50 };
    // Empty archive (end of central directory record comes first)
    private static readonly byte[] ZipBytes2 = { 0x50, 0x4b, 0x05, 0x06 };
    // Spanned archive
    private static readonly byte[] ZipBytes3 = { 0x50, 0x4b, 0x07, 0x08 };

    public static byte[] GetFirstBytes(string filepath, int length)
    {
        // Open the file as raw bytes; a text reader is not needed here.
        using (var fs = File.OpenRead(filepath))
        {
            var bytes = new byte[length];
            // If the file is shorter than length, the remaining bytes stay zero.
            fs.Read(bytes, 0, length);
            return bytes;
        }
    }

    public static bool IsZipFile(string filepath)
    {
        return IsCompressedData(GetFirstBytes(filepath, 5));
    }

    public static bool IsCompressedData(byte[] data)
    {
        foreach (var headerBytes in new[] { ZipBytes1, ZipBytes2, ZipBytes3, GzipBytes, TarBytes, LzhBytes, Bzip2Bytes, LzipBytes })
        {
            if (HeaderBytesMatch(headerBytes, data))
                return true;
        }
        return false;
    }

    private static bool HeaderBytesMatch(byte[] headerBytes, byte[] dataBytes)
    {
        if (dataBytes.Length < headerBytes.Length)
            throw new ArgumentOutOfRangeException(nameof(dataBytes),
                $"Passed databytes length ({dataBytes.Length}) is shorter than the headerbytes ({headerBytes.Length})");
        // Compare byte by byte over the length of the header only.
        for (var i = 0; i < headerBytes.Length; i++)
        {
            if (headerBytes[i] == dataBytes[i]) continue;
            return false;
        }
        return true;
    }
}
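Perhaps there are better ways to write this code, particularly the byte comparison, but since it is a variable-length comparison (depending on the signature being checked), I felt this code is at least readable, to me anyway.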
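One alternative for the variable-length comparison, sketched here with LINQ. `HeaderMatcher` and `StartsWith` are hypothetical names, and unlike `HeaderBytesMatch` this version returns false instead of throwing when the data is shorter than the header:

```csharp
using System.Linq;

public static class HeaderMatcher
{
    // Hypothetical helper: true when data begins with the bytes in header.
    public static bool StartsWith(byte[] data, byte[] header)
    {
        return data != null
            && header != null
            && data.Length >= header.Length
            // Take only as many leading bytes as the header has, then compare in order.
            && data.Take(header.Length).SequenceEqual(header);
    }
}
```

For example, `HeaderMatcher.StartsWith(firstBytes, new byte[] { 0x1f, 0x8b })` would test for the GZip signature regardless of how many bytes were read.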