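How can I identify the exact type of a file? To make the question clearer, here are more details:
For example, if I have a file named "example.exe", I can easily tell that it is a Windows executable by looking at the .exe extension. But if I remove the extension, I can no longer tell what type of file it is.
So how can I identify the file type in that case?
(Please suggest answers using C/C++, Java, Python, or PHP (for web uploads).)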
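Answer 0 (score: 4)
There is no such thing as an "exact file type". Binary data is just binary data.
If you are on a POSIX-like system, you can use the file command to guess the file type. By default it prints a human-readable description rather than a MIME type; the common implementation also accepts --mime-type to print a MIME type instead.
If your server runs Apache, you can use mod_mime_magic to make the guess.
If you are using PHP, you can install the fileinfo extension.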
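As a rough sketch of how the file utility could be called from a program, the Python function below shells out to it. It assumes the common file implementation, which accepts --brief and --mime-type; guess_mime_with_file is an illustrative name, not part of any library.

```python
import shutil
import subprocess
from typing import Optional


def guess_mime_with_file(path: str) -> Optional[str]:
    """Ask the system `file` utility for a MIME type; None if unavailable."""
    # Assumption: `file` is on PATH (usually not the case on stock Windows).
    if shutil.which("file") is None:
        return None
    result = subprocess.run(
        # "--" stops option parsing so a path starting with "-" is safe.
        ["file", "--brief", "--mime-type", "--", path],
        capture_output=True,
        text=True,
        check=False,
    )
    if result.returncode != 0:
        return None
    # An empty answer is treated the same as "no guess".
    return result.stdout.strip() or None
```

The result is still only a guess based on the file's content, so it should not be treated as proof of the type.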
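Answer 1 (score: 1)
You need to know the specification of every file type you want to handle.
With that specification, you can write a function that checks whether a given file is of that particular type.
Example:
isExe(File)
isJpg(File)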
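The two checks named above could be sketched in Python as follows. This is a minimal illustration that only compares the leading bytes of the file (MZ for Windows executables, FF D8 FF for JPEG); a full check against each format's specification would validate far more. The helper name _starts_with is hypothetical.

```python
EXE_MAGIC = b"MZ"              # DOS header signature at the start of Windows executables
JPEG_MAGIC = b"\xff\xd8\xff"   # JPEG start-of-image marker plus the next marker's 0xFF


def _starts_with(path: str, magic: bytes) -> bool:
    """Return True if the file at `path` begins with the bytes in `magic`."""
    with open(path, "rb") as fh:
        return fh.read(len(magic)) == magic


def is_exe(path: str) -> bool:
    """Cheap check only: a real validator would also parse the PE header."""
    return _starts_with(path, EXE_MAGIC)


def is_jpg(path: str) -> bool:
    """Cheap check only: does not verify that the image data is intact."""
    return _starts_with(path, JPEG_MAGIC)
```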
Answer 2 (score: 0)
If you only need the file extension, try this simple PHP code:
$ext = pathinfo($filename, PATHINFO_EXTENSION);
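For comparison, a rough Python counterpart is sketched below; extension_of is an illustrative name. Like the PHP line, it only inspects the file name, so for the renamed file in the question it has nothing to report.

```python
from pathlib import PurePath


def extension_of(filename: str) -> str:
    """Return the last extension without its leading dot, or '' if there is none."""
    return PurePath(filename).suffix.lstrip(".")


print(extension_of("example.exe"))  # exe
print(repr(extension_of("example")))  # '' because the name carries no extension
```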
Answer 3 (score: 0)
For Python: the python-magic library provides the functionality you need.
You can install the library with
pip install python-magic
and use it as follows:
>>> import magic
>>> magic.from_file('sampleone.jpg')
'JPEG image data, JFIF standard 1.01'
>>> magic.from_file('sampletwo.png')
'PNG image data, 600 x 1000, 8-bit colormap, non-interlaced'
Answer 4 (score: 0)
We cannot determine the type of a file from its extension alone. Anyone can rename a file from .txt to .exe, which does not make it a valid executable.
Let's assume we are on the Windows platform:
Portable Executable (PE) is the native Win32 file format. Every executable uses the PE file format except VxDs and 16-bit DLLs. 32-bit DLLs, EXEs, COM files, OCX controls, CPL files, .NET executables, and NT kernel-mode drivers are all in PE format. The PE format has a predefined structure: it consists of several headers, section headers, section data, and so on, which hold information about addresses, sizes, and executable code.
The headers contain some signature fields:
e.g. executables always have the value MZ (0x5A4D) in the DOS header and the value PE (0x4550) in the PE header.
From these values we can tell executables from non-executables.
Now moving on to non-executables:
Consider a .jpg file: many different tools can generate one. When creating a .jpg file, the tool writes a signature at the start of the file (the bytes FF D8, which read as a little-endian 16-bit value give 0xD8FF), followed by the binary image data. When opening a .jpg file, software reads the signature and, if it is valid, draws the image from that data.
Similarly, .pdf, .mp3, ... files have their own signatures.
.txt files have no signature. The data starts at the first offset of the file.
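The signature idea can be sketched as a small lookup table in Python. This is only an illustration: the table holds a handful of well-known leading byte sequences, the labels and the name sniff are made up for the example, and the text fallback (no NUL bytes and valid UTF-8) is a heuristic assumption rather than a rule from any specification.

```python
import codecs
from typing import Optional

# A few well-known leading byte sequences; real detectors know far more.
SIGNATURES = [
    (b"\xff\xd8\xff", "jpg"),
    (b"%PDF-", "pdf"),
    (b"ID3", "mp3"),  # only MP3 files that begin with an ID3v2 tag
    (b"MZ", "exe"),
]


def sniff(path: str) -> Optional[str]:
    """Guess a type label from the first bytes of a file, or None if unknown."""
    with open(path, "rb") as fh:
        head = fh.read(512)
    if not head:
        return None
    for magic, label in SIGNATURES:
        if head.startswith(magic):
            return label
    # No signature matched: treat NUL-free, UTF-8-decodable data as text.
    if b"\x00" in head:
        return None
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        # final=False tolerates a multi-byte character cut off at byte 512.
        decoder.decode(head, final=False)
    except UnicodeDecodeError:
        return None
    return "txt"
```

Because text files carry no signature, the "txt" result means only that nothing contradicted it.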
The header information can be viewed in the following way:
CreateFile(...) // open the file for reading
CreateFileMapping(...)
MapViewOfFile(...)
Once the file view is mapped, the header information can be retrieved using the IMAGE_DOS_HEADER and IMAGE_NT_HEADERS structures defined in winnt.h.
The signature should first be matched against the e_magic field of IMAGE_DOS_HEADER. If it is MZ (0x5A4D), then check the Signature field of IMAGE_NT_HEADERS, located at the file offset given by e_lfanew, against PE (0x4550).
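The same two-step check can be sketched outside the Win32 API. The Python version below is an illustration, not the answer's own code: it maps the file with the standard mmap module instead of CreateFileMapping/MapViewOfFile, and reads the fields by offset (e_magic at 0, e_lfanew at 0x3C) instead of through the winnt.h structures. The name is_pe_file is hypothetical.

```python
import mmap
import struct

IMAGE_DOS_SIGNATURE = 0x5A4D     # "MZ", expected in e_magic
IMAGE_NT_SIGNATURE = 0x00004550  # "PE\0\0", expected in the NT headers' Signature
E_LFANEW_OFFSET = 0x3C           # offset of e_lfanew inside IMAGE_DOS_HEADER


def is_pe_file(path: str) -> bool:
    """Return True if the file has both the MZ and the PE signatures."""
    with open(path, "rb") as fh:
        try:
            view = mmap.mmap(fh.fileno(), 0, access=mmap.ACCESS_READ)
        except (ValueError, OSError):  # e.g. an empty file cannot be mapped
            return False
        with view:
            if len(view) < E_LFANEW_OFFSET + 4:
                return False
            (e_magic,) = struct.unpack_from("<H", view, 0)
            if e_magic != IMAGE_DOS_SIGNATURE:
                return False
            # e_lfanew holds the file offset of the NT headers.
            (e_lfanew,) = struct.unpack_from("<I", view, E_LFANEW_OFFSET)
            if e_lfanew + 4 > len(view):
                return False
            (signature,) = struct.unpack_from("<I", view, e_lfanew)
            return signature == IMAGE_NT_SIGNATURE
```

Passing this check shows only that the headers look like a PE file; it does not prove the file is a working executable.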