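I'm just curious how local applications, including browsers, read and interpret MIME types. Is a plugin for reading MIME types built into each application, or is there a special system folder in the operating system that applications consult when interpreting MIME types?
When defining MIME types, the RFC uses character sets ("char charts") as a reference:
(1) textual message bodies in character sets other than US-ASCII
MDN, on the other hand, makes it sound as though MIME types are what goes in content-type, as you would find in things like HTML.
Do things like content-type=image/jpeg or content-type=application/javascript use the UTF-8 character table to determine their character set (glyphs), with something else applying logic to decide what those glyphs should be interpreted as?
Or does this mean that each content type has its own special char chart (like utf-8 -> js-8 ????) that handles both the translation of a character's glyph and the logical interpretation of that glyph into binary?
Why does it sound like char charts and content types are both MIME? On Mac and Linux systems, what are the folder paths that contain the content-type charts and interpretation logic?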
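Answer 0: (score: 4)
Those are mostly located in /usr/share/mime and /usr/share/mime-info. Linux and Mac (almost the whole Unix tree) do not rely on file extensions; extensions are there only for the user's convenience.
Note: application-specific entries are located in "/usr/share/mimelnk" (thanks to David C. Rankin).
(You can also try locate mime in a terminal for more information.)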
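For comparison, here is a minimal sketch of the simplest kind of lookup an application can do: mapping a filename extension to a MIME type through a table. It uses Python's standard mimetypes module, which combines a built-in table with whichever of the files listed in mimetypes.knownfiles exist on the machine. This is an illustration only; it does not read the /usr/share/mime database, the filenames are made up, and the helper name guess_mime is hypothetical.

```python
import mimetypes


def guess_mime(filename: str) -> str:
    """Return the MIME type guessed from the filename extension alone."""
    mime_type, _encoding = mimetypes.guess_type(filename)
    # Fall back to the generic binary type when the extension is unknown.
    return mime_type or "application/octet-stream"


if __name__ == "__main__":
    # Made-up filenames; nothing is opened or read from disk here.
    for name in ("photo.jpg", "page.html", "notes.unknownext"):
        print(name, "->", guess_mime(name))
    # System files the module would consult on this machine, if present.
    print(mimetypes.knownfiles)
```

Because only the name is examined, renaming a file changes the answer. Content-based detection avoids that problem.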
Answer 1: (score: 1)
On macOS, you can use file --mime "/path/to/filename" to report a file's MIME type.
The man page for file (see the docs) reveals what happens before the MIME type lookup:
file tests each argument in an attempt to classify it. There are three sets of tests, performed in this order: filesystem tests, magic tests, and language tests. The first test that succeeds causes the file type to be printed.
The filesystem tests are based on examining the return from a stat(2) system call. The program checks to see if the file is empty, or if it's some sort of special file. Any known file types appropriate to the system you are running on (sockets, symbolic links, or named pipes (FIFOs) on those systems that implement them) are intuited if they are defined in the system header file <sys/stat.h>.
The magic tests are used to check for files with data in particular fixed formats. The canonical example of this is a binary executable (compiled program) a.out file, whose format is defined in <elf.h>, <a.out.h> and possibly <exec.h> in the standard include directory. These files have a "magic number" stored in a particular place near the beginning of the file that tells the UNIX operating system that the file is a binary executable, and which of several types thereof. The concept of a "magic" has been applied by extension to data files. Any file with some invariant identifier at a small fixed offset into the file can usually be described in this way. The information identifying these files is read from the compiled magic file /usr/share/file/magic.mgc, or the files in the directory /usr/share/file/magic if the compiled file does not exist.
If a file does not match any of the entries in the magic file, it is examined to see if it seems to be a text file. ASCII, ISO-8859-x, non-ISO 8-bit extended-ASCII character sets (such as those used on Macintosh and IBM PC systems), UTF-8-encoded Unicode, UTF-16-encoded Unicode, and EBCDIC character sets can be distinguished by the different ranges and sequences of bytes that constitute printable text in each set. If a file passes any of these tests, its character set is reported. ASCII, ISO-8859-x, UTF-8, and extended-ASCII files are identified as "text" because they will be mostly readable on nearly any terminal; UTF-16 and EBCDIC are only "character data" because, while they contain text, it is text that will require translation before it can be read. In addition, file will attempt to determine other characteristics of text-type files. If the lines of a file are terminated by CR, CRLF, or NEL, instead of the Unix-standard LF, this will be reported. Files that contain embedded escape sequences or overstriking will also be identified.
Once file has determined the character set used in a text-type file, it will attempt to determine in what language the file is written. The language tests look for particular strings (cf. <names.h>) that can appear anywhere in the first few blocks of a file. For example, the keyword .br indicates that the file is most likely a troff(1) input file, just as the keyword struct indicates a C program. These tests are less reliable than the previous two groups, so they are performed last. The language test routines also test for some miscellany (such as tar(1) archives).
Any file that cannot be identified as having been written in any of the character sets listed above is simply said to be "data".
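The magic test and the text test described in the man page can be sketched roughly as follows. This is a simplified illustration, not how file is actually implemented: the signature table is a tiny hand-written sample rather than the compiled magic file, the function name sniff is hypothetical, only ASCII and UTF-8 are tried, and the check for NUL bytes is a crude stand-in for the printable-range checks the man page mentions.

```python
# Hypothetical, tiny signature table: (offset, magic bytes, MIME type).
# The real magic database holds far more entries and richer rules.
SIGNATURES = [
    (0, b"\x89PNG\r\n\x1a\n", "image/png"),
    (0, b"\xff\xd8\xff", "image/jpeg"),
    (0, b"GIF87a", "image/gif"),
    (0, b"GIF89a", "image/gif"),
    (0, b"%PDF-", "application/pdf"),
]

# (Python codec name, charset label to report), tried in this order.
TEXT_CHARSETS = [("ascii", "us-ascii"), ("utf-8", "utf-8")]


def sniff(data: bytes) -> str:
    """Guess a MIME type from the leading bytes of a file."""
    # Magic test: look for an invariant identifier at a fixed offset.
    for offset, magic, mime in SIGNATURES:
        if data[offset:offset + len(magic)] == magic:
            return mime
    # Text test: empty input or input containing NUL bytes is not
    # treated as text in this sketch (a crude heuristic).
    if data and b"\x00" not in data:
        for codec, label in TEXT_CHARSETS:
            try:
                data.decode(codec)
            except UnicodeDecodeError:
                continue
            return f"text/plain; charset={label}"
    # Nothing matched: the equivalent of file reporting "data".
    return "application/octet-stream"


if __name__ == "__main__":
    print(sniff(b"\x89PNG\r\n\x1a\n" + b"\x00" * 8))
    print(sniff(b"hello\n"))
    print(sniff("h\u00e9llo".encode("utf-8")))
```

Note that the character set shows up as a parameter attached to a text MIME type, not as a separate kind of MIME type. A type such as image/jpeg describes a binary format and has no character table behind it.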