Question

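How can I extract the list of supported Unicode characters from a TrueType or embedded OpenType font on Linux?

Is there a tool or library that can process a .ttf or .eot file and build a list of the code points the font provides (such as U+0123, U+1234, etc.)?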

Answer 1

Here is a method using the FontTools module (which you can install with something like pip install fonttools):

#!/usr/bin/env python
from itertools import chain
import sys

from fontTools.ttLib import TTFont
from fontTools.unicode import Unicode

ttf = TTFont(sys.argv[1], 0, verbose=0, allowVID=0,
                ignoreDecompileErrors=True,
                fontNumber=-1)

chars = chain.from_iterable([y + (Unicode[y[0]],) for y in x.cmap.items()] for x in ttf["cmap"].tables)
print(list(chars))

# Use this for just checking if the font contains the codepoint given as
# second argument:
#char = int(sys.argv[2], 0)
#print(Unicode[char])
#print(char in (x[0] for x in chars))

ttf.close()

The script takes the font path as its argument:

python checkfont.py /path/to/font.ttf

Answer 2

The Linux program xfd can do this. My distribution provides it as 'xorg-xfd'. To see all the characters of a font, run this in a terminal:

xfd -fa "DejaVu Sans Mono"

Answer 3

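According to fontconfig, fc-query my-font.ttf will give you a map of the supported glyphs and all the locales the font is appropriate for.

Since nearly all modern Linux applications are fontconfig-based, this is much more useful than a raw Unicode list.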

The actual output format is discussed here: http://lists.freedesktop.org/archives/fontconfig/2013-September/004915.html

Answer 4

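The character code points of a ttf/otf font are stored in the cmap table.

You can use ttx to generate an XML representation of the cmap table.

You can run the command ttx.exe -t cmap MyFont.ttf (on Linux the command is simply ttx), and it should output the file MyFont.ttx. Open it in a text editor and it should show all the character codes it found in the font.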
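Instead of reading MyFont.ttx by eye, the code points can be pulled out with a few lines of Python. This is only a sketch: it assumes the dump contains map elements with a hexadecimal code attribute, as in the hand-written sample below, and the function name codepoints_from_ttx is made up for illustration.

```python
import xml.etree.ElementTree as ET


def codepoints_from_ttx(xml_text):
    """Return the sorted, de-duplicated code points found in a ttx cmap dump."""
    root = ET.fromstring(xml_text)
    found = set()
    # Each cmap subtable lists entries such as <map code="0x41" name="A"/>.
    for entry in root.iter("map"):
        code = entry.get("code")
        if code is not None:  # skip entries that carry no code attribute
            found.add(int(code, 16))
    return sorted(found)


# Tiny hand-written sample in the shape of a ttx dump; not a real font.
SAMPLE = """<ttFont>
  <cmap>
    <tableVersion version="0"/>
    <cmap_format_4 platformID="0" platEncID="3" language="0">
      <map code="0x20" name="space"/>
      <map code="0x41" name="A"/>
    </cmap_format_4>
    <cmap_format_4 platformID="3" platEncID="1" language="0">
      <map code="0x41" name="A"/>
      <map code="0xe9" name="eacute"/>
    </cmap_format_4>
  </cmap>
</ttFont>"""

print(["U+%04X" % cp for cp in codepoints_from_ttx(SAMPLE)])
```

For a real file, ET.parse("MyFont.ttx").getroot() could take the place of ET.fromstring.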

Answer 5

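Here is a ~~POSIX~~ [1] shell script that prints the code points and the characters in a nice and easy way, with the help of fc-match, which is mentioned in another answer (it can even handle Unicode values of up to 8 hex digits):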

#!/bin/sh
for range in $(fc-match --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        n_hex=$(printf "%04x" "$n")
        # using \U for 5-hex-digits
        printf "%-5s\U$n_hex\t" "$n_hex"
        count=$((count + 1))
        if [ $((count % 10)) = 0 ]; then
            printf "\n"
        fi
    done
done
printf "\n"

You can pass a font name or anything else that fc-match accepts:

$ ls-chars "DejaVu Sans"

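Update:

I learned that subshells are very time-consuming (the printf subshell in my script). So I managed to write an improved version that is 5 to 10 times faster!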

#!/bin/sh
for range in $(fc-match --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        printf "%04x\n" "$n"
    done
done | while read -r n_hex; do
    count=$((count + 1))
    printf "%-5s\U$n_hex\t" "$n_hex"
    [ $((count % 10)) = 0 ] && printf "\n"
done
printf "\n"

Old version:

$ time ls-chars "DejaVu Sans" | wc
    592   11269   52740

real    0m2.876s
user    0m2.203s
sys     0m0.888s

New version (the line count suggests roughly 5910 characters, in 0.4 seconds!):

$ time ls-chars "DejaVu Sans" | wc
    592   11269   52740

real    0m0.399s
user    0m0.446s
sys     0m0.120s

End of update.

Sample output (it lines up better in my st terminal):

0020    0021 !  0022 "  0023 #  0024 $  0025 %  0026 &  0027 '  0028 (  0029 )
002a *  002b +  002c ,  002d -  002e .  002f /  0030 0  0031 1  0032 2  0033 3
0034 4  0035 5  0036 6  0037 7  0038 8  0039 9  003a :  003b ;  003c <  003d =
003e >  003f ?  0040 @  0041 A  0042 B  0043 C  0044 D  0045 E  0046 F  0047 G
...
1f61a? 1f61b? 1f61c? 1f61d? 1f61e? 1f61f? 1f620? 1f621? 1f622? 1f623?
1f625? 1f626? 1f627? 1f628? 1f629? 1f62a? 1f62b? 1f62d? 1f62e? 1f62f?
1f630? 1f631? 1f632? 1f633? 1f634? 1f635? 1f636? 1f637? 1f638? 1f639?
1f63a? 1f63b? 1f63c? 1f63d? 1f63e? 1f63f? 1f640? 1f643?

[1] It seems that \U in printf is not part of the POSIX standard?
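If relying on printf's \U is a concern, the same ten-per-row layout can be produced in Python. This is a rough sketch rather than a drop-in replacement for the script: format_grid is a hypothetical helper, and it expects code points that have already been obtained by some other means.

```python
def format_grid(codepoints, per_line=10):
    """Lay out 'hex char' cells, per_line cells to a row, separated by tabs."""
    cells = ["%04x %s" % (cp, chr(cp)) for cp in codepoints]
    rows = [cells[i:i + per_line] for i in range(0, len(cells), per_line)]
    return "\n".join("\t".join(row) for row in rows)


# Example with a handful of ASCII punctuation code points.
print(format_grid(range(0x21, 0x2D)))
```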

Answer 6

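I ran into the same problem and wrote a HOWTO that goes a step further, baking a regular expression of all the supported Unicode code points.

If you just want the array of code points, you can use the following when peeking at your ttx XML in Chrome devtools, after running ttx -t cmap myfont.ttf and, probably, renaming myfont.ttx to myfont.xml to invoke Chrome's XML mode:

function codepoint(node) { return Number(node.nodeValue); }
$x('//cmap/*[@platformID="0"]/*/@code').map(codepoint);

(This also relies on fonttools from gilamesh's suggestion; install the fonttools package if you are on an Ubuntu system.)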
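The HOWTO itself is not reproduced here, so the following is only a guess at what baking such a regular expression could look like in Python. The helper name codepoints_to_regex is invented for this sketch; it collapses consecutive code points into ranges inside a single character class.

```python
import re


def codepoints_to_regex(codepoints):
    """Build a character-class pattern matching exactly the given code points."""
    ordered = sorted(set(codepoints))
    if not ordered:
        return r"(?!)"  # a pattern that never matches

    # Collapse runs of consecutive code points into (start, end) pairs.
    ranges = []
    start = prev = ordered[0]
    for cp in ordered[1:]:
        if cp != prev + 1:
            ranges.append((start, prev))
            start = cp
        prev = cp
    ranges.append((start, prev))

    def esc(cp):
        # \UXXXXXXXX escapes are understood by Python's re in str patterns.
        return "\\U%08x" % cp

    parts = [esc(a) if a == b else esc(a) + "-" + esc(b) for a, b in ranges]
    return "[" + "".join(parts) + "]"


pattern = codepoints_to_regex([0x41, 0x42, 0x43, 0x61, 0x1F600])
print(pattern)
print(bool(re.fullmatch(pattern + "+", "ABCa")))
```

The resulting pattern is specific to Python's escape syntax; other regex engines would need a different escape form.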

Answer 7

The fontconfig commands can output the glyph list as a compact list of ranges, for example:

$ fc-match --format='%{charset}\n' OpenSans
20-7e a0-17f 192 1a0-1a1 1af-1b0 1f0 1fa-1ff 218-21b 237 2bc 2c6-2c7 2c9
2d8-2dd 2f3 300-301 303 309 30f 323 384-38a 38c 38e-3a1 3a3-3ce 3d1-3d2 3d6
400-486 488-513 1e00-1e01 1e3e-1e3f 1e80-1e85 1ea0-1ef9 1f4d 2000-200b
2013-2015 2017-201e 2020-2022 2026 2030 2032-2033 2039-203a 203c 2044 2070
2074-2079 207f 20a3-20a4 20a7 20ab-20ac 2105 2113 2116 2120 2122 2126 212e
215b-215e 2202 2206 220f 2211-2212 221a 221e 222b 2248 2260 2264-2265 25ca
fb00-fb04 feff fffc-fffd

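Use fc-query for .ttf files and fc-match for installed font names.

This likely doesn't involve installing any extra packages, and it doesn't involve translating a bitmap.

Use fc-match --format='%{file}\n' to check whether the correct font is being matched.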
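The compact range list is easy to expand into individual code points. A minimal sketch, assuming the charset text has already been captured as a string (for instance by copying the output shown above); expand_charset is a name chosen for this example:

```python
def expand_charset(charset):
    """Expand a fontconfig charset string such as '20-7e a0-17f 192'."""
    codepoints = []
    for token in charset.split():
        # A token is either a single hex value or 'first-last'.
        first, _, last = token.partition("-")
        lo = int(first, 16)
        hi = int(last, 16) if last else lo
        codepoints.extend(range(lo, hi + 1))
    return codepoints


sample = "20-7e a0-17f 192 1a0-1a1"
cps = expand_charset(sample)
print(len(cps), "code points from", "U+%04X" % cps[0], "to", "U+%04X" % cps[-1])
```

In a script, the string could come from running fc-match or fc-query through the subprocess module instead of being pasted in.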

Answer 8

To add to @Oliver Lew's answer, I added the option to query a local font file instead of a system font:

#!/bin/bash

# If the first argument is a font file, use fc-query instead of fc-match to
# display the font
[[ -f "$1" ]] && fc='fc-query' || fc='fc-match'

for range in $($fc --format='%{charset}\n' "$1"); do
    for n in $(seq "0x${range%-*}" "0x${range#*-}"); do
        printf "%04x\n" "$n"
    done
done | while read -r n_hex; do
    count=$((count + 1))
    printf "%-5s\U$n_hex\t" "$n_hex"
    [ $((count % 10)) = 0 ] && printf "\n"
done
printf "\n"

Answer 9

If you just want to "view" the font, the following may help (if your terminal supports the font in question):

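An unsafe but simple way to view them: the script below prints one echo command per mapped glyph, and its output is presumably meant to be piped to a shell.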

#!/usr/bin/env python
import sys
from fontTools.ttLib import TTFont

with TTFont(sys.argv[1], 0, ignoreDecompileErrors=True) as ttf:
    for x in ttf["cmap"].tables:
        for (_, code) in x.cmap.items():
            point = code.replace('uni', '\\u').lower()
            print("echo -e '" + point + "'")

Thanks to Janus (https://stackoverflow.com/a/19438403/431528) for the answer above.

Answer 10

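Janus's answer above (https://stackoverflow.com/a/19438403/431528) works. But Python is too slow, especially for Asian fonts. On my E5 computer, a font with a file size of 40 MB takes several minutes.

So I wrote a small C++ program to do this. It depends on FreeType2 (https://www.freetype.org/). It is a VS2015 project, but since it is a console application, it is easy to port to Linux.

The code can be found here: https://github.com/zhk/AllCodePoints. For an Asian font with a file size of 40 MB, it takes about 30 ms on my E5 computer.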

Answer 11

If you want to get all the characters supported by a font, you can use the following (based on Janus's answer):

from fontTools.ttLib import TTFont

def get_font_characters(font_path):
    with TTFont(font_path) as font:
        characters = {chr(y[0]) for x in font["cmap"].tables for y in x.cmap.items()}
    return characters
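Once such a set is available, a common follow-up is checking whether a piece of text can be rendered. The sketch below is an illustration only: missing_characters is a hypothetical helper, and the supported set is a stand-in for whatever get_font_characters would return for a real font.

```python
def missing_characters(text, supported):
    """Return the characters of text that are absent from supported, as U+XXXX labels."""
    missing = sorted(set(text) - set(supported))
    return ["U+%04X" % ord(ch) for ch in missing]


# Stand-in for the set that get_font_characters() would return.
supported = set("abcdefghijklmnopqrstuvwxyz ")
print(missing_characters("caf\u00e9 na\u00efve", supported))
```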

Answer 12

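The FreeType project provides demo applications, one of which is called "ftdump". You can then run "ftdump -V path-to-the-font-file" and you will get what you are looking for. To view the source code, you can check out the sources here: https://www.freetype.org/developer.html

On Ubuntu it can be installed with "sudo apt install freetype2-demos".

Note: try "-c" instead of "-V". I see that the arguments have changed between versions.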

Answer 13

You can do this on Linux in Perl using the Font::TTF module.

Finding out what characters a given font supports
