How to find "paired characters" in Perl?

Time: 2015-10-23 16:08:04

Tags: perl unicode

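Is it possible to find all "paired" characters programmatically?

E.g., when I get the < character, how do I find its corresponding "pair", >?

The following code prints every "mirrored" ASCII character: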

use 5.018;
use warnings;
use charnames qw(:full);
for my $n (0..127) {
    my $c = chr $n;
    printf "%02x: [%s] - %s\n", $n, $c, charnames::viacode($n) if $c =~ /\p{Bidi_Mirrored=Y}/;
}

Output:

28: [(] - LEFT PARENTHESIS
29: [)] - RIGHT PARENTHESIS
3c: [<] - LESS-THAN SIGN
3e: [>] - GREATER-THAN SIGN
5b: [[] - LEFT SQUARE BRACKET
5d: []] - RIGHT SQUARE BRACKET
7b: [{] - LEFT CURLY BRACKET
7d: [}] - RIGHT CURLY BRACKET

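But AFAIK the Bidi_Mirrored property is not the same as being "paired" (i.e. having a left/right counterpart), because, for example, the following character has the Bidi_Mirrored property but probably has no "pair":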

∰  U+02230 VOLUME INTEGRAL

And even if Bidi_Mirrored is the right property for "paired" characters, the question remains the same: how do I find the pair's code point (or name)?

In short: I want to print all Unicode "paired" characters, e.g. pairs like:

«  U+000AB LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
»  U+000BB RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK

≤  U+02264 LESS-THAN OR EQUAL TO
≥  U+02265 GREATER-THAN OR EQUAL TO

etc.
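One way to get exactly these pairs is the Bidi_Mirroring_Glyph (bmg) property, which the Unicode Character Database publishes as BidiMirroring.txt, with data lines of the form "code; mirror # NAME". The Perl sketch below assumes you have a local copy of that file; to stay self-contained it reads a few sample lines in that format from a string. The subs read_bmg, mirror_pairs and unpaired_mirrored are hypothetical helpers, not part of any module. The last one lists Bidi_Mirrored=Y code points without a bmg entry, which is where a character like ∰ would show up if, as suspected above, it has no mirror.

```perl
use 5.018;
use warnings;
use charnames qw(:full);

# Parse lines in the format of the UCD file BidiMirroring.txt, e.g.
#   00AB; 00BB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
# Returns a hash ref: code point => Bidi_Mirroring_Glyph code point.
sub read_bmg {
    my ($fh) = @_;
    my %bmg;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;    # drop comments (and commented-out entries)
        next unless $line =~ /^\s*([0-9A-Fa-f]+)\s*;\s*([0-9A-Fa-f]+)/;
        $bmg{ hex $1 } = hex $2;
    }
    return \%bmg;
}

# List each mapping once: a symmetric pair A <-> B appears only as [A, B]
# with A < B; a one-way mapping is kept as it is.
sub mirror_pairs {
    my ($bmg) = @_;
    my @pairs;
    for my $cp (sort { $a <=> $b } keys %$bmg) {
        my $m = $bmg->{$cp};
        # Skip the second half of a symmetric pair that was already listed.
        next if $m < $cp && ($bmg->{$m} // -1) == $cp;
        push @pairs, [ $cp, $m ];
    }
    return @pairs;
}

# Code points in $from..$to that are Bidi_Mirrored=Y but have no
# Bidi_Mirroring_Glyph entry, i.e. mirrored characters without a "pair".
sub unpaired_mirrored {
    my ($bmg, $from, $to) = @_;
    return grep { !exists $bmg->{$_} && chr($_) =~ /\p{Bidi_Mirrored=Y}/ } $from .. $to;
}

# Sample lines in BidiMirroring.txt format (not the whole file).
my $sample = <<'END';
0028; 0029 # LEFT PARENTHESIS
0029; 0028 # RIGHT PARENTHESIS
003C; 003E # LESS-THAN SIGN
003E; 003C # GREATER-THAN SIGN
005B; 005D # LEFT SQUARE BRACKET
005D; 005B # RIGHT SQUARE BRACKET
007B; 007D # LEFT CURLY BRACKET
007D; 007B # RIGHT CURLY BRACKET
00AB; 00BB # LEFT-POINTING DOUBLE ANGLE QUOTATION MARK
00BB; 00AB # RIGHT-POINTING DOUBLE ANGLE QUOTATION MARK
2264; 2265 # LESS-THAN OR EQUAL TO
2265; 2264 # GREATER-THAN OR EQUAL TO
END
open my $fh, '<', \$sample or die "cannot open sample: $!";
my $bmg = read_bmg($fh);

# Print the pairs in the same layout as the question.
binmode STDOUT, ':encoding(UTF-8)';
for my $p (mirror_pairs($bmg)) {
    my ($x, $y) = @$p;
    printf "%s  U+%05X %s\n%s  U+%05X %s\n\n",
        chr $x, $x, charnames::viacode($x), chr $y, $y, charnames::viacode($y);
}
printf "no mirror glyph: U+%05X %s\n", $_, charnames::viacode($_)
    for unpaired_mirrored($bmg, 0x2230, 0x2230);
```

To use the real data, open the downloaded BidiMirroring.txt instead of the in-memory string and widen the unpaired_mirrored scan, e.g. to 0 .. 0x10FFFF.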

EDIT

Meanwhile this question has been closed, so I'm recording my findings in the question itself:

I found the following here:

# Bidi_Paired_Bracket is a normative property of type Miscellaneous,
# which establishes a mapping between characters that are treated as
# bracket pairs by the Unicode Bidirectional Algorithm.
#
# Bidi_Paired_Bracket_Type is a normative property of type Enumeration,
# which classifies characters into opening and closing paired brackets
# for the purposes of the Unicode Bidirectional Algorithm.
#
# This file lists the set of code points with Bidi_Paired_Bracket_Type
# property values Open and Close. The set is derived from the character
# properties General_Category (gc), Bidi_Class (bc), Bidi_Mirrored (Bidi_M),
# and Bidi_Mirroring_Glyph (bmg), as follows: two characters, A and B,
# form a bracket pair if A has gc=Ps and B has gc=Pe, both have bc=ON and
# Bidi_M=Y, and bmg of A is B. Bidi_Paired_Bracket (bpb) maps A to B and
# vice versa, and their Bidi_Paired_Bracket_Type (bpt) property values are
# Open (o) and Close (c), respectively.
#
# For legacy reasons, the characters U+FD3E ORNATE LEFT PARENTHESIS and
# U+FD3F ORNATE RIGHT PARENTHESIS do not mirror in bidirectional display
# and therefore do not form a bracket pair.
#
# The Unicode property value stability policy guarantees that characters
# which have bpt=o or bpt=c also have bc=ON and Bidi_M=Y. As a result, an
# implementation can optimize the lookup of the Bidi_Paired_Bracket_Type
# property values Open and Close by restricting the processing to characters
# with bc=ON
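The rule quoted above can be applied directly to the data lines of BidiBrackets.txt, which have the form "code; bpb; bpt # NAME". A minimal Perl sketch, assuming a local copy of that file (here a few sample lines in the same format are embedded as a string): read_brackets builds the bpb and bpt maps, and violations re-checks the quoted conditions (gc=Ps or gc=Pe, bc=ON, Bidi_M=Y, bpb mapping both ways) against Perl's own \p{} property tables. Both sub names are hypothetical. Note that < has gc=Sm rather than Ps, so it is not a bracket pair under this rule; its partner > comes from Bidi_Mirroring_Glyph instead.

```perl
use 5.018;
use warnings;
use charnames qw(:full);

# Parse lines in the format of the UCD file BidiBrackets.txt, e.g.
#   0028; 0029; o # LEFT PARENTHESIS
# Returns two hash refs: code point => Bidi_Paired_Bracket (bpb) code point,
# and code point => Bidi_Paired_Bracket_Type (bpt), 'o' (Open) or 'c' (Close).
sub read_brackets {
    my ($fh) = @_;
    my (%bpb, %bpt);
    while (my $line = <$fh>) {
        $line =~ s/#.*//;    # drop comments
        next unless $line =~ /^\s*([0-9A-Fa-f]+)\s*;\s*([0-9A-Fa-f]+)\s*;\s*([oc])\b/;
        $bpb{ hex $1 } = hex $2;
        $bpt{ hex $1 } = $3;
    }
    return (\%bpb, \%bpt);
}

# Re-check the conditions from the file header with Perl's property tables:
# Open brackets are gc=Ps, Close brackets gc=Pe, both are bc=ON and
# Bidi_M=Y, and bpb maps A to B and B back to A. Returns the failing code points.
sub violations {
    my ($bpb, $bpt) = @_;
    my @bad;
    for my $cp (sort { $a <=> $b } keys %$bpb) {
        my $c  = chr $cp;
        my $gc = $bpt->{$cp} eq 'o' ? qr/\p{gc=Ps}/ : qr/\p{gc=Pe}/;
        my $ok = $c =~ $gc
              && $c =~ /\p{bc=ON}/
              && $c =~ /\p{Bidi_Mirrored=Y}/
              && ($bpb->{ $bpb->{$cp} } // -1) == $cp;
        push @bad, $cp unless $ok;
    }
    return @bad;
}

# Sample lines in BidiBrackets.txt format (not the whole file).
my $sample = <<'END';
0028; 0029; o # LEFT PARENTHESIS
0029; 0028; c # RIGHT PARENTHESIS
005B; 005D; o # LEFT SQUARE BRACKET
005D; 005B; c # RIGHT SQUARE BRACKET
007B; 007D; o # LEFT CURLY BRACKET
007D; 007B; c # RIGHT CURLY BRACKET
END
open my $fh, '<', \$sample or die "cannot open sample: $!";
my ($bpb, $bpt) = read_brackets($fh);

# The original question, for a bracket: given '[', find its pair and name.
my $open_cp = ord '[';
printf "%s -> %s  U+%05X %s\n", chr $open_cp, chr $bpb->{$open_cp},
    $bpb->{$open_cp}, charnames::viacode($bpb->{$open_cp});
say 'entries violating the rule: ', scalar violations($bpb, $bpt);
```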

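It looks like the exact algorithm is described here, but I don't know how to get Bidi_Mirroring_Glyph (aka bmg) and Bidi_Paired_Bracket (aka bpb) in Perl. AFAIK Unicode::UCD doesn't contain these values, or at least I don't know how to get them.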

Maybe with Unicode 8.0? :)

0 answers