Question

For characters in the Basic Multilingual Plane (BMP), we can use the '\uxxxx' escape. For example, /[\u4e00-\u9fff]/ matches a common Chinese character (0x4e00-0x9fff is the range of the CJK Unified Ideographs block).

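But characters outside the Basic Multilingual Plane have code points greater than 0xffff, so the '\uxxxx' format cannot escape them: '\u20000' means the character '\u2000' followed by the character '0', not the character whose code point is 0x20000.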
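A quick check of the parsing described here, as a minimal sketch (the variable names are illustrative only):

```javascript
// '\u20000' is read as the four-digit escape '\u2000' plus a literal '0'.
const parsed = '\u20000';

console.log(parsed.length);                     // 2
console.log(parsed.charCodeAt(0).toString(16)); // 2000
console.log(parsed[1]);                         // 0

// The four-digit form is enough for a BMP range such as CJK Unified Ideographs.
const bmpIdeograph = /^[\u4e00-\u9fff]$/;
console.log(bmpIdeograph.test('\u4e2d'));       // true (U+4E2D is inside the range)
```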

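How can I escape characters outside the Basic Multilingual Plane? Using such characters directly is not a good idea, because most fonts cannot display them.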

Answer 1

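JavaScript does not directly recognize characters outside the BMP; internally they are represented as UTF-16 surrogate pairs. For example, the character you mentioned, U+20000 (currently assigned in "CJK Unified Ideographs Extension B"), is represented as the surrogate pair U+D840 U+DC00. As a JavaScript string, that is simply "\uD840\uDC00". (Note that s.length for this string is 2, even though it displays as a single character.)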

Wikipedia has details on the encoding scheme used.
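The surrogate pair for any code point above U+FFFF can be derived with the standard UTF-16 arithmetic. The sketch below assumes two hypothetical helpers, `toSurrogatePair` and `toEscape`, which are not part of the answer or of any library:

```javascript
// Hypothetical helper: split a supplementary code point into UTF-16 surrogates.
function toSurrogatePair(codePoint) {
  if (codePoint < 0x10000 || codePoint > 0x10FFFF) {
    throw new RangeError('expected a code point in U+10000..U+10FFFF');
  }
  const offset = codePoint - 0x10000;    // 20-bit value
  const high = 0xD800 + (offset >> 10);  // top 10 bits
  const low = 0xDC00 + (offset & 0x3FF); // bottom 10 bits
  return [high, low];
}

// Hypothetical helper: format the pair as a JavaScript escape sequence.
function toEscape(codePoint) {
  return toSurrogatePair(codePoint)
    .map((unit) => '\\u' + unit.toString(16).toUpperCase().padStart(4, '0'))
    .join('');
}

const [high, low] = toSurrogatePair(0x20000);
const pair = String.fromCharCode(high, low);

console.log(toEscape(0x20000));                // \uD840\uDC00
console.log(pair.length);                      // 2
console.log(pair.codePointAt(0).toString(16)); // 20000
```

The printed escape can be pasted into a string literal or a regular expression source as is.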

Answer 2

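You can use a pair of escaped surrogate code points, as described in @duskwuff's answer. To find them, you can use my Full Unicode input utility (the "Show \u" button) or the Fileformat.info character search (the "C/C++/Java source code" item, since JavaScript uses the same notation here).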

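Alternatively, you can enter the characters directly: "You can enter non-BMP characters as such into string literals in your JavaScript code, whether in a separate file or embedded in HTML. Naturally, you need suitable Unicode support in the editor you use. But JavaScript implementations need not support non-BMP characters in program source. They may, and modern browser implementations do." (Going Global with JavaScript and Globalize.js, p. 177) There are some caveats, such as declaring the character encoding properly.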

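Font support is a separate issue, but when you use characters you normally want to see them at some point, at least in testing. So you more or less need a font that covers the characters. The Fileformat.info pages also link to browser support information, such as (U+20000) Font Support, which is a good starting point though not quite complete. For example, SimSun-ExtB also supports U+20000.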

Answer 3

Interesting question.

Now that we have ES6, we can do this:

let newSpeak = '\u{1F4A9}'

Note that internally the string is still stored as UTF-16 surrogate pairs:

newSpeak.length === 2 // "wrong"
[...newSpeak].length === 1
newSpeak === '\uD83D\uDCA9'

Also, this is not limited to string literals.

Unicode is huge
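As one illustration of code point escapes outside string literals, here is a minimal sketch that applies them to the regular expression from the question. It assumes an ES6 engine; the names `extB` and `ch` are illustrative, and the range U+20000 to U+2A6DF is the CJK Unified Ideographs Extension B block:

```javascript
// Code point escapes work in a regular expression when the `u` flag is set.
const extB = /^[\u{20000}-\u{2A6DF}]$/u;

// Build the character from its code point at run time.
const ch = String.fromCodePoint(0x20000);

console.log(extB.test(ch));                 // true
console.log(extB.test('\u4e00'));           // false (a BMP ideograph)
console.log(ch === '\u{20000}');            // true
console.log(ch === '\uD840\uDC00');         // true
console.log(ch.codePointAt(0) === 0x20000); // true
```

Without the `u` flag, `\u{20000}` in a pattern is not treated as a code point escape, so the flag is required for this form.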
