Normalizing strings in Elixir / Phoenix

Date: 2016-01-10 05:58:40

Tags: elixir phoenix-framework

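I want to normalize Unicode (UTF-8) strings that users submit through a <form>. Is there a library in Elixir (or Phoenix, or Erlang) that handles this? I'm used to doing it in Python as shown below, but I don't know which Elixir libraries offer the same functionality.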

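import unicodedata

import jctconv
import zenhan


def normalize(strings, unistr='NFKC'):
    # Unicode normalization (NFKC by default).
    norm = unicodedata.normalize(unistr, strings)
    # Full-width to half-width conversion; the result gets its own name
    # so that it does not shadow the zenhan module.
    half = zenhan.z2h(norm, mode=2)
    # Katakana to hiragana.
    return jctconv.kata2hira(half)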

2 answers:

Answer 0 (score: 3)

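Since Elixir 1.2 there has been a String.normalize/2 function. I'm not sure what those Python libraries do, but this function may be a good starting point for what you want to achieve.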
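
For comparison, here is a minimal sketch of what the first step of the question's code (NFKC) does, using only Python's standard unicodedata module. The helper name nfkc and the sample strings are made up for illustration. NFKC is a compatibility form, so it folds full-width and half-width variants, which the canonical forms leave alone.

```python
import unicodedata


def nfkc(text):
    """Apply NFKC (compatibility composition), as in the question's first step."""
    return unicodedata.normalize("NFKC", text)


fullwidth = "\uff21\uff22\uff23\uff11\uff12\uff13"  # full-width "ABC123"
halfwidth_ga = "\uff76\uff9e"  # half-width katakana KA + half-width voiced sound mark

print(nfkc(fullwidth))     # ABC123
print(nfkc(halfwidth_ga))  # ガ (a single code point, U+30AC)

# The canonical form NFC does not touch full-width characters.
print(unicodedata.normalize("NFC", fullwidth) == fullwidth)  # True
```

If the Elixir function only offers canonical forms, the width folding and the katakana-to-hiragana step would still have to be handled separately.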

Answer 1 (score: 1)

If you type h String.normalize in iex, you get the relevant documentation and some examples:

Converts all characters in binary to Unicode normalization form identified by
form.

Forms

The supported forms are:

  • :nfd - Normalization Form Canonical Decomposition. Characters are
    decomposed by canonical equivalence, and multiple combining characters are
    arranged in a specific order.
  • :nfc - Normalization Form Canonical Composition. Characters are
    decomposed and then recomposed by canonical equivalence.

Examples

┃ iex> String.normalize("yêṩ", :nfd)
┃ "yêṩ"
┃
┃ iex> String.normalize("leña", :nfc)
┃ "leña"
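
The two iex results above look identical to their inputs when printed, because the forms differ only at the code point level. As a rough illustration, the same two forms can be inspected with Python's standard unicodedata module (the language used in the question). The helper codepoints is a made-up name for this sketch, not part of any library.

```python
import unicodedata


def codepoints(text):
    """Return the code points of text as U+XXXX strings."""
    return ["U+%04X" % ord(ch) for ch in text]


composed = "le\u00f1a"  # "leña" with a precomposed n-with-tilde
decomposed = unicodedata.normalize("NFD", composed)
print(codepoints(decomposed))  # ['U+006C', 'U+0065', 'U+006E', 'U+0303', 'U+0061']
print(unicodedata.normalize("NFC", decomposed) == composed)  # True

# Combining marks given in a non-canonical order are reordered by NFD.
unordered = "s\u0307\u0323"  # s + dot above + dot below
print(codepoints(unicodedata.normalize("NFD", unordered)))  # ['U+0073', 'U+0323', 'U+0307']
print(codepoints(unicodedata.normalize("NFC", unordered)))  # ['U+1E69']
```

Normalizing both sides to the same form before comparing or storing form input avoids treating visually identical strings as different.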