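I want to normalize Unicode (UTF-8) strings that users submit through a <form>. Is there a library in Elixir (or Phoenix, or Erlang) that handles this? I'm used to doing it in Python, but I don't know whether Elixir has equivalent libraries.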
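import unicodedata
import zenhan
import jctconv

def normalize(strings, unistr='NFKC'):
    # Apply Unicode normalization (NFKC by default).
    norm = unicodedata.normalize(unistr, strings)
    # Convert full-width (zenkaku) characters to half-width (hankaku);
    # mode=2 is kept from the original. The result gets its own name so
    # that it does not shadow the zenhan module.
    half = zenhan.z2h(norm, mode=2)
    # Convert katakana to hiragana.
    katahira = jctconv.kata2hira(half)
    return katahira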
Answer 0 (score: 3)
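Since Elixir 1.2 there has been a String.normalize/2 function. I'm not sure what those Python libraries do, but this function is probably a good starting point for what you're trying to achieve.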
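For anyone else unsure what the question's pipeline does, here is a rough standard-library-only approximation in Python. It is a sketch under assumptions, not the behavior of zenhan or jctconv themselves: `normalize_sketch` and `kata_to_hira` are hypothetical names, and the katakana step simply relies on katakana U+30A1..U+30F6 sitting 0x60 code points above the matching hiragana. It shows which parts NFKC already covers (full-width letters and digits, half-width katakana) and which part would still need a separate step on the Elixir side (katakana to hiragana).

```python
import unicodedata

# Katakana U+30A1..U+30F6 sit exactly 0x60 above their hiragana counterparts.
KATA_START, KATA_END, KATA_TO_HIRA_OFFSET = 0x30A1, 0x30F6, 0x60


def kata_to_hira(text):
    """Shift each katakana code point down to the matching hiragana."""
    return "".join(
        chr(ord(ch) - KATA_TO_HIRA_OFFSET) if KATA_START <= ord(ch) <= KATA_END else ch
        for ch in text
    )


def normalize_sketch(text, form="NFKC"):
    # NFKC folds compatibility characters: full-width ASCII letters and digits
    # become plain ASCII, and half-width katakana becomes regular katakana.
    return kata_to_hira(unicodedata.normalize(form, text))


# Full-width letters and digits (written as escapes to keep the input unambiguous).
print(normalize_sketch("\uff21\uff22\uff23\uff11\uff12\uff13"))
# Half-width katakana KA followed by the half-width voiced sound mark.
print(normalize_sketch("\uff76\uff9e"))
```

Note that the question relies on NFKC, a compatibility form. Whether String.normalize/2 accepts a compatibility form depends on your Elixir version, so it is worth checking h String.normalize. On OTP 20 or later, Erlang's unicode:characters_to_nfkc_binary/1 is another option.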
Answer 1 (score: 1)
If you type h String.normalize inside iex, you'll get the relevant documentation along with some examples:
Converts all characters in binary to Unicode normalization form identified by form.
Forms
The supported forms are:
• :nfd - Normalization Form Canonical Decomposition. Characters are
decomposed by canonical equivalence, and multiple combining characters are
arranged in a specific order.
• :nfc - Normalization Form Canonical Composition. Characters are
decomposed and then recomposed by canonical equivalence.
Examples
┃ iex> String.normalize("yêṩ", :nfd)
┃ "yêṩ"
┃
┃ iex> String.normalize("leña", :nfc)
┃ "leña"
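The input and output in those examples look identical when printed, because NFD and NFC differ only in the underlying code points. A small sketch in Python (the language used in the question, with only the standard unicodedata module) makes the difference visible by counting code points. The helper name `code_points` is hypothetical.

```python
import unicodedata


def code_points(text):
    """Return the code points of text as U+XXXX strings, for inspection."""
    return [f"U+{ord(ch):04X}" for ch in text]


# "lena" with a precomposed n-tilde (U+00F1), written as an escape.
composed = "le\u00f1a"

nfd = unicodedata.normalize("NFD", composed)
nfc = unicodedata.normalize("NFC", nfd)

# NFD splits the n-tilde into "n" plus a combining tilde (U+0303).
print(code_points(nfd))
# NFC recomposes the pair back into the single code point U+00F1.
print(code_points(nfc))
```

The same comparison in iex can be done with String.codepoints/1 or String.length/1 on the normalized strings.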