Replace escaped unicode with elisp

Posted: 2013-10-29 06:45:21

Tags: regex emacs unicode elisp

By calling the Google Dictionary API from Emacs (http://www.google.com/dictionary/json?callback=cb&q=word&sl=en&tl=en&restrict=pr%%2Cde&client=te), I get a response like the following:

"entries": [{
    "type": "example",
    "terms": [{
        "type": "text",
        "text": "his grandfather\x27s \x3cem\x3ewords\x3c/em\x3e had been meant kindly",
        "language": "en"
    }]
}]

As you can see, "text" contains escaped characters written as `\xNN` hex escapes. I want to convert them with a function like the one below.

(defun unescape-string (string)
  "Return STRING with its escape sequences decoded."
  ...)
(unescape-string "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
;; => "his grandfather's <em>words</em>"

(insert #x27) ; inserts '
(insert #x3c) ; inserts <
(insert #x3e) ; inserts >

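That is what I tried: inserting a character code directly works.

However, I don't know how to replace each `\x123`-style escape in a buffer or string with the corresponding character.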

Thanks in advance.

1 Answer:

Answer 0 (score: 2)

This seems to be the simplest way:

(read (princ "\"his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e had been meant kindly\""))
;; "his grandfather's ώm>words</em> had been meant kindly"
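Because `princ` returns its argument unchanged (printing is only a side effect), the decoding above is done entirely by `read`. A minimal sketch that makes this explicit, assuming the raw text contains no unescaped double quote of its own; the helper name `unescape-via-read` is hypothetical, and it inherits the reader's rule that `\x` consumes every following hex digit.

```elisp
(defun unescape-via-read (input)
  "Decode the backslash escapes in INPUT with the Lisp reader.
Hypothetical helper; assumes INPUT contains no unescaped double quote."
  ;; Wrap INPUT in double quotes so `read' parses it as a string literal.
  ;; `read' accepts a string as its input stream.
  (read (concat "\"" input "\"")))

(unescape-via-read "his grandfather\\x27s \\x3c/em\\x3e")
;; => "his grandfather's </em>"
```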

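Interestingly, Emacs parses `\x3ce` rather than `\x3c`. I had always assumed that no more than two characters are read after the `x`, but this is intended behavior rather than a bug: the Emacs Lisp manual documents that a `\x` escape in a string consumes any number of hex digits.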
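The reader's behavior is easy to check with plain string literals. In this sketch, `string-to-list` shows the character codes each literal actually contains: without a terminator, `\x3ce` becomes a single character, while `\ ` (a backslash followed by a space), which the Emacs Lisp manual describes as contributing no character to the string, ends the hex escape early.

```elisp
;; No terminator: the reader takes every hex digit, so "\x3cem" is
;; U+03CE (974, ώ) followed by "m" (109).
(string-to-list "\x3cem")     ; => (974 109)

;; "\ " adds no character but ends the hex escape: "<" (60), "e", "m".
(string-to-list "\x3c\ em")   ; => (60 101 109)
```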

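If you still want to use the `read` + `princ` combination, you need to stop Emacs from consuming further hex digits by putting `\ ` (a backslash followed by a space) right after the escape, e.g. `\x3c\ em`. A bare backslash such as `\x3c\em` does not work, because `\e` is itself the escape for the ESC character. Alternatively, here is something quick I came up with: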

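(defun replace-c-escape-codes (input)
  "Replace each \\xNN escape in INPUT with the character it encodes."
  (replace-regexp-in-string
   "\\\\x[[:xdigit:]][[:xdigit:]]"  ; backslash, x, exactly two hex digits
   (lambda (match)
     ;; MATCH is e.g. \x27: drop the leading backslash and x, parse the rest as hex.
     (string (string-to-number (substring match 2) 16)))
   input
   t t))  ; FIXEDCASE and LITERAL, so e.g. \x5c yields a literal backslash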

(replace-c-escape-codes "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
;; => "his grandfather's <em>words</em>"
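The question also asks how to do this in a buffer. A minimal sketch of the same regexp applied to the current buffer with `re-search-forward` and `replace-match`; the command name `replace-c-escape-codes-in-buffer` is hypothetical, and like the string version it assumes every escape is `\x` followed by exactly two hex digits.

```elisp
(defun replace-c-escape-codes-in-buffer ()
  "Replace each \\xNN escape in the current buffer with its character.
Hypothetical command; assumes exactly two hex digits per escape."
  (interactive)
  (save-excursion
    (goto-char (point-min))
    ;; Group 1 captures the two hex digits after the backslash and x.
    (while (re-search-forward "\\\\x\\([[:xdigit:]][[:xdigit:]]\\)" nil t)
      ;; FIXEDCASE and LITERAL are t, so the character is inserted as-is.
      (replace-match (string (string-to-number (match-string 1) 16)) t t))))

(with-temp-buffer
  (insert "his grandfather\\x27s \\x3cem\\x3ewords\\x3c/em\\x3e")
  (replace-c-escape-codes-in-buffer)
  (buffer-string))
;; => "his grandfather's <em>words</em>"
```

Searching resumes after each replacement, so a decoded backslash (from `\x5c`) is not re-scanned as the start of a new escape.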