Question

I have a small elisp script that applies Perl::Tidy to the region or to the whole file. For reference, here is the script (borrowed from EmacsWiki):

(defun perltidy-command (start end)
  "The perltidy command we pass markers to."
  (shell-command-on-region start
                           end
                           "perltidy"
                           t
                           t
                           (get-buffer-create "*Perltidy Output*")))

(defun perltidy-dwim (arg)
  "Perltidy a region or the entire buffer."
  (interactive "P")
  (let ((point (point)) (start) (end))
    (if (and mark-active transient-mark-mode)
        (setq start (region-beginning)
              end (region-end))
      (setq start (point-min)
            end (point-max)))
    (perltidy-command start end)
    (goto-char point)))

(global-set-key "\C-ct" 'perltidy-dwim)

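I am using the current Emacs 23.1 for Windows (EmacsW32). The problem is that if I apply this script to a UTF-8 encoded file ("U(Unix)" in the status bar), the output comes back encoded as Latin-1, i.e. with two or more characters for each non-ASCII source character.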

Is there any way to fix this?

Edit: Using (set-terminal-coding-system 'utf-8-unix) in my init.el seems to have solved the problem. If anyone has another solution, please go ahead and post it!

Answer 1

Quoting the documentation of shell-command-on-region (C-h f shell-command-on-region RET):

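To specify a coding system for converting non-ASCII characters in the input and output to the shell command, use C-x RET c before this command. By default, the input (from the current buffer) is encoded in the same coding system that will be used to save the file, `buffer-file-coding-system'. If the output is going to replace the region, then it is decoded from that same coding system.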

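The noninteractive arguments are START, END, COMMAND, OUTPUT-BUFFER, REPLACE, ERROR-BUFFER, and DISPLAY-ERROR-BUFFER. Noninteractive callers can specify coding systems by binding `coding-system-for-read' and `coding-system-for-write'.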

In other words, you would do something like:

(let ((coding-system-for-read 'utf-8-unix))
  (shell-command-on-region ...) )

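This is untested, and I am not sure what the value of coding-system-for-read (or maybe -write? or both?) should be in your case. I guess you could also use the OUTPUT-BUFFER argument and direct the output to a buffer whose coding system is set to what you need.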
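A fuller version of that idea, as a minimal untested sketch: it takes perltidy-command from the question and binds both variables around the call. Using utf-8-unix in both directions is an assumption based on the file being UTF-8 with Unix line endings; adjust it if perltidy on your system emits something else.

```elisp
;; Sketch only: the coding system choice (utf-8-unix both ways) is an assumption.
(defun perltidy-command (start end)
  "Run perltidy on the text between START and END, forcing UTF-8 both ways."
  (let ((coding-system-for-read 'utf-8-unix)    ; decodes what perltidy prints
        (coding-system-for-write 'utf-8-unix))  ; encodes the region sent to perltidy
    (shell-command-on-region start
                             end
                             "perltidy"
                             t
                             t
                             (get-buffer-create "*Perltidy Output*"))))
```

Because the bindings are made with let, they only affect this one call; perltidy-dwim from the question can keep calling perltidy-command unchanged.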

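Another option might be to fiddle with the locale in the perltidy invocation, but again, without more information about what you are using now, and without being able to experiment on a system similar to yours, I can only give hints.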

Answer 2

Here is the documentation of shell-command-on-region:

To specify a coding system for converting non-ASCII characters
in the input and output to the shell command, use C-x RET c
before this command.  By default, the input (from the current buffer)
is encoded using coding-system specified by `process-coding-system-alist',
falling back to `default-process-coding-system' if no match for COMMAND
is found in `process-coding-system-alist'.

During execution, it first looks up the coding system in process-coding-system-alist; if that yields nil, it falls back to default-process-coding-system.

If you want to change the encoding, you can add a conversion entry to process-coding-system-alist. Here is its content:

Value: (("\\.dz\\'" no-conversion . no-conversion)
 ...
("\\.elc\\'" . utf-8-emacs)
("\\.utf\\(-8\\)?\\'" . utf-8)
("\\.xml\\'" . xml-find-file-coding-system)
 ...
("" undecided))
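As a rough illustration of adding such an entry, the sketch below registers UTF-8 for programs whose name matches "perltidy" and then checks which entry would win. The helper my/process-coding-for is hypothetical (it is not part of Emacs) and only mimics the lookup order described above; the regexp and the choice of utf-8-unix are assumptions as well.

```elisp
;; Hypothetical helper, not part of Emacs: mimics the lookup order
;; (alist first, then the default).
(defun my/process-coding-for (program)
  "Return the coding spec that would apply to PROGRAM.
Checks `process-coding-system-alist' first, then falls back to
`default-process-coding-system'."
  (or (assoc-default program process-coding-system-alist #'string-match)
      default-process-coding-system))

;; Each entry is (REGEXP . (DECODING . ENCODING)): DECODING applies to the
;; program's output, ENCODING to the input sent to it.
(add-to-list 'process-coding-system-alist
             '("perltidy" . (utf-8-unix . utf-8-unix)))

(my/process-coding-for "perltidy")  ; => (utf-8-unix . utf-8-unix)
```

One caveat worth checking on your own system: shell-command-on-region runs the command through the shell, so the program name Emacs matches against the alist may be the value of shell-file-name rather than "perltidy". If the entry appears to have no effect, that is a likely reason.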

Alternatively, if you have not set process-coding-system-alist (that is, it is nil), you can assign your coding options to default-process-coding-system,

for example:

(setq default-process-coding-system '(utf-8 . utf-8))

(if the input is encoded as utf-8, the output is encoded as utf-8 as well)

or

(setq default-process-coding-system '(undecided-unix . iso-latin-1-unix))
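If changing the default globally with setq feels too broad, the same value can be bound just around the perltidy call. This is an untested sketch: the function name my/perltidy-region-utf8 is made up for illustration, and it assumes the fallback to default-process-coding-system described above applies on your Emacs version.

```elisp
;; Hypothetical wrapper; the name and the utf-8-unix choice are assumptions.
(defun my/perltidy-region-utf8 (start end)
  "Run perltidy on START..END with a locally bound process coding default."
  ;; The value is (DECODING . ENCODING): the car decodes the command's
  ;; output, the cdr encodes the text sent to the command.
  (let ((default-process-coding-system '(utf-8-unix . utf-8-unix)))
    (shell-command-on-region start end "perltidy" t t)))
```

The let binding is undone as soon as the call returns, so other subprocesses keep whatever default they had before.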

If you need more detail, I have also written a post about this.

How do I set the encoding of shell-command-on-region output?
