Question

In NodeJS, child_process.execFile and .spawn take this parameter:

args <string[]> List of string arguments.

How does NodeJS encode the strings you pass in this array?

Context: I'm writing a nodejs app which adds metadata (often including non-ASCII characters) to an mp3.

I know that ffmpeg expects utf8-encoded arguments. If my nodejs app invokes child_process.execFile("ffmpeg",["-metadata","title="+myString], {encoding:"utf8"}), then how will nodejs encode myString in the arguments?
I know that id3v2 expects latin1-encoded arguments. If my nodejs app invokes child_process.execFile("id3v2",["--titl",myString], {encoding:"latin1"}), then how will nodejs encode myString in the arguments?

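I see that execFile and spawn both take an "encoding" option. But the nodejs docs say: "The encoding option can be used to specify the character encoding used to decode the stdout and stderr output." The docs say nothing about the encoding of args.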

Answer 1

Answer: NodeJS always encodes the args as UTF-8.

I wrote a simple C++ app which shows the raw bytes that are passed into its argv:

#include <stdio.h>

// Prints each argv entry byte by byte: printable ASCII as-is,
// every other byte as a \xHH escape.
int main(int argc, char *argv[])
{
  printf("argc=%d\n", argc);
  for (int i = 0; i < argc; i++)
  {
    printf("%d:\"", i);
    for (char *c = argv[i]; *c != 0; c++)
    {
      if (*c >= 32 && *c < 127)
        printf("%c", *c);
      else
      {
        // Go through unsigned char so bytes >= 0x80 are not sign-extended.
        unsigned char d = *(unsigned char *)c;
        unsigned int e = d;
        printf("\\x%02X", e);
      }
    }
    printf("\"\n");
  }
  return 0;
}

In my NodeJS app, I built some strings whose provenance I knew for certain:

const a = Buffer.from([65]).toString("utf8");
const pound = Buffer.from([0xc2, 0xa3]).toString("utf8");
const skull = Buffer.from([0xe2, 0x98, 0xa0]).toString("utf8");

const pound2 = Buffer.from([0xa3]).toString("latin1");

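The argument to toString says that the raw bytes in the buffer should be interpreted as UTF-8 (or, in the last case, as latin1). The result is four strings whose contents I know unambiguously to be correct.

(I understand that Javascript VMs typically store their strings as UTF16? The fact that pound and pound2 behave the same in my experiments proves that the origin of a string doesn't matter.)

Finally, I invoked execFile with these strings: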
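As a quick check of that last point, a minimal sketch: once decoded, the two pound strings are the same Javascript string, so nothing about the original byte encoding is left for execFile to see.

```javascript
// Same character, decoded from two different byte encodings.
const pound = Buffer.from([0xc2, 0xa3]).toString("utf8");
const pound2 = Buffer.from([0xa3]).toString("latin1");

console.log(pound === pound2);                 // true
console.log(pound.length);                     // 1 (a single UTF-16 code unit)
console.log(pound.charCodeAt(0).toString(16)); // a3
```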

child_process.execFileAsync("argcheck",[a,pound,pound2,skull],{encoding:"utf8"});
child_process.execFileAsync("argcheck",[a,pound,pound2,skull],{encoding:"latin1"});

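In both cases, the raw bytes that nodejs passed into argv were the UTF-8 encodings of the strings a, pound, pound2, skull. The encoding option made no difference.

So how can we pass latin1 arguments from nodejs?

The explanation above shows that nodejs cannot pass any latin1 character in the range 128..255 as a single byte through child_process.spawn / execFile. But there's an escape hatch involving child_process.exec: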
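The bytes the child process receives can be previewed from inside nodejs. This is only a sketch: it assumes, based on the experiment above, that the argument bytes match Buffer.from(arg, "utf8"), and asArgvBytes is a hypothetical helper name that mimics the argcheck output format.

```javascript
// Render a string the way argcheck prints an argv entry:
// printable ASCII as-is, every other byte as \xHH.
function asArgvBytes(arg) {
  return [...Buffer.from(arg, "utf8")]
    .map(b => (b >= 32 && b < 127)
      ? String.fromCharCode(b)
      : "\\x" + b.toString(16).toUpperCase().padStart(2, "0"))
    .join("");
}

console.log(asArgvBytes("A"));      // A
console.log(asArgvBytes("\u00A3")); // \xC2\xA3
console.log(asArgvBytes("\u2620")); // \xE2\x98\xA0
```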

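Example: the string "A £ ☠"
is stored internally in Javascript's UTF16 as "\u0041 \u00A3 \u2620"
is encoded in UTF-8 as "\x41 \xC2\xA3 \xE2\x98\xA0"
is encoded in latin1 as "\x41 \xA3 ?" (the skull and crossbones is inexpressible in latin1)
Unicode chars 0-127 are the same as latin1, and encode into UTF-8 the same as latin1
Unicode chars 128-255 are the same as latin1, but encode differently
Unicode chars 256+ don't exist in latin1.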
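The byte sequences in this example can be reproduced with Buffer. A small sketch; toLatin1Bytes is a hypothetical helper, written by hand because Buffer.from(str, "latin1") does not substitute "?" for characters above 255 (it keeps only the low byte of each code unit).

```javascript
// Encode a string as latin1, replacing anything above U+00FF with "?".
function toLatin1Bytes(str) {
  const bytes = [...str].map(ch => {
    const cp = ch.codePointAt(0);
    return cp <= 0xff ? cp : 0x3f; // 0x3f is "?"
  });
  return Buffer.from(bytes);
}

const s = "A \u00A3 \u2620"; // "A £ ☠"
console.log(Buffer.from(s, "utf8").toString("hex")); // 4120c2a320e298a0
console.log(toLatin1Bytes(s).toString("hex"));       // 4120a3203f
```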

// this would encode the argument as utf8, which is wrong for id3v2:
execFile("id3v2", ["--comment", "A £ ☠", "x.mp3"]);

// instead we'll use shell printf to bypass nodejs's encoding
// (the backslash is doubled so the shell's printf sees \xA3, not nodejs):
exec('id3v2 --comment "`printf "A \\xA3 ?"`" x.mp3');

Here's a handy way to turn a string like "A £ ☠" into one like "A \xA3 ?", ready to pass to child_process.exec:

const comment2 = [...comment]
  .map(c =>
    c <= "\u007F" ? c                        // ASCII: unchanged
    : c <= "\u00FF" ? `\\x${c.charCodeAt(0).toString(16).padStart(2, "0")}` // latin1: \xHH
    : "?")                                   // not representable in latin1
  .join("");

const cmd = `id3v2 --comment \"\`printf \"${comment2}\"\`\" \"${fn}\"`;

child_process.exec(cmd, (e, stdout, stderr) => { ... });
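The exec command above interpolates the comment and file name directly into a shell command line, so quotes, `$`, backticks or `%` in either value would be interpreted by the shell or by printf. It also relies on `\xHH` escapes, which POSIX does not require of printf (only octal escapes are specified). The following is one possible variant, not the answer's original method: it emits octal escapes and hands the values to sh as positional parameters through execFile. toPrintfFormat and setLatin1Comment are hypothetical names, and the sketch assumes a POSIX sh on the PATH whose command substitution passes non-UTF-8 bytes through unchanged.

```javascript
const { execFile } = require("child_process");

// Build a printf(1) format string whose output is the latin1 encoding of str.
function toPrintfFormat(str) {
  return [...str].map(ch => {
    const cp = ch.codePointAt(0);
    if (ch === "%") return "%%";     // literal percent sign in a printf format
    if (ch === "\\") return "\\\\";  // literal backslash
    if (cp < 0x80) return ch;        // ASCII passes through unchanged
    if (cp <= 0xff) return "\\" + cp.toString(8).padStart(3, "0"); // octal escape
    return "?";                      // not representable in latin1
  }).join("");
}

// Hypothetical helper: $1 is the printf format, $2 the file name.
// Both reach sh as ordinary arguments, so they are never parsed as shell syntax.
function setLatin1Comment(comment, fn, callback) {
  const script = 'id3v2 --comment "$(printf -- "$1")" "$2"';
  return execFile("sh", ["-c", script, "sh", toPrintfFormat(comment), fn], callback);
}

console.log(toPrintfFormat("A \u00A3 \u2620")); // A \243 ?
```

A call would look like setLatin1Comment("A £ ☠", "x.mp3", (e, stdout, stderr) => { ... }). Note that command substitution strips trailing newlines from the comment.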
