Question

I tried the following code:

j <- "*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh "
nchar(j)
# Error in nchar(j) : invalid multibyte string 1

As you can see, I can't use nchar() on this string. How can I fix this?

Answer 1

If you know the specific encoding, you can use iconv to convert the string to something easier to work with:

j <- "*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh "
iconv(j, "ISO-8859-1", "UTF-8")
#[1] "*Politics:* Disgraced peer Jeffrey Archer is set to make £1m from his Belmarsh "
nchar(iconv(j, "ISO-8859-1", "UTF-8"))
#[1] 79

I wrote your text to a file and checked the encoding with geany, which is how I arrived at ISO-8859-1.
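If only some elements of a vector are affected, one option is to convert just those. The following is a minimal sketch, not a definitive fix: `safe_nchar` is a hypothetical helper name, and the fallback encoding "ISO-8859-1" is an assumption that you would have to confirm for your own data. It relies on base R's validUTF8() to find the elements whose bytes are not valid UTF-8.

```r
# Hypothetical helper: count characters, re-encoding only the elements
# that are not valid UTF-8. The `from` encoding is an assumption.
safe_nchar <- function(x, from = "ISO-8859-1") {
  bad <- !validUTF8(x)                     # TRUE where the bytes are not valid UTF-8
  x[bad] <- iconv(x[bad], from, "UTF-8")   # convert only those elements
  nchar(x, type = "chars")
}

counts <- safe_nchar(c("abc", "\xa31m"))   # second element holds a latin1 pound sign
print(counts)
```

If the guessed encoding is wrong, iconv may return NA or produce the wrong characters, so it is worth spot-checking the converted strings.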

An alternative that doesn't require you to figure out the encoding is to use type = "bytes" instead of manually converting to UTF-8:

nchar(j, type = "bytes")
#[1] 79

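I recommend reading the help file for nchar (?nchar), because there are subtle differences between the default type and type = "bytes". With type = "bytes" the result is a byte count; it matches the character count here only because ISO-8859-1 uses one byte per character.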
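To make the difference concrete, here is a small sketch using the pound sign on its own. The variable names are just for illustration; intToUtf8(163) is used to build "£" (U+00A3) as a UTF-8 string regardless of the locale.

```r
pound_utf8 <- intToUtf8(163)                         # "£" encoded as UTF-8
chars_utf8 <- nchar(pound_utf8, type = "chars")      # one character
bytes_utf8 <- nchar(pound_utf8, type = "bytes")      # two bytes in UTF-8

# The same character re-encoded as ISO-8859-1 occupies a single byte
pound_latin1 <- iconv(pound_utf8, "UTF-8", "ISO-8859-1")
bytes_latin1 <- nchar(pound_latin1, type = "bytes")

print(c(chars_utf8, bytes_utf8, bytes_latin1))       # 1 2 1
```

So for multibyte encodings such as UTF-8, type = "bytes" can report a larger number than the number of characters you see.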

Answer 2

If Dason is right...

I only know one way to do this, and it requires reading each string in with readLines:

x <- readLines(n=2)
*Politics:* Disgraced peer Jeffrey Archer is set to make \xa31m from his Belmarsh 
df vetf tefer\x vtgr
nchar(x)

n = 2 tells R that you are reading 2 lines. Then read them in (I use Ctrl + R in RGui or Ctrl + Enter in RStudio). After that, you can use nchar.
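One thing to keep in mind with this approach: text read by readLines is not parsed for escapes, so a typed \xa3 arrives as four ordinary characters rather than one byte. A small sketch of the difference, using a textConnection to stand in for the typed input (the shortened sample string is just for illustration):

```r
# The text as it would be typed: backslash, x, a, 3 are separate characters
con <- textConnection("make \\xa31m")
typed <- readLines(con)
close(con)
n_typed <- nchar(typed)                              # 11 characters

# The same text as an R string literal: \xa3 is a single byte
n_literal <- nchar("make \xa31m", type = "bytes")    # 8 bytes

print(c(n_typed, n_literal))
```

So the counts from the readLines route will be larger than the counts for the original byte string wherever such an escape appears.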
