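I just noticed that match behaves differently from charmatch and grepl, and I wonder whether this is intentional. As noted here, it may not be possible to give a good reproducible example, because the encoding is crucial: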
a <- c("a", "b", "ä", "ü", "ö")
print(a)
# [1] "a" "b" "ä" "ü" "ö"
print(Encoding(a))
# [1] "unknown" "unknown" "latin1" "latin1" "latin1"
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8" "UTF-8" "UTF-8"
match("ä", a)
# [1] NA # This is what I did not expect...
charmatch("ä", a)
# [1] 3 # ok
grepl("ä", a)
# [1] FALSE FALSE TRUE FALSE FALSE # ok
The match documentation only states:
Character strings will be compared as byte sequences if any input is marked as "bytes" (see Encoding).
Regarding the encoding: I also tried the following, with ASCII-only strings:
a <- c("a", "b", "c", "d", "e")
print(a)
# [1] "a" "b" "c" "d" "e"
print(Encoding(a))
# [1] "unknown" "unknown" "unknown" "unknown" "unknown"
match("a", a)
# [1] 1 # ok!!!
charmatch("a", a)
# [1] 1 # ok
grepl("a", a)
# [1] TRUE FALSE FALSE FALSE FALSE # ok
Edit:
Sys.getlocale()
# [1] "LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C;LC_TIME=German_Switzerland.1252"
Edit 2:
I just realized that the effect only appears after
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8" "UTF-8" "UTF-8"
I have updated the code above accordingly.
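To rule out the encoding of the script file itself, the comparison can probably be reduced to the sketch below. It builds "ä" from a raw byte instead of a string literal, marks it as latin1, and then converts it with enc2utf8. The names x_latin1 and x_utf8 are only illustrative and are not part of the code above. No output is listed, because the results of the three lookups may depend on the R version and locale.

```r
# Build "ä" from its Latin-1 byte so the result does not depend on
# how this script file is encoded (x_latin1 / x_utf8 are illustrative names).
x_latin1 <- rawToChar(as.raw(0xe4))
Encoding(x_latin1) <- "latin1"

# The same character, converted to UTF-8.
x_utf8 <- enc2utf8(x_latin1)

# Declared encodings of the two strings.
print(Encoding(x_latin1))
print(Encoding(x_utf8))

# Underlying bytes: one byte for Latin-1, two bytes for UTF-8.
print(charToRaw(x_latin1))
print(charToRaw(x_utf8))

# The three lookups from the question, applied to the two representations.
print(match(x_latin1, x_utf8))
print(charmatch(x_latin1, x_utf8))
print(grepl(x_latin1, x_utf8, fixed = TRUE))
```

If match returns NA here while the other two find the string, the difference comes from how the two representations are compared and not from the data in a.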