Difference between match, charmatch and grepl

Asked: 2017-07-11 07:54:30

Tags: r

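I just noticed that these three functions behave differently, and I would like to know whether that is intentional. As described here, the encoding is essential to the problem, so it may not be possible to give a good reproducible example: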

a <- c("a", "b", "ä", "ü", "ö")
print(a)
# [1] "a" "b" "ä" "ü" "ö"
print(Encoding(a))
# [1] "unknown" "unknown" "latin1"  "latin1"  "latin1"
a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8" 

match("ä", a)
# [1] NA # This is what I did not expect...
charmatch("ä", a)
# [1] 3 # ok
grepl("ä", a)
# [1] FALSE FALSE  TRUE FALSE FALSE # ok

The match documentation only states:

  Character strings will be compared as byte sequences if any input is marked as "bytes" (see Encoding).
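
One way to sidestep a mixed-encoding comparison is to bring both arguments to the same encoding before calling match. The following is a minimal sketch under stated assumptions, not a confirmed explanation of the behavior above: the helper name match_utf8 is hypothetical, and the strings are built from \u escapes and iconv so that the example does not depend on the encoding of the source file or console.

```r
# Hypothetical helper (not part of base R): convert both the needle and the
# table to UTF-8 before matching, so both sides carry the same bytes.
match_utf8 <- function(x, table, ...) {
  match(enc2utf8(x), enc2utf8(table), ...)
}

# Table with three non-ASCII elements written as Unicode escapes.
a <- c("a", "b", "\u00e4", "\u00fc", "\u00f6")

# Re-encode the needle as latin1 to mimic a literal typed in a CP1252 session.
needle <- iconv("\u00e4", from = "UTF-8", to = "latin1")

print(Encoding(needle))   # declared encoding of the needle
print(charToRaw(needle))  # raw bytes of the needle
print(charToRaw(a[3]))    # raw bytes of the corresponding table element

res <- match_utf8(needle, a)
print(res)
```

Comparing the charToRaw output of the needle and of the table element shows whether the two sides actually hold the same bytes, which is the first thing worth checking when match and charmatch disagree.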

Regarding the comment on encoding: I also tried the following, which uses ASCII characters only:

a <- c("a", "b", "c", "d", "e")
print(a)
# [1] "a" "b" "c" "d" "e"
print(Encoding(a))
# [1] "unknown" "unknown" "unknown" "unknown" "unknown"
match("a", a)
# [1] 1 # ok!!!
charmatch("a", a)
# [1] 1 # ok
grepl("a", a)
# [1]  TRUE FALSE FALSE FALSE FALSE # ok

Edit

Sys.getlocale()
[1] "LC_COLLATE=German_Switzerland.1252;LC_CTYPE=German_Switzerland.1252;
     LC_MONETARY=German_Switzerland.1252;LC_NUMERIC=C LC_TIME=German_Switzerland.1252"
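
Since the result depends on the session's native encoding, it may help to report that alongside the locale string. This is a small sketch using l10n_info() from base R; the variable names info and ctype are only illustrative.

```r
# Query what R knows about the current locale's character set.
info <- l10n_info()
print(info[["UTF-8"]])    # TRUE if the session runs in a UTF-8 locale
print(info[["Latin-1"]])  # TRUE if the native encoding is Latin-1 (or CP1252)

# The character-type locale category on its own.
ctype <- Sys.getlocale("LC_CTYPE")
print(ctype)
```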

Edit 2

I just realized that the effect only appears after running:

a <- enc2utf8(a)
print(Encoding(a))
# [1] "unknown" "unknown" "UTF-8"   "UTF-8"   "UTF-8" 

I have changed the code above accordingly.
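
To see what enc2utf8 changes for a single element, one can compare the declared encoding and the raw bytes before and after the conversion. This is an illustrative sketch, assuming a latin1-marked input created with iconv rather than typed as a literal; the variable names are hypothetical.

```r
# A latin1-marked "ä", built without relying on the console encoding.
x_latin1 <- iconv("\u00e4", from = "UTF-8", to = "latin1")

# The same character after conversion to UTF-8.
x_utf8 <- enc2utf8(x_latin1)

before <- charToRaw(x_latin1)
after  <- charToRaw(x_utf8)

print(Encoding(c(x_latin1, x_utf8)))  # declared encodings, element by element
print(before)                         # single-byte latin1 representation
print(after)                          # two-byte UTF-8 representation

# Pure ASCII strings are never given a declared encoding, which is why
# "a" and "b" stay "unknown" in the output above.
print(Encoding(enc2utf8("a")))
```

The character is the same on both sides, but the bytes and the encoding mark differ, so any comparison that looks at bytes without translating first can treat the two strings as different.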

0 answers:

No answers yet.