R, data.table, UTF-8: %in% vs == problem

Time: 2016-11-28 20:49:28

Tags: r encoding data.table match

I am seeing strange behavior with data.table and the %in% operator. I load a data.table from a file whose UTF-8 header contains Russian letters.

d = fread(filename, sep="\t", encoding="UTF-8", verbose=TRUE)
bar=names(d)
bar

 [1] "Дата, Время"      "Состояние"        "Ia, A"            "Ib, A"            "Ic, A"           
 [6] "Дисб.I"           "акт.P,кВт"        "P, кВА"           "cos"              "Загр., %"        
[11] "Uвх.AB,В"         "Uвх.BC,В"         "Uвх.CA,В"         "Дисб. U, %"       "R, кОм"          
[16] "F Турб.вращ.,Гц"  "Приток,куб.м/cут" "Отбор,куб.м/cут"  "P, ат."           "Расход, куб.м/c" 
[21] "Tдвиг, °C"        "Tжид, °C"         "Pвыкид, ат."      "Tвыкид, °C"       "Вибр X/Y, м/с2"  
[26] "Вибр Z, м/с2"     "Pвыс.р, ат."      "Iутеч, мA"        "Tобм, °C"         "Акт.энерг,кВт"   
[31] "Реакт.энерг,кВАр" "Вход1,ед."        "Вход2,ед."        "Вход3,ед."        "Вход4,ед."       
[36] "Вход5,ед."        "Вход6,ед."        "Вход7,ед."        "Вход8,ед."        "Статусн.сообщ."

I have a hard-coded value in my code:

foo="Uвх.AB,В"

and I try to do the following:

if (foo %in% bar) { ... }

Surprise:

foo %in% bar

[1] FALSE

but

foo==bar

 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE  TRUE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE

Note the TRUE at position 11. The cause appears to be the encoding:

Encoding(foo)

[1] "UTF-8"

Encoding(bar)

 [1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[10] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[19] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[28] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[37] "unknown" "unknown" "unknown" "unknown"
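One way to check that only the encoding mark differs, and not the underlying bytes, is to compare the raw bytes of the two strings. The following is a minimal sketch, not taken from the original post: `bar11` is a hypothetical stand-in for `names(d)[11]`, built by removing the mark from a copy of `foo`, and it assumes the header bytes really are UTF-8.

```r
# Hypothetical stand-ins for the values in the question (no file is read here).
foo <- "U\u0432\u0445.AB,\u0412"   # \u escapes produce a string marked "UTF-8"

bar11 <- foo
Encoding(bar11) <- "unknown"        # same bytes, mark removed (mimics names(d)[11])

# The declared encodings differ ...
marks <- Encoding(c(foo, bar11))

# ... while the byte sequences are identical.
same_bytes <- identical(charToRaw(foo), charToRaw(bar11))
```

If `same_bytes` is `TRUE` for the real header name as well, the two strings differ only in how R has been told to interpret them.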

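This is a little strange on data.table's part, because I asked fread for encoding="UTF-8". On the other hand, the difference in behavior between %in% (matching) and == is also strange.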

I sense a glitch in the universe. Can someone explain why %in% treats the encoding in this odd way, and what the right way to use it is?
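A possible workaround, offered as a sketch and not as a confirmed explanation of the behavior, is to declare the encoding of the column names explicitly before matching, so that both sides carry the same mark. The vector `bar` below is a shortened, hypothetical stand-in for `names(d)`, and the sketch assumes the bytes returned by fread are valid UTF-8. Whether this changes the %in% result in practice may depend on the R and data.table versions in use.

```r
# Hypothetical stand-ins: a hard-coded name and a header vector whose
# non-ASCII entries carry UTF-8 bytes but no encoding mark.
foo <- "U\u0432\u0445.AB,\u0412"
bar <- c("cos", foo)
Encoding(bar) <- "unknown"

# Declare (not convert) the bytes as UTF-8. Pure-ASCII entries such as "cos"
# still report "unknown", which is expected and harmless.
bar_fixed <- bar
Encoding(bar_fixed) <- "UTF-8"

marks_after <- Encoding(bar_fixed)
found <- foo %in% bar_fixed
```

`Encoding<-` only changes the mark and leaves the bytes untouched. `enc2utf8()` is deliberately not used here: it converts unmarked strings from the session's native encoding, which leaves the bytes unchanged only when the native encoding is itself UTF-8.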

0 answers:

No answers