I'm seeing strange behavior with data.table and the %in% operator. I load a data.table from a file whose UTF-8 header contains Russian letters.
d = fread(filename, sep="\t", encoding="UTF-8", verbose=TRUE)
bar=names(d)
bar
[1] "Дата, Время" "Состояние" "Ia, A" "Ib, A" "Ic, A"
[6] "Дисб.I" "акт.P,кВт" "P, кВА" "cos" "Загр., %"
[11] "Uвх.AB,В" "Uвх.BC,В" "Uвх.CA,В" "Дисб. U, %" "R, кОм"
[16] "F Турб.вращ.,Гц" "Приток,куб.м/cут" "Отбор,куб.м/cут" "P, ат." "Расход, куб.м/c"
[21] "Tдвиг, °C" "Tжид, °C" "Pвыкид, ат." "Tвыкид, °C" "Вибр X/Y, м/с2"
[26] "Вибр Z, м/с2" "Pвыс.р, ат." "Iутеч, мA" "Tобм, °C" "Акт.энерг,кВт"
[31] "Реакт.энерг,кВАр" "Вход1,ед." "Вход2,ед." "Вход3,ед." "Вход4,ед."
[36] "Вход5,ед." "Вход6,ед." "Вход7,ед." "Вход8,ед." "Статусн.сообщ."
I have a hard-coded value in my code:
foo="Uвх.AB,В"
and I try to do the following:
if (foo %in% bar) { ... }
Surprise:
foo %in% bar
[1] FALSE
But:
foo==bar
[1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE
[17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
[33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
Note the TRUE at position 11. The cause seems to be the encoding:
Encoding(foo)
[1] "UTF-8"
Encoding(bar)
[1] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[10] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[19] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[28] "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown" "unknown"
[37] "unknown" "unknown" "unknown" "unknown"
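One way to check whether only the encoding mark differs, and not the underlying bytes, is to compare the raw bytes directly. This is a minimal sketch with simulated stand-ins for foo and bar[11] (the variable col is hypothetical, and the real column name comes from the file, so its bytes may differ):

```r
# Simulated stand-ins: foo is built from \u escapes, so it is marked UTF-8.
foo <- "U\u0432\u0445.AB,\u0412"

# col plays the role of bar[11]: same bytes, but with the encoding mark removed.
col <- foo
Encoding(col) <- "unknown"

Encoding(c(foo, col))                                    # "UTF-8" "unknown"

# charToRaw() ignores the mark and shows the stored bytes.
same_bytes <- identical(charToRaw(foo), charToRaw(col))
same_bytes                                               # TRUE
```

If the same comparison on the real foo and bar[11] also gives TRUE, the two strings differ only in how they are marked.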
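This seems a bit odd on data.table's part, since I asked fread for encoding="UTF-8". On the other hand, the difference in behavior between %in% matching and == is also strange.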
I sense a bug in the universe. Can someone explain why %in% treats encodings in this strange way, and what the correct way to use it is?
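For reference, a possible workaround is to declare the names as UTF-8 before matching. This is only a sketch: mark_utf8 is a hypothetical helper (not part of data.table or base R), and it assumes the unmarked strings really do hold valid UTF-8 bytes. The vector bar below is simulated, not read from the file.

```r
# Hypothetical helper: mark strings as UTF-8 when their bytes are valid UTF-8
# but they carry no encoding mark. Pure-ASCII strings always stay "unknown".
mark_utf8 <- function(x) {
  ok <- validUTF8(x) & Encoding(x) == "unknown"
  Encoding(x[ok]) <- "UTF-8"
  x
}

foo <- "U\u0432\u0445.AB,\u0412"   # marked UTF-8 because of the \u escapes
bar <- c("cos", foo)
Encoding(bar) <- "unknown"         # simulate names that lost their mark

bar_fixed <- mark_utf8(bar)
Encoding(bar_fixed)                # "unknown" "UTF-8"
foo %in% bar_fixed                 # TRUE
```

With the real table, something like setnames(d, mark_utf8(names(d))) should apply the same fix to the column names, assuming the header bytes are in fact UTF-8.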