Question

在我的数据表中，我希望能够为每条记录添加某种“标志”（错误消息），并将其保存在一列中（不是每个标志的单独列）。我实施了一些解决方案，但我不喜欢它，并且会对如何更好地完成这一任务的任何建议表示感谢。我决定将我的标志转换为位掩码并在我的表中存储一个整数。因此，我必须单独列出数字和错误文本之间的对应关系。为了添加标志，我创建了一个函数，它接受条件（即哪些记录应该被标记），错误文本和包含现有错误的变量名称。

# sample table:
dt1 <- data.table(colA=c(2,5,4,8), 
                  colB=c('foo', 'bar', 'xyz', 'qwe') )
dt1
#> dt1
#   colA colB
#1:    2  foo
#2:    5  bar
#3:    4  xyz
#4:    8  qwe

# main function which adds a flag:
# dtIn      - data.table containing our records
# condition - expression defines which records will be flagged
# flagtxt   - text of error message
# inpArrName- name of the variable which contains existing flags list
adderrb <- function(dtIn, condition, flagtxt, inpArrName="flagsList"){
  wEnv <- globalenv(); # environment needed to modify variable with flags list

  if (exists(inpArrName, envir=wEnv )){ # check if variable with flags list is defined
    inpArr <- get(inpArrName, envir = wEnv); # then use it
  } else {inpArr <- array(dim=0);} # otherwise create it as an empty array

  newval <- match(flagtxt, inpArr); # trying to find requested flag text in current list, to avoid duplication

  if (length(newval)==0 || is.na(newval)) { # if this flag text isn't in list
    newval <- length(inpArr)+1; # we will add it as new member
    inpArr[newval] <- flagtxt;
    assign(inpArrName, inpArr, envir=wEnv); # modify variable in parent (global) variable
  }

  # adding flags to the 'errors' column of our table
  eval.parent(substitute({ # 
    if (!is.element('errors', names(dtIn))) {dtIn[,errors:=0L];} # if this column did not exist, create it, initialize with 0L
    dtIn[condition, errors:=bitwOr(errors, 2^(newval-1))]; # for records which meet condition, we add new flad to existing mask
    invisible(dtIn);
  })); # 

}

# example of usage:
adderrb(dt1, colA>3,      "A>3!",   "errorList")
adderrb(dt1, colB=="xyz", "B=xyz!", "errorList")
dt1
#> dt1
#   colA colB errors
#1:    2  foo      0
#2:    5  bar      1
#3:    4  xyz      3
#4:    8  qwe      1

# for human-readable form, we can later convert bitmask to list of errors - as their numbers, or as their text:
# for this, I use a helper function converting bitmask from number to list of binary orders,
# for example: 
# 2L = '010', so int2bitsV(4) = 2 
# 4L = '100', so int2bitsV(4) = 3 
# 6L = '110', so int2bitsV(4) = 2, 3 
int2bitsV <- function(x) { which(as.logical(intToBits(x)), T);}

# here's how it works
dt1$errList <- sapply(dt1$errors, function(x){paste0("e", int2bitsV(x),  collapse = "; ");})
dt1$errText <- sapply(dt1$errors, function(x){errorList[int2bitsV(x)]})
dt1
#> dt1
#   colA colB errors errList     errText
#1:    2  foo      0       e            
#2:    5  bar      1      e1        A>3!
#3:    4  xyz      3  e1; e2 A>3!,B=xyz!
#4:    8  qwe      1      e1        A>3!

总的来说，这可以按照我的意愿运行，但我不喜欢我不得不混淆globalenv()，也不确定我如何使用位掩码 - 也许有更好的方法来存储标志？

位图掩码（避免）通过引用传递 - 需要建议进行代码优化

0 个答案: