位图掩码(避免)通过引用传递 - 需要建议进行代码优化

时间:2015-04-05 05:31:58

标签: r data.table

在我的数据表中,我希望能够为每条记录添加某种“标志”(错误消息),并将其保存在一列中(不是每个标志的单独列)。我实施了一些解决方案,但我不喜欢它,并且会对如何更好地完成这一任务的任何建议表示感谢。我决定将我的标志转换为位掩码并在我的表中存储一个整数。因此,我必须单独列出数字和错误文本之间的对应关系。为了添加标志,我创建了一个函数,它接受条件(即哪些记录应该被标记),错误文本和包含现有错误的变量名称。

# sample table:
dt1 <- data.table(colA=c(2,5,4,8), 
                  colB=c('foo', 'bar', 'xyz', 'qwe') )
dt1
#> dt1
#   colA colB
#1:    2  foo
#2:    5  bar
#3:    4  xyz
#4:    8  qwe

# main function which adds a flag:
# dtIn      - data.table containing our records
# condition - expression defines which records will be flagged
# flagtxt   - text of error message
# inpArrName- name of the variable which contains existing flags list
adderrb <- function(dtIn, condition, flagtxt, inpArrName="flagsList"){
  wEnv <- globalenv(); # environment needed to modify variable with flags list

  if (exists(inpArrName, envir=wEnv )){ # check if variable with flags list is defined
    inpArr <- get(inpArrName, envir = wEnv); # then use it
  } else {inpArr <- array(dim=0);} # otherwise create it as an empty array

  newval <- match(flagtxt, inpArr); # trying to find requested flag text in current list, to avoid duplication

  if (length(newval)==0 || is.na(newval)) { # if this flag text isn't in list
    newval <- length(inpArr)+1; # we will add it as new member
    inpArr[newval] <- flagtxt;
    assign(inpArrName, inpArr, envir=wEnv); # modify variable in parent (global) variable
  }

  # adding flags to the 'errors' column of our table
  eval.parent(substitute({ # 
    if (!is.element('errors', names(dtIn))) {dtIn[,errors:=0L];} # if this column did not exist, create it, initialize with 0L
    dtIn[condition, errors:=bitwOr(errors, 2^(newval-1))]; # for records which meet condition, we add new flad to existing mask
    invisible(dtIn);
  })); # 

}

# example of usage:
adderrb(dt1, colA>3,      "A>3!",   "errorList")
adderrb(dt1, colB=="xyz", "B=xyz!", "errorList")
dt1
#> dt1
#   colA colB errors
#1:    2  foo      0
#2:    5  bar      1
#3:    4  xyz      3
#4:    8  qwe      1

# for human-readable form, we can later convert bitmask to list of errors - as their numbers, or as their text:
# for this, I use a helper function converting bitmask from number to list of binary orders,
# for example: 
# 2L = '010', so int2bitsV(4) = 2 
# 4L = '100', so int2bitsV(4) = 3 
# 6L = '110', so int2bitsV(4) = 2, 3 
int2bitsV <- function(x) { which(as.logical(intToBits(x)), T);}

# here's how it works
dt1$errList <- sapply(dt1$errors, function(x){paste0("e", int2bitsV(x),  collapse = "; ");})
dt1$errText <- sapply(dt1$errors, function(x){errorList[int2bitsV(x)]})
dt1
#> dt1
#   colA colB errors errList     errText
#1:    2  foo      0       e            
#2:    5  bar      1      e1        A>3!
#3:    4  xyz      3  e1; e2 A>3!,B=xyz!
#4:    8  qwe      1      e1        A>3!

总的来说,这可以按照我的意愿运行,但我不喜欢我不得不混淆globalenv(),也不确定我如何使用位掩码 - 也许有更好的方法来存储标志?

0 个答案:

没有答案