在我的数据表中,我希望能够为每条记录添加某种“标志”(错误消息),并将其保存在一列中(不是每个标志的单独列)。我实施了一些解决方案,但我不喜欢它,并且会对如何更好地完成这一任务的任何建议表示感谢。我决定将我的标志转换为位掩码并在我的表中存储一个整数。因此,我必须单独列出数字和错误文本之间的对应关系。为了添加标志,我创建了一个函数,它接受条件(即哪些记录应该被标记),错误文本和包含现有错误的变量名称。
# sample table:
dt1 <- data.table(colA=c(2,5,4,8),
colB=c('foo', 'bar', 'xyz', 'qwe') )
dt1
#> dt1
# colA colB
#1: 2 foo
#2: 5 bar
#3: 4 xyz
#4: 8 qwe
# main function which adds a flag:
# dtIn - data.table containing our records
# condition - expression defines which records will be flagged
# flagtxt - text of error message
# inpArrName- name of the variable which contains existing flags list
adderrb <- function(dtIn, condition, flagtxt, inpArrName="flagsList"){
wEnv <- globalenv(); # environment needed to modify variable with flags list
if (exists(inpArrName, envir=wEnv )){ # check if variable with flags list is defined
inpArr <- get(inpArrName, envir = wEnv); # then use it
} else {inpArr <- array(dim=0);} # otherwise create it as an empty array
newval <- match(flagtxt, inpArr); # trying to find requested flag text in current list, to avoid duplication
if (length(newval)==0 || is.na(newval)) { # if this flag text isn't in list
newval <- length(inpArr)+1; # we will add it as new member
inpArr[newval] <- flagtxt;
assign(inpArrName, inpArr, envir=wEnv); # modify variable in parent (global) variable
}
# adding flags to the 'errors' column of our table
eval.parent(substitute({ #
if (!is.element('errors', names(dtIn))) {dtIn[,errors:=0L];} # if this column did not exist, create it, initialize with 0L
dtIn[condition, errors:=bitwOr(errors, 2^(newval-1))]; # for records which meet condition, we add new flad to existing mask
invisible(dtIn);
})); #
}
# example of usage:
adderrb(dt1, colA>3, "A>3!", "errorList")
adderrb(dt1, colB=="xyz", "B=xyz!", "errorList")
dt1
#> dt1
# colA colB errors
#1: 2 foo 0
#2: 5 bar 1
#3: 4 xyz 3
#4: 8 qwe 1
# for human-readable form, we can later convert bitmask to list of errors - as their numbers, or as their text:
# for this, I use a helper function converting bitmask from number to list of binary orders,
# for example:
# 2L = '010', so int2bitsV(4) = 2
# 4L = '100', so int2bitsV(4) = 3
# 6L = '110', so int2bitsV(4) = 2, 3
int2bitsV <- function(x) { which(as.logical(intToBits(x)), T);}
# here's how it works
dt1$errList <- sapply(dt1$errors, function(x){paste0("e", int2bitsV(x), collapse = "; ");})
dt1$errText <- sapply(dt1$errors, function(x){errorList[int2bitsV(x)]})
dt1
#> dt1
# colA colB errors errList errText
#1: 2 foo 0 e
#2: 5 bar 1 e1 A>3!
#3: 4 xyz 3 e1; e2 A>3!,B=xyz!
#4: 8 qwe 1 e1 A>3!
总的来说,这可以按照我的意愿运行,但我不喜欢我不得不混淆globalenv()
,也不确定我如何使用位掩码 - 也许有更好的方法来存储标志?