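Testing multiple string columns for a row-wise match in R

Time: 2019-06-19 10:14:21

Tags: r string match

As part of an attribute agreement analysis, I need to find out in how many cases the operators (x, y, z) agree with each other completely. Suppose my data set looks like this.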

library(data.table)
DT <- data.table(x = c("Good","Average","Poor","Poor"),
                 y = c("Good","Average","Poor","Average"),
                 z = c("Average","Average","Poor","Good"))

> DT
     x       y       z
1:    Good    Good Average
2: Average Average Average
3:    Poor    Poor    Poor
4:    Poor Average    Good 

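For each row, I want to know whether the strings in columns x, y and z are equal, and write the result to a new column. If all columns are equal, it should return 1. If one or more columns hold a different value, it should return 0.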

     x       y       z     all.equal
1:    Good    Good Average         0
2: Average Average Average         1
3:    Poor    Poor    Poor         1
4:    Poor Average    Good         0

I have already managed to check whether two columns are equal:

vgrepl <- Vectorize(grepl)
DT[, all.equal:= as.integer(vgrepl(x, y))]

But I can't get it to work for more than two columns.

Thank you very much!

2 Answers:

Answer 0: (score: 0)

This approach checks, for each row, whether the row contains exactly one unique value or more than one:

library(data.table)

DT <- data.table(x = c("Good","Average","Bad"), 
                 y = c("Good","Average","Bad"), 
                 z = c("Average","Average","Bad"))

# Group by row number so that c(x, y, z) holds the values of a single row
DT[, all.equal := as.numeric(length(unique(c(x, y, z))) == 1), by = seq_len(nrow(DT))]

DT

#          x       y       z all.equal
# 1:    Good    Good Average         0
# 2: Average Average Average         1
# 3:     Bad     Bad     Bad         1
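
To see why the test works, it can help to run the check on a single row first. The following is a minimal base R sketch using the four rows printed in the question; the helper name `one_unique` is made up for illustration and is not part of data.table:

```r
# Hypothetical helper: TRUE when all values passed in are identical.
one_unique <- function(...) length(unique(c(...))) == 1

# A single row with two distinct values, so the test fails.
one_unique("Good", "Good", "Average")

# The four rows shown in the question, as plain vectors.
x <- c("Good", "Average", "Poor", "Poor")
y <- c("Good", "Average", "Poor", "Average")
z <- c("Average", "Average", "Poor", "Good")

# mapply() walks the three vectors in parallel, one row at a time.
res <- as.integer(mapply(one_unique, x, y, z))
res
```

Rows 2 and 3 are the only ones with a single unique value, so only they get a 1.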

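Answer 1: (score: 0)

# Columns to compare
cols <- c("x", "y", "z")
# 1 if every element equals the first one, otherwise 0
all_same <- function(x) as.integer(all(x[1] == x[-1]))
# Apply the check to each row of the selected columns
DT[, all.equal := apply(.SD, 1, all_same), .SDcols = cols]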


#          x       y       z all.equal
# 1:    Good    Good Average         0
# 2: Average Average Average         1
# 3:     Bad     Bad     Bad         1
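
Because `apply()` converts `.SD` to a matrix and calls the function once per row, it may become slow on large tables. A possible alternative is to compare whole columns at once. The sketch below assumes at least two columns, all of them character vectors without `NA`s; the helper name `rows_all_equal` is hypothetical and not part of data.table:

```r
# Hypothetical helper: compare every column in `cols` against the first one,
# then combine the per-column results with a logical AND.
rows_all_equal <- function(df, cols) {
  first <- df[[cols[1]]]
  same  <- lapply(cols[-1], function(nm) df[[nm]] == first)
  as.integer(Reduce(`&`, same))
}

# The four rows printed in the question (a plain data.frame, base R only).
df <- data.frame(
  x = c("Good", "Average", "Poor", "Poor"),
  y = c("Good", "Average", "Poor", "Average"),
  z = c("Average", "Average", "Poor", "Good"),
  stringsAsFactors = FALSE
)

df$all.equal <- rows_all_equal(df, c("x", "y", "z"))
df$all.equal
```

With a data.table, the same helper should be usable as `DT[, all.equal := rows_all_equal(.SD, cols), .SDcols = cols]`, since it only relies on `[[` to extract columns.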