Question

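My data frame has two columns, V1 and V2, and both contain entries such as A1, A2, A1+A2, A3, and so on.

I want to remove a row if the entry in either column is a substring of the entry in the other column. For example, I want to remove rows like these: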

A1, A1+A2

A1+A2,A1

but not rows like this:

A1+A2, A3

I'm currently using this code:

subset(dat, !dat$V1 %in% dat$V2)

But this code gets rid of rows like A1/B1, A2-B2 and A 02, A4, which I want to keep.

I was thinking I could use charmatch, maybe like this:

subset(dat, charmatch(dat$V1, dat$V2) == "NA")

But this returns an empty data frame.
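One likely reason for the empty result, offered as a sketch rather than a diagnosis of the actual data: `charmatch` signals "no match" with the missing value `NA`, not the string `"NA"`, and an `==` comparison involving a missing value yields `NA`, which `subset` treats as `FALSE`. The vectors `v1` and `v2` below are made up for illustration.

```r
# Illustrative vectors (assumed, not the asker's data)
v1 <- c("A3-B3", "A 28", "B 05")
v2 <- c("A3-B3", "A 05", "B1")

# charmatch() looks each element of v1 up in the WHOLE of v2 (not row by row)
# and returns the matching position, or NA when nothing matches
m <- charmatch(v1, v2)

m == "NA"   # never TRUE: comparing a missing value with anything gives NA
is.na(m)    # the usual way to test for "no match"
```

Note that even with `is.na()`, `charmatch` only matches at the start of a string and searches the entire column, so it does not answer the row-by-row substring question.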

When I run this code to check what charmatch would get rid of:

trial <- subset(dat, charmatch(dat$V1, dat$V2) != "NA")

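rows like A1/B1, A2-B2 and A 02, A4 show up, even though I want to keep those rows.

I think the problem may be that A 02 contains a space, but I don't know how to get around that.

I've also considered using grep/grepl and regular expressions, but I'm not sure what the syntax would look like when searching one column for the expressions in another column. Would I convert the first column to a vector and use: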

subset(dat, !grepl(V1vector, dat$V2))
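A short sketch of why passing a whole column as the pattern does not do a row-by-row search: to my knowledge, `grepl` works with a single pattern and, given several, uses only the first one (with a warning). The vectors here are illustrative, not taken from the real data.

```r
# Illustrative strings and patterns (assumed for this sketch)
x        <- c("A3-B3", "A2-B2")
patterns <- c("B3", "A2")

# Only the first pattern ("B3") is used against every element of x
first_only <- suppressWarnings(grepl(patterns, x, fixed = TRUE))

# One grepl() call per row pairs each pattern with its own string
row_wise <- mapply(grepl, patterns, x,
                   MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
```

Here `first_only` reports a match only for the first string, while `row_wise` checks "B3" against "A3-B3" and "A2" against "A2-B2" separately.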

Any ideas?

Here is some of the dataset:

V1          V2
A3-B3   B3  
A4/B4   A3-B3   
A 28    A 05    
A 28    A 06    
A2-B2   A2  
B 05    B1

This is what I'd like it to look like:

V1         V2
A4/B4      A3-B3
A 28       A 05
A 28       A 06
B 05       B1

Answer 1

Try this:

df[!mapply(grepl, df$V2, df$V1),]
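Two caveats about the one-liner above, shown as a small sketch using the `A1+A2` style entries from the question. It treats V2 as a regular expression, and it only tests whether V2 occurs inside V1, not the reverse.

```r
# "+" is a regex quantifier, so as a regex "A1+A2" does not match the literal text
regex_hit <- grepl("A1+A2", "A1+A2")                 # FALSE
fixed_hit <- grepl("A1+A2", "A1+A2", fixed = TRUE)   # TRUE

# For the row (V1 = "A1", V2 = "A1+A2"), only the reverse direction finds the overlap
v2_in_v1 <- grepl("A1+A2", "A1", fixed = TRUE)       # FALSE: V2 is not inside V1
v1_in_v2 <- grepl("A1", "A1+A2", fixed = TRUE)       # TRUE: V1 is inside V2
```

Passing `fixed = TRUE` and testing both directions avoids both problems.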

Answer 2

A minimal dataset:

f <- structure(list(V1 = c("A3-B3", "A4/B4", "A 28", "A 28", "A2-B2", 
"B 05"), V2 = c("B3", "A3-B3", "A 05", "A 06", "A2", "B1")), .Names = c("V1", 
"V2"), row.names = c(NA, -6L), class = "data.frame")

##entries of V1 that contain V2
mapply(grepl, f$V2, f$V1, MoreArgs=list(fixed=TRUE)) 
##entries of V2 that contain V1
mapply(grepl, f$V1, f$V2, MoreArgs=list(fixed=TRUE))

##combine the two negations
f[!mapply(grepl, f$V2, f$V1, MoreArgs=list(fixed=TRUE)) & 
  !mapply(grepl, f$V1, f$V2, MoreArgs=list(fixed=TRUE)),]
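The same logic can be wrapped in a small function so that it can be reused and checked against the expected result from the question. This is only a sketch: the names `drop_substring_rows` and `contains` are hypothetical, and it assumes both columns are character vectors and that the data frame has at least one row.

```r
# Hypothetical helper: keep rows where neither column's value occurs inside the other's
drop_substring_rows <- function(d, a = "V1", b = "V2") {
  # Row-wise literal (non-regex) substring test: is pattern[i] inside x[i]?
  contains <- function(pattern, x) {
    mapply(grepl, pattern, x, MoreArgs = list(fixed = TRUE), USE.NAMES = FALSE)
  }
  keep <- !contains(d[[a]], d[[b]]) & !contains(d[[b]], d[[a]])
  d[keep, , drop = FALSE]
}

# Sample data from the question
dat <- data.frame(
  V1 = c("A3-B3", "A4/B4", "A 28", "A 28", "A2-B2", "B 05"),
  V2 = c("B3", "A3-B3", "A 05", "A 06", "A2", "B1"),
  stringsAsFactors = FALSE
)

res <- drop_substring_rows(dat)
```

On the sample data this drops the rows (A3-B3, B3) and (A2-B2, A2) and keeps the other four, matching the desired output in the question.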
