Replace elements with non-matching substrings in a data.frame in R

Time: 2016-02-25 16:32:28

Tags: r

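This looks like a simple problem, but it has been bugging me for hours, and my sleuthing has been unsuccessful.

I have a large data frame whose elements contain character strings such as "ABCD".

If the 1st and 3rd characters of an element do not match, I want to replace that element with NA:

The elements "DAD", "MOM", "BABOON" and "SISTER" would remain unchanged (because their first and third characters match), but "CAT", "STEP" and "JULIAN" would be set to NA. The length of each element varies, but it is always the first and third characters that I am interested in.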


Here is the data:

> dput(d)
structure(list(v1 = structure(c(6L, 2L, 1L, 3L, 4L, 5L), .Label = c("BABOON", 
"BOB", "BOO", "CAR", "CAT", "JULIAN"), class = "factor"), v2 = structure(c(4L, 
1L, 3L, 6L, 5L, 2L), .Label = c("GREEN", "GROW", "LINDA", "MOM", 
"SKY", "TOP"), class = "factor"), v3 = structure(c(3L, 1L, 5L, 
4L, 2L, 6L), .Label = c("DAD", "GAG", "LOGAN", "LOOK", "SISTER", 
"STAR"), class = "factor")), .Names = c("v1", "v2", "v3"), class = "data.frame", row.names = c(NA, 
-6L))

Among other attempts, I feel this is the closest I have come to producing the desired d_with_NAs:

d_with_NAs=d[apply(d,1,function(y) if(substring(d[y],1,1) != substring(d[y],3,3)){y=NA}),]

3 Answers:

Answer 0: (score: 1)

Try this:

x <- c("DAD", "MOM", "BABOON", "SISTER", "CAT", "STEP", "JULIAN")
ind <- substr(x, 1, 1) != substr(x, 3, 3)
x[ind] <- NA
x
#[1] "DAD" "MOM" "BABOON" "SISTER" NA NA NA

Edit

In the context of a data.frame (this can be written even more concisely, and without type conversion):

as.data.frame(apply(dat, 2, FUN = function(x){
 tmp <- rep(NA, length(x))
 ind <- substr(x, 1, 1) == substr(x, 3, 3)
 tmp[ind] <- x[ind]
 tmp
   })
)

#      v1   v2     v3
#1   <NA>  MOM   <NA>
#2    BOB <NA>    DAD
#3 BABOON <NA> SISTER
#4   <NA> <NA>   <NA>
#5   <NA> <NA>    GAG
#6   <NA> <NA>   <NA>
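As a rough sketch of what a more concise variant without type conversion might look like (this is not the answerer's original code), one option is to build a logical matrix of mismatches and assign NA through it. The name `dat` and the three-row sample are assumptions made for illustration; the columns are deliberately factors, as in the question.

```r
# Hypothetical three-row sample with factor columns, as in the question
dat <- data.frame(
  v1 = c("JULIAN", "BOB", "BABOON"),
  v2 = c("MOM", "GREEN", "LINDA"),
  v3 = c("LOGAN", "DAD", "SISTER"),
  stringsAsFactors = TRUE
)

# Logical matrix: TRUE where the 1st and 3rd characters differ.
# substr() coerces factors to character internally, so dat itself is untouched here.
mismatch <- sapply(dat, function(col) substr(col, 1, 1) != substr(col, 3, 3))

# Assigning through a logical matrix keeps each column's class (factor here)
dat[mismatch] <- NA
dat
```

Note that `sapply` only returns a matrix when the data frame has more than one row, so this sketch assumes multi-row input.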

Answer 1: (score: 1)

Just apply stas g's solution to the rows or columns of the data.frame:

x <- c("DAD", "MOM", "BABOON", "SISTER", "CAT", "STEP", "JULIAN")
y <- c("BOB", "TITLES", "CACAO", "PREGNANT", "FLIP", "TRINIAN", "COILSPRING")
df <- data.frame(x = x, y = y)
newdf = apply (df, 2, function(x){
   # this bit is exactly what stas g said
   ind <- substr(x, 1, 1) != substr(x, 3, 3)
   x[ind] <- NA
   return(x)
})
newdf
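Note that `apply` returns a character matrix rather than a data.frame. If a data.frame is wanted, one possible sketch (an alternative, not part of the answer above) is to assign the result of `lapply` back into `df[]`, which keeps the data.frame structure. Setting `stringsAsFactors = FALSE` is an assumption made here so that the columns are plain character vectors.

```r
x <- c("DAD", "MOM", "BABOON", "SISTER", "CAT", "STEP", "JULIAN")
y <- c("BOB", "TITLES", "CACAO", "PREGNANT", "FLIP", "TRINIAN", "COILSPRING")
df <- data.frame(x = x, y = y, stringsAsFactors = FALSE)

# df[] <- ... replaces the columns but keeps df a data.frame
df[] <- lapply(df, function(col) {
  col[substr(col, 1, 1) != substr(col, 3, 3)] <- NA
  col
})
df
```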

Answer 2: (score: 1)

If you are not wedded to data.frame objects, you can do this with matrix objects and substr.

mat <- as.matrix(df)
idx <- which(substr(mat, 1, 1) != substr(mat, 3, 3))
mat[idx] <- NA
mat
     v1       v2    v3      
[1,] NA       "MOM" NA      
[2,] "BOB"    NA    "DAD"   
[3,] "BABOON" NA    "SISTER"
[4,] NA       NA    NA      
[5,] NA       NA    "GAG"   
[6,] NA       NA    NA  

If you like, you can convert it back to a data.frame.
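A minimal sketch of that round trip, assuming a small stand-in data frame (the two-row `df` and the name `back` are hypothetical):

```r
# Hypothetical stand-in for the data frame used above
df <- data.frame(v1 = c("BOB", "CAT"), v2 = c("MOM", "SKY"),
                 stringsAsFactors = FALSE)

mat <- as.matrix(df)
mat[substr(mat, 1, 1) != substr(mat, 3, 3)] <- NA

# Convert the character matrix back; columns come back as character vectors
back <- as.data.frame(mat, stringsAsFactors = FALSE)
str(back)
```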