Remove an extra character from every row?

Asked: 2014-03-04 16:39:39

Tags: r data-manipulation

I have a variable where, for some reason, R has added an extra "X" at the start of every value. Is this a common occurrence that I can avoid?

Anyway, here is my data (the variable is currently stored in a list):

X1
X5
X33
X37
...

> str(rc1_output)
 chr [1:63, 1:3] "X1" "X5" "X33" "X37" "X52" "X645" "X646" ...
 - attr(*, "dimnames")=List of 2
  ..$ : chr [1:63] "X1" "X5" "X33" "X37" ...
  ..$ : chr [1:3] "" "Entropy" "Subseq."

> dput(head(rc1_output))
structure(c("X1", "X5", "X33", "X37", "X52", "X645", "0", "0", 
"0", "0", "0", "0", "0.256010845762264", "0.071412419435563", 
"0.071412419435563", "0.071412419435563", "0.071412419435563", 
"0.071412419435563"), .Dim = c(6L, 3L), .Dimnames = list(c("X1", 
"X5", "X33", "X37", "X52", "X645"), c("", "Entropy", "Subseq."
)))

How can I loop over all rows of the variable and remove the X?

1 Answer:

Answer 0: (score: 2)

Try substr or gsub:

x <- c("X1", "X354", "X234", "X2134")
substr(x, 2, nchar(x))
# [1] "1"    "354"  "234"  "2134"
gsub("^X", "", x)
# [1] "1"    "354"  "234"  "2134"
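
The two calls are not interchangeable on input that lacks the prefix: substr drops the first character unconditionally, while the anchored pattern removes only a leading "X". A minimal sketch, using a hypothetical vector `y` that is not part of the question's data; `sub` is used here instead of `gsub` because a pattern anchored with `^` can match at most once per string:

```r
# Hypothetical input: the second value has no "X" prefix
y <- c("X1", "52")

by_position <- substr(y, 2, nchar(y))  # drops the first character, whatever it is
by_pattern  <- sub("^X", "", y)        # removes only a leading "X"

by_position
# [1] "1" "2"
by_pattern
# [1] "1"  "52"

# If the stripped values are meant to be numeric IDs, convert them explicitly
as.integer(by_pattern)
# [1]  1 52
```

If every value is guaranteed to start with "X", both approaches give the same result; the pattern-based one is the safer default when that is not certain.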

Update

It looks like only the first (unnamed) column and the rownames are affected. The same general approach applies:

> rc1_output[, 1] <- gsub("^X", "", rc1_output[, 1])
> rc1_output
           Entropy Subseq.            
X1   "1"   "0"     "0.256010845762264"
X5   "5"   "0"     "0.071412419435563"
X33  "33"  "0"     "0.071412419435563"
X37  "37"  "0"     "0.071412419435563"
X52  "52"  "0"     "0.071412419435563"
X645 "645" "0"     "0.071412419435563"

If needed, repeat the process for rownames(rc1_output), like so:

rownames(rc1_output) <- gsub("^X", "", rownames(rc1_output))

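However, my guess is that you could solve this more effectively at an earlier stage of your code. Knowing how the data ended up in this form in the first place would make the problem much easier to diagnose.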
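
One common source of such a prefix, offered here only as a guess since the question does not show how rc1_output was built: R's make.names() prepends "X" to names that start with a digit, and read.csv() applies it to column headers by default (check.names = TRUE). The sketch below uses a made-up inline CSV (`csv_text`) to show the effect and how to switch it off:

```r
# make.names() turns syntactically invalid names into valid ones
make.names(c("1", "5", "33"))
# [1] "X1"  "X5"  "X33"

# Hypothetical data whose header row consists of numbers
csv_text <- "1,5,33\n0.1,0.2,0.3"

with_check    <- names(read.csv(text = csv_text))                       # default: check.names = TRUE
without_check <- names(read.csv(text = csv_text, check.names = FALSE))  # keep headers as written

with_check
# [1] "X1"  "X5"  "X33"
without_check
# [1] "1"  "5"  "33"
```

If the values in rc1_output were derived from column names created this way, passing check.names = FALSE at import time would avoid the prefix without any later clean-up.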