Question

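I want to reformat some genomic variants so that I can use them with a certain tool. How can I move the first two characters of a string to just after the colon in the same string?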

For example, g.chr17:7577121G>A must become chr17:g.7577121G>A
and g.chr3:52712586T>C must become chr3:g.52712586T>C

There is probably a very simple way to do this with gsub or paste, but I can't figure it out.

Answer 1

Try this:

input <- "g.chr17:7577121G>A"
input <- sub("^([^.]+\\.)([^:]+:)", "\\2\\1", input)
input

[1] "chr17:g.7577121G>A"

The pattern may need some explanation:

^                from the beginning of the input
    ([^.]+\\.)   match and capture any non-dot characters up to and including
                 the first dot
    ([^:]+:)     then match and capture any non-colon characters up to and
                 including the first colon

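We then replace the match with the two captured groups in swapped order. In this case, the first group is g. and the second group is chr17:, so the replacement string starts with chr17:g., followed by everything that was already there.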
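Because sub() is vectorised over its input, the same pattern should handle several variants in one call. A minimal sketch using the two examples from the question (the name `variants` is only for illustration):

```r
# The two examples from the question
variants <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")

# Group 1 is the "g." prefix, group 2 is the "chrN:" part;
# writing them back as \\2\\1 swaps their order
swapped <- sub("^([^.]+\\.)([^:]+:)", "\\2\\1", variants)
print(swapped)
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"
```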

Answer 2

We can use sub with 3 capture groups:

sub("(^.{2})(.*:)(.*)", "\\2\\1\\3", x)
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"

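^.{2} - the first capture group is the first two characters.

.*: - the second capture group is everything up to and including the colon (.* is greedy, so this runs to the last colon; these inputs contain only one).

.* - the third capture group is the rest of the string.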

We then put the groups back in 2-1-3 order.
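To see what each capture group actually holds, base R's regexec() and regmatches() can extract them. This sketch is only an inspection aid and a re-statement of the 2-1-3 reordering by hand; it is not part of the original answer, and the names `groups` and `reordered` are illustrative:

```r
x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")

# regexec() records the full match plus each capture group;
# regmatches() returns one character vector per input string:
# element 1 is the whole match, elements 2-4 are groups 1-3
groups <- regmatches(x, regexec("(^.{2})(.*:)(.*)", x))

# Rebuild each string in group order 2-1-3
# (i.e. elements 3, 2, 4 of each vector)
reordered <- vapply(groups, function(g) paste0(g[3], g[2], g[4]), character(1))
reordered
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"
```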

Data

x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")

Answer 3

Here is an option without a regex substitution (strsplit still splits on the character class [.:]):

v1 <- strsplit(input, "[.:]")[[1]]
paste0(v1[2], ":", v1[1], ".", v1[3])
#[1] "chr17:g.7577121G>A"
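The [[1]] above keeps only the first string's pieces. A sketch of the same idea applied to a whole vector, assuming every input contains exactly one "." and one ":" (any extra dots in the variant part would be split off and lost at p[3]):

```r
x <- c("g.chr17:7577121G>A", "g.chr3:52712586T>C")

# One vector of pieces per input string,
# e.g. c("g", "chr17", "7577121G>A") for the first element
pieces <- strsplit(x, "[.:]")

# Reassemble each as <chrom>:<prefix>.<rest>
out <- vapply(pieces, function(p) paste0(p[2], ":", p[1], ".", p[3]), character(1))
out
#[1] "chr17:g.7577121G>A" "chr3:g.52712586T>C"
```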

Data

input <- "g.chr17:7577121G>A"

Move the first two characters of a string to after a specific character in the string
