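How do I add a comma between character patterns?

Time: 2018-02-27 03:30:42

Tags: r

I am using R to process a data.frame. One column contains a mix of letters and digits, and I want to add a comma between two occurrences of a character pattern:

Input: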

 arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3
 arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat
 arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat

Desired output:

 arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3
 arr 11p15.5(2097357-2432381)x2,11p15.4(3224902-4383881)x1 pat
 arr 11p15.5(2097357-2432381)x1 mat,13q15.4(3224902-3483881)x1 pat

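Basically, I want to add a comma after the first (xxx-xxx)x1 (here it may be x1, x2, or x3, possibly followed by " mat" or " pat" after the x1).

Many thanks to MichaelChirico and Onyambu. I have extracted more entries from that column: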

Input:

 'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
 'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
 'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat',
 'arr [hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1 ish',
 'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
 'nuc ish(D21S259 / D21S341 / D21S342x3).arr(21)x310q26.12(121812494-122486677)x1'

Output:

 'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
 'arr 11p15.5(2097357-2432381)x2,11p15.4(3224902-4383881)x1 pat',
 'arr 11p15.5(2097357-2432381)x1 mat,13q15.4(3224902-3483881)x1 pat',
 'arr [hg19] Xp22.33p22.12(60701-21536551)x1~3,Xq21.31q28(90731177-155208244)x1 ish',
 'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
 'nuc ish(D21S259 / D21S341 / D21S342x3).arr(21)x3,10q26.12(121812494-122486677)x1'

I am trying the following code, but it does not work for all cases:

x <- c(
    'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
    'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
    'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat',
    'arr [hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1 ish',
    'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
    'nuc ish(D21S259 / D21S341 / D21S342x3).arr(21)x310q26.12(121812494-122486677)x1'
)
sub(pattern = '([)]x[1|2|3|1~2|1~3]\\s[mat|pat|dn]?))', replacement = '\\1,', x = x)

1 Answer:

Answer 0: (score: 0)

You can do the following:

x <- c(
    'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
    'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
    'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat'
)
sub(pattern = "([(][0-9]+-[0-9]+[)]x[0-9])([^[:space:]].*)", replacement = "\\1,\\2", x = x)

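Here is a brief explanation:

1) The regular expression that matches (xxx-xxx)x1 is [(][0-9]+-[0-9]+[)]x[0-9]. Here I use [] rather than a backslash escape to match ( and ) literally. The rest can be read as: digits [0-9]+, followed by -, followed by digits [0-9]+, followed by )x and a single digit [0-9].

2) The capture groups are then used to split the string and concatenate it again. The string is split where the pattern from 1) is followed by a non-whitespace character and then any number of characters, ([^[:space:]].*), so that the pattern from 1) lands in the first group and the rest in the second. The two groups are then joined with a , between them, i.e. "\\1,\\2".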
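
To make the two steps concrete, the sketch below runs the pattern on the three sample strings. Because the second group must start with a non-whitespace character, the third string, where x1 is followed by " mat", does not match and comes back unchanged. The second pattern is a hypothetical variant, not part of the answer above: it assumes the optional suffix is one of " mat", " pat", or " dn" (taken from the attempt in the question) and that the next segment starts with a digit. It does not try to cover the x1~3 or (21)x3 forms from the extended input.

```r
x <- c(
    'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
    'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
    'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat'
)

# Pattern from the answer: group 1 is "(digits-digits)x<digit>",
# group 2 starts with a non-whitespace character and runs to the end.
answer_pattern <- "([(][0-9]+-[0-9]+[)]x[0-9])([^[:space:]].*)"
answer_out <- sub(answer_pattern, "\\1,\\2", x)

# Hypothetical variant (an assumption, not from the answer):
# group 1 also absorbs an optional " mat", " pat" or " dn" suffix,
# group 2 is that optional suffix, and group 3 is the digit that
# starts the next segment.
variant_pattern <- "([(][0-9]+-[0-9]+[)]x[0-9]( mat| pat| dn)?)([0-9])"
variant_out <- sub(variant_pattern, "\\1,\\3", x)

print(answer_out)   # third element is returned unchanged
print(variant_out)  # comma inserted in all three elements
```

Since sub() replaces only the first match, a later "(xxx-xxx)x1 pat" at the end of a string is left alone in both versions.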