Adding a comma at positions that follow certain patterns in R

Asked: 2018-03-13 00:04:20

Tags: r

I asked this question before, but I found that the R code I was using does not work in all cases:

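For example:
    Row 8: an extra comma was added, which I don't want.
    Rows 6-7: how to handle multiple conditions after ")", such as x1, x2, x3, x1~2, x1~3;
    Rows 3-5: mat, pat, or dn are the three possible conditions at that position;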

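Basically, I want to add a comma after mat, pat, or dn. If there is no " mat", " pat", or " dn", I want the comma after x1, x2, x3, x1~2, or x1~3. If a comma is already there, I don't want to add another one. In my desired results I have highlighted the positions where I want a comma with "**,**".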

> x <- c(
   'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
   'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
   'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat',
   'arr 11p15.5(2097357-2432381)x1 pat13q15.4(3224902-3483881)x1 pat',
   'arr 11p15.5(2097357-2432381)x1 dn13q15.4(3224902-3483881)x1 pat',
   'arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1 ish', 
   'arr[hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1 ish',
   'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
   'nuc ish(D21S259/D21S341/D21S342x3).arr(21)x310q26.12(121812494-122486677)x1'
 )

> sub(pattern = '([(]\\d+-\\d+[)]x[1-3|"1~3"|"1~2"](\\smat)?)', replacement = '\\1,', x=x)
[1] "arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3"  
[2] "arr 11p15.5(2097357-2432381)x2,11p15.4(3224902-4383881)x1 pat"                               
[3] "arr 11p15.5(2097357-2432381)x1 mat,13q15.4(3224902-3483881)x1 pat"                           
[4] "arr 11p15.5(2097357-2432381)x1, pat13q15.4(3224902-3483881)x1 pat"                           
[5] "arr 11p15.5(2097357-2432381)x1, dn13q15.4(3224902-3483881)x1 pat"                            
[6] "arr[hg19] Xp22.33p22.12(60701-21536551)x1,~2 Xq21.31q28(90731177-155208244)x1 ish"           
[7] "arr[hg19] Xp22.33p22.12(60701-21536551)x1,~3 Xq21.31q28(90731177-155208244)x1 ish"           
[8] "arr 11p15.5(2097357-2432381)x3,,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)"
[9] "nuc ish(D21S259/D21S341/D21S342x3).arr(21)x310q26.12(121812494-122486677)x1,"    

**Here are results I want to get:** 
  'arr 11p15.5(2097357-2432381)x3**,**11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
  'arr 11p15.5(2097357-2432381)x2**,**11p15.4(3224902-4383881)x1 pat',
  'arr 11p15.5(2097357-2432381)x1 mat**,**13q15.4(3224902-3483881)x1 pat',
  'arr 11p15.5(2097357-2432381)x1 pat**,**13q15.4(3224902-3483881)x1 pat',
  'arr 11p15.5(2097357-2432381)x1 dn**,**13q15.4(3224902-3483881)x1 pat',
  'arr[hg19] Xp22.33p22.12(60701-21536551)x1~2**,** Xq21.31q28(90731177-155208244)x1 ish', 
  'arr[hg19] Xp22.33p22.12(60701-21536551)x1~3**,** Xq21.31q28(90731177-155208244)x1 ish',
  'arr 11p15.5(2097357-2432381)x3**,**11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
  'nuc ish(D21S259/D21S341/D21S342x3).arr(21)x3**,**10q26.12(121812494-122486677)x1'

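1 Answer:

Answer 0 (score: 0)

This matches the description in the question, but it does not match the expected results. Hopefully you can edit it to fit your needs.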

add_comma = function(x) {
    # only touch strings that do not contain a comma yet
    no_comma = !grepl(pattern = ",", x = x)

    # first try to add after mat, pat or dn
    x[no_comma] = sub(pattern = "(\\smat|\\spat|\\sdn)", replacement = "\\1,", x = x[no_comma])
    no_comma = !grepl(pattern = ",", x = x)

    # next try after x1, x2, x3, not followed by tilde;
    # \\2 puts back the character that follows the match
    x[no_comma] = sub(pattern = "(x[1-3])([^~])", replacement = "\\1,\\2", x = x[no_comma])
    no_comma = !grepl(pattern = ",", x = x)

    # last try after x1~2 or x1~3
    x[no_comma] = sub(pattern = "(x1~[2-3])", replacement = "\\1,", x = x[no_comma])

    return(x)
}

add_comma(x)
# [1] "arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat,.nuc ish11p15.5(RP11-558K10x3"
# [2] "arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat,"
# [3] "arr 11p15.5(2097357-2432381)x1 mat,13q15.4(3224902-3483881)x1 pat"
# [4] "arr 11p15.5(2097357-2432381)x1 pat,13q15.4(3224902-3483881)x1 pat"
# [5] "arr 11p15.5(2097357-2432381)x1 dn,13q15.4(3224902-3483881)x1 pat"
# [6] "arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1, ish"
# [7] "arr[hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1, ish"
# [8] "arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)"
# [9] "nuc ish(D21S259/D21S341/D21S342x3,).arr(21)x310q26.12(121812494-122486677)x1"

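As you can see, it first tries to add a comma after the first occurrence of mat, pat, or dn. If that fails, it adds a comma after x1, x2, or x3 when not followed by a tilde. If that fails, it adds a comma after x1~2 or x1~3.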
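
The desired results in the question seem to follow a slightly different rule: the comma goes after the first parenthesised block followed by a copy number, plus an optional ` mat`, ` pat`, or ` dn`, rather than after the first `mat`/`pat`/`dn` anywhere in the string. A minimal sketch of that reading is below. It is an interpretation of the sample data, not a confirmed specification: the helper name `add_first_comma` is hypothetical, and the pattern assumes the copy number is a single digit from 1 to 3 with an optional `~` range. The trailing `,?` consumes a comma that is already present, so the replacement never produces a double comma.

```r
# Sample input copied from the question.
x <- c(
  'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
  'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
  'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat',
  'arr 11p15.5(2097357-2432381)x1 pat13q15.4(3224902-3483881)x1 pat',
  'arr 11p15.5(2097357-2432381)x1 dn13q15.4(3224902-3483881)x1 pat',
  'arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1 ish',
  'arr[hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1 ish',
  'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
  'nuc ish(D21S259/D21S341/D21S342x3).arr(21)x310q26.12(121812494-122486677)x1'
)

# Hypothetical helper (not from the original answer).
# Assumption: the comma belongs after the FIRST block of the form
#   (digits) or (digits-digits), then x1/x2/x3, an optional ~1..~3,
#   and an optional " mat", " pat" or " dn".
add_first_comma <- function(x) {
  pattern <- paste0(
    "(",
    "[(]\\d+(?:-\\d+)?[)]",        # "(2097357-2432381)" or "(21)"
    "x[1-3](?:~[1-3])?",           # "x1", "x3", "x1~2", ...
    "(?:\\s(?:mat|pat|dn))?",      # optional " mat", " pat", " dn"
    ")",
    ",?"                           # swallow a comma that is already there
  )
  # sub() only replaces the first match; perl = TRUE enables (?: ... ) groups.
  sub(pattern, "\\1,", x, perl = TRUE)
}

result <- add_first_comma(x)
print(result)
```

On the nine sample strings this should give the desired results listed in the question, including leaving row 8 unchanged. Strings with formats outside this sample may need a broader pattern.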