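I asked this question before, but I found that the R code I was using does not work in every case:
For example:
Line 8: an extra comma was added, which I don't want.
Lines 6-7: how do I handle the multiple conditions that can follow ")", such as x1, x2, x3, x1~2, x1~3?
Lines 3-5: mat, pat, or dn are the three conditions at that position.
Basically, I want to add a comma after mat, pat, or dn; if there is no " mat", " pat", or " dn", I want the comma after x1, x2, x3, x1~2, or x1~3; but if there is already a comma, I don't want to add another one. I have highlighted the positions where I want a comma (",") in my desired results.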
> x <- c(
'arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
'arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat',
'arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat',
'arr 11p15.5(2097357-2432381)x1 pat13q15.4(3224902-3483881)x1 pat',
'arr 11p15.5(2097357-2432381)x1 dn13q15.4(3224902-3483881)x1 pat',
'arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1 ish',
'arr[hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1 ish',
'arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
'nuc ish(D21S259/D21S341/D21S342x3).arr(21)x310q26.12(121812494-122486677)x1'
)
> sub(pattern = '([(]\\d+-\\d+[)]x[1-3|"1~3"|"1~2"](\\smat)?)', replacement = '\\1,', x=x)
[1] "arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3"
[2] "arr 11p15.5(2097357-2432381)x2,11p15.4(3224902-4383881)x1 pat"
[3] "arr 11p15.5(2097357-2432381)x1 mat,13q15.4(3224902-3483881)x1 pat"
[4] "arr 11p15.5(2097357-2432381)x1, pat13q15.4(3224902-3483881)x1 pat"
[5] "arr 11p15.5(2097357-2432381)x1, dn13q15.4(3224902-3483881)x1 pat"
[6] "arr[hg19] Xp22.33p22.12(60701-21536551)x1,~2 Xq21.31q28(90731177-155208244)x1 ish"
[7] "arr[hg19] Xp22.33p22.12(60701-21536551)x1,~3 Xq21.31q28(90731177-155208244)x1 ish"
[8] "arr 11p15.5(2097357-2432381)x3,,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)"
[9] "nuc ish(D21S259/D21S341/D21S342x3).arr(21)x310q26.12(121812494-122486677)x1,"
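The `x1,~2` and `x1,~3` in results [6] and [7] seem to come from the bracket expression: `[1-3|"1~3"|"1~2"]` matches a single character out of the set 1, 2, 3, |, ", ~ rather than one of three alternatives. A minimal sketch of the difference, where `with_group` is only an illustrative rewrite using a parenthesised alternation:

```r
s <- "arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1 ish"

# Original bracket expression: exactly one character from the set 1, 2, 3, |, ", ~
with_class <- '[(]\\d+-\\d+[)]x[1-3|"1~3"|"1~2"]'
# Illustrative alternative: grouped alternation, longer form listed first
with_group <- "[(]\\d+-\\d+[)]x(1~[23]|[1-3])"

# regexpr() finds the first match; regmatches() extracts the matched text
m_class <- regmatches(s, regexpr(with_class, s))
m_group <- regmatches(s, regexpr(with_group, s))

print(m_class)  # the match stops after "x1", so the comma lands before "~2"
print(m_group)  # the match covers "x1~2"
```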
**Here are results I want to get:**
'arr 11p15.5(2097357-2432381)x3**,**11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3',
'arr 11p15.5(2097357-2432381)x2**,**11p15.4(3224902-4383881)x1 pat',
'arr 11p15.5(2097357-2432381)x1 mat**,**13q15.4(3224902-3483881)x1 pat',
'arr 11p15.5(2097357-2432381)x1 pat**,**13q15.4(3224902-3483881)x1 pat',
'arr 11p15.5(2097357-2432381)x1 dn**,**13q15.4(3224902-3483881)x1 pat',
'arr[hg19] Xp22.33p22.12(60701-21536551)x1~2**,** Xq21.31q28(90731177-155208244)x1 ish',
'arr[hg19] Xp22.33p22.12(60701-21536551)x1~3**,** Xq21.31q28(90731177-155208244)x1 ish',
'arr 11p15.5(2097357-2432381)x3**,**11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)',
'nuc ish(D21S259/D21S341/D21S342x3).arr(21)x3**,**10q26.12(121812494-122486677)x1'
Answer 0 (score: 0)
This matches the description in the question, but not the expected results. Hopefully you can edit it to suit your needs.
add_comma = function(x) {
  # only touch strings that do not contain a comma yet
  no_comma = !grepl(pattern = ",", x = x)
  # first try to add after mat, pat or dn
  x[no_comma] = sub(pattern = "(\\smat|\\spat|\\sdn)", replacement = "\\1,", x = x[no_comma])
  no_comma = !grepl(pattern = ",", x = x)
  # next try after x1, x2, x3, not followed by tilde;
  # \\2 puts back the following character, which would otherwise be dropped
  x[no_comma] = sub(pattern = "(x[1-3])([^~])", replacement = "\\1,\\2", x = x[no_comma])
  no_comma = !grepl(pattern = ",", x = x)
  # last try after x1~2 or x1~3
  x[no_comma] = sub(pattern = "(x1~[2-3])", replacement = "\\1,", x = x[no_comma])
  return(x)
}
add_comma(x)
# [1] "arr 11p15.5(2097357-2432381)x311p15.4(3424982-4083881)x3 pat,.nuc ish11p15.5(RP11-558K10x3"
# [2] "arr 11p15.5(2097357-2432381)x211p15.4(3224902-4383881)x1 pat,"
# [3] "arr 11p15.5(2097357-2432381)x1 mat,13q15.4(3224902-3483881)x1 pat"
# [4] "arr 11p15.5(2097357-2432381)x1 pat,13q15.4(3224902-3483881)x1 pat"
# [5] "arr 11p15.5(2097357-2432381)x1 dn,13q15.4(3224902-3483881)x1 pat"
# [6] "arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1, ish"
# [7] "arr[hg19] Xp22.33p22.12(60701-21536551)x1~3 Xq21.31q28(90731177-155208244)x1, ish"
# [8] "arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)"
# [9] "nuc ish(D21S259/D21S341/D21S342x3,).arr(21)x310q26.12(121812494-122486677)x1"
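As you can see, it first tries to add a comma after the first occurrence of `mat`, `pat`, or `dn`. If that fails, it adds a comma after an `x1`, `x2`, or `x3` that is not followed by a tilde. If that also fails, it adds a comma after `x1~2` or `x1~3`.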
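For comparison, here is a minimal single-pass sketch that appears to reproduce the desired results listed in the question. It rests on two assumptions that are not stated in the original: the comma always belongs after the first `)x<copy number>` token, together with an optional ` mat`, ` pat`, or ` dn`, and copy numbers are limited to x1, x2, x3, x1~2, and x1~3. The name `add_comma_once` is hypothetical.

```r
# Hypothetical helper (not from the original answer): insert one comma after the
# first ")x<copy number>" token, keeping an optional " mat", " pat" or " dn" with it.
add_comma_once <- function(x) {
  pattern <- paste0(
    "(\\)x[1-3](?:~[23])?",     # ")x1", ")x2", ")x3", ")x1~2", ")x1~3"
    "(?:\\s(?:mat|pat|dn))?)",  # optional " mat", " pat" or " dn"
    ",?"                        # swallow an existing comma, if there is one
  )
  # sub() replaces only the first match; the replacement writes the comma back,
  # so a token that is already followed by a comma comes out unchanged
  sub(pattern, "\\1,", x, perl = TRUE)
}

samples <- c(
  "arr 11p15.5(2097357-2432381)x1 mat13q15.4(3224902-3483881)x1 pat",
  "arr[hg19] Xp22.33p22.12(60701-21536551)x1~2 Xq21.31q28(90731177-155208244)x1 ish",
  "arr 11p15.5(2097357-2432381)x3,11p15.4(3424982-4083881)x3 pat.nuc ish11p15.5(RP11-558K10x3)"
)
add_comma_once(samples)
```

Consuming the optional comma in the pattern, instead of checking for it beforehand with `grepl`, keeps the "no second comma" rule local to the token being edited, so a comma elsewhere in the string does not block the insertion.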