Comparing the number of characters in vector elements

Time: 2014-03-30 20:51:20

Tags: r compare

I have data like this (dat2):

CHOM  POS  REF   ALT    ALT1  ALT2  ...
1     121  A     AA     AT    0
2     254  GCGC  GCGCG  AGCG  0
3     214  C     T      0     0

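I need to label every row, or every cell, in ALT, ALT1, ALT2, ... as one of these variant types: SNP, deletion, or insertion.

The following shows how to tell a SNP apart from a deletion or an insertion: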

REF   ALT1 ALT2
A     T     NA   = SNP
AT    T          = deletion
CG    CGG        = insertion
ATT   AT         = deletion

Perhaps the output would look like this:

CHOM  POS  REF   ALT        ALT1       ALT2  ...
1     121  A     deletion   insertion  0
2     254  GCGC  insertion  SNP        0

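1 Answer:

Answer 0: (score: 2)

A simple approach is to use nchar to get the length of each string and compare REF against ALT: equal lengths are treated as a SNP, a shorter ALT as a deletion, and a longer ALT as an insertion. This assumes the data has already been sanitized.

For example: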

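# Reference alleles and the first alternate allele, taken from the question's examples
ref <- c("A", "AT", "CG", "ATT")
alt1 <- c("T", "T", "CGG", "AT")

# Number of characters in each allele string
ref.length <- nchar(ref)
alt1.length <- nchar(alt1)

# Same length: SNP; ALT shorter than REF: deletion; otherwise: insertion
variations <- ifelse(ref.length == alt1.length, "SNP",
                     ifelse(ref.length > alt1.length, "deletion",
                            "insertion"))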

This gives:

> cbind(ref, alt1, variations)
    ref   alt1  variations 
A   "A"   "T"   "SNP"      
AT  "AT"  "T"   "deletion" 
CG  "CG"  "CGG" "insertion"
ATT "ATT" "AT"  "deletion"
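
The question asks for every ALT column to be labelled, not just one vector. Below is a minimal sketch of how the same nchar comparison might be applied across all ALT columns of a data frame. It rests on several assumptions: the data frame is rebuilt by hand from the sample rows in the question, "0" (or NA) is taken to mean an empty ALT slot and is left unchanged, and classify_variant is a hypothetical helper name, not a function from any package.

```r
# Hypothetical data frame mirroring dat2 from the question.
# Assumption: "0" marks an empty ALT slot.
dat2 <- data.frame(
  CHOM = c(1, 2, 3),
  POS  = c(121, 254, 214),
  REF  = c("A", "GCGC", "C"),
  ALT  = c("AA", "GCGCG", "T"),
  ALT1 = c("AT", "AGCG", "0"),
  ALT2 = c("0", "0", "0"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: label each ALT allele by comparing its length to REF.
# Equal length is called "SNP", as in the answer above.
classify_variant <- function(ref, alt, empty = "0") {
  ref.length <- nchar(ref)
  alt.length <- nchar(alt)
  out <- ifelse(ref.length == alt.length, "SNP",
                ifelse(ref.length > alt.length, "deletion", "insertion"))
  # Keep placeholders (and missing values) exactly as they were
  is.empty <- is.na(alt) | alt == empty
  out[is.empty] <- alt[is.empty]
  out
}

# Apply the helper to every column whose name starts with "ALT"
alt.cols <- grep("^ALT", names(dat2), value = TRUE)
result <- dat2
result[alt.cols] <- lapply(dat2[alt.cols], classify_variant, ref = dat2$REF)
print(result)
```

Because the rule looks only at string lengths, A against AA is labelled an insertion, and any equal-length pair such as GCGC against AGCG is labelled a SNP. If the real data uses a different placeholder than "0", the empty argument would need to change accordingly.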