When I have this data (dat2):
CHOM POS REF ALT ALT1 ALT2 ...
1 121 A AA AT 0
2 254 GCGC GCGCG AGCG 0
3 214 C T 0 0
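I need to label each row, or each cell, in ALT, ALT1, ALT2, ... with its variant type: SNP, deletion, or insertion.
The following shows how to tell an SNP apart from a deletion or an insertion: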
REF ALT1 ALT2
A T NA = SNP
AT T = deletion
CG CGG = insertion
ATT AT = deletion
Perhaps the output would look like this:
CHOM POS REF ALT ALT1 ALT2 ...
1 121 A insertion insertion 0
2 254 GCGC insertion SNP 0
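Answer 0: (score: 2)
A simple approach is to use nchar to compare the length of each string: equal lengths mean an SNP, a shorter ALT means a deletion, and a longer ALT means an insertion. This assumes the data has already been sanitized.
For example: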
ref <- c("A", "AT", "CG", "ATT")
alt1 <- c("T", "T", "CGG", "AT")
ref.length <- nchar(ref)
alt1.length <- nchar(alt1)
variations <- ifelse(ref.length == alt1.length, "SNP",
              ifelse(ref.length > alt1.length, "deletion",
                     "insertion"))
This gives:
> cbind(ref, alt1, variations)
     ref   alt1  variations
[1,] "A"   "T"   "SNP"
[2,] "AT"  "T"   "deletion"
[3,] "CG"  "CGG" "insertion"
[4,] "ATT" "AT"  "deletion"
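The same length comparison can be applied to every ALT column of dat2 at once. The following is only a minimal sketch: the helper name classify_variant is hypothetical, the data frame is rebuilt by hand from the three sample rows in the question, and it assumes that "0" (or NA) means "no allele" and should be left untouched. As in the example above, any ALT of the same length as REF is labeled "SNP", even when more than one base differs.

```r
# Hypothetical helper: label one ALT column against REF by string length.
# Assumption: "0", "" and NA mean "no allele" and are passed through as-is.
classify_variant <- function(alt, ref, missing = c("0", "")) {
  alt <- as.character(alt)
  ref <- as.character(ref)
  out <- ifelse(nchar(ref) == nchar(alt), "SNP",
         ifelse(nchar(ref) > nchar(alt), "deletion", "insertion"))
  # Restore the placeholder values so they are not labeled as variants
  skip <- is.na(alt) | alt %in% missing
  out[skip] <- alt[skip]
  out
}

# Sample rows copied from the question (column names kept as written there)
dat2 <- data.frame(
  CHOM = 1:3,
  POS  = c(121, 254, 214),
  REF  = c("A", "GCGC", "C"),
  ALT  = c("AA", "GCGCG", "T"),
  ALT1 = c("AT", "AGCG", "0"),
  ALT2 = c("0", "0", "0"),
  stringsAsFactors = FALSE
)

# Every column whose name starts with "ALT" is compared against REF
alt.cols <- grep("^ALT", names(dat2), value = TRUE)
result <- dat2
result[alt.cols] <- lapply(dat2[alt.cols], classify_variant, ref = dat2$REF)
print(result)
```

With these sample rows, the ALT column becomes insertion, insertion, SNP, and ALT1 becomes insertion, SNP, 0. Selecting columns with grep means additional ALT3, ALT4, ... columns would be handled without changing the code.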