Question

This comes from the R package "VariantAnnotation" and its dependency "Biostrings".

I have a DNAStringSetList and I want to convert it to an ordinary list or a character vector.

library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package="VariantAnnotation")

vcf <- readVcf(fl, "hg19")

tempo <- rowRanges(vcf)$ALT  # This is the DNAStringSetList I mean.

print(tempo)

A DNAStringSet instance of length 10376
    width seq
[1]     1 G
[2]     1 T
[3]     1 A
[4]     1 T
[5]     1 T
...   ... ...
[10372]     1 G
[10373]     1 G
[10374]     1 G
[10375]     1 A
[10376]     1 C

tempo[[1]]
A DNAStringSet instance of length 1
width seq
[1]     1 G

But I don't want this format. I only want the bases as plain strings, so that I can insert them as a column in a new data frame. I want this:

G
T
A
T
T

I have already done this with the following approach:

as.character(tempo@unlistData)

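However, it returns 10 more rows than tempo has! The head and tail of the result are identical to tempo, so somewhere in the middle there must be 10 extra rows that should not be there (they are not NA).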
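A likely explanation, assuming some VCF rows are multi-allelic: each element of a DNAStringSetList holds all ALT alleles of one row, and `unlistData` is the concatenation of every allele from every row, so a row with two ALT alleles contributes two elements. The base-R sketch below uses a hypothetical plain list (`alt_list`) in place of the DNAStringSetList to show the effect and a per-row collapse that keeps one string per row.

```r
# Hypothetical stand-in for rowRanges(vcf)$ALT: a plain list with one
# element per VCF row; the third row is multi-allelic (two ALT alleles).
alt_list <- list("G", "T", c("A", "C"), "T", "T")

# Flattening (analogous to tempo@unlistData) concatenates every allele,
# so the multi-allelic row contributes two elements.
flat <- unlist(alt_list)
length(alt_list)  # 5 rows
length(flat)      # 6 alleles

# Joining the alleles within each row keeps exactly one string per row.
per_row <- vapply(alt_list, paste, character(1), collapse = ",")
per_row           # "G" "T" "A,C" "T" "T"
```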

Answer 1

You can call as.character on a DNAString or a DNAStringSet.

as.character(tempo[1 : 5])
# [1] "G" "T" "A" "T" "T"
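To get exactly one string per VCF row (so the result can be used directly as a data frame column), one option is to apply as.character to each row's DNAStringSet and join its alleles. This is a sketch, not the only way; the "," separator is an arbitrary choice, and `alt_str` and `df` are illustrative names.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
tempo <- rowRanges(vcf)$ALT  # DNAStringSetList: one DNAStringSet per VCF row

# Convert each row's alleles to character and join them with ",",
# so a multi-allelic row still produces a single string.
alt_str <- vapply(as.list(tempo),
                  function(x) paste(as.character(x), collapse = ","),
                  character(1), USE.NAMES = FALSE)

# One string per row, ready to become a data frame column
df <- data.frame(ALT = alt_str, stringsAsFactors = FALSE)
head(df)
```

Unlike `as.character(tempo@unlistData)`, the result has the same length as `nrow(vcf)`.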

Answer 2

A simple loop using the toString function from the same library solved the problem:

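ALT <- character(length(tempo))  # preallocate one string per VCF row
for (i in seq_along(tempo)) { ALT[i] <- toString(tempo[[i]]) }  # a row's alleles are joined with ", "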

However, I don't know why tempo@unlistData returns too many rows. It isn't trustworthy.
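One way to check whether multi-allelic rows account for the extra elements is to count the alleles stored in each row. The sketch below assumes the same example file as the question; `unlist(tempo)` is the public way to get the same flattened DNAStringSet that the internal `unlistData` slot holds, and `n_alt` and `multi` are illustrative names.

```r
library(VariantAnnotation)

fl <- system.file("extdata", "chr22.vcf.gz", package = "VariantAnnotation")
vcf <- readVcf(fl, "hg19")
tempo <- rowRanges(vcf)$ALT

# Number of ALT alleles stored for each VCF row
n_alt <- elementNROWS(tempo)
table(n_alt)

# Flattened length minus number of rows = surplus elements after unlisting
length(unlist(tempo)) - length(tempo)

# Rows holding more than one ALT allele
multi <- which(n_alt > 1)
tempo[multi]
```

If the surplus printed here matches the 10 extra rows, the flattened vector is correct; it simply lists alleles rather than rows.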

Deconstructing a DNAStringSet into plain strings
