Split a list within a list using a second df as the split factor (the lists are DNAStringSets)

Asked: 2019-01-23 17:51:01

Tags: r split lapply

I would like to split a list of DNAStringSets using another list. I found the question "R: split elements of a list into sublists" and built an example for my problem:

library("DECIPHER")
library("Biostrings")     
aDNAStringSet <- DNAStringSet(c("GCAATCCATTAC", "AAATCGCCATCC", "GCATACCTTAAC", "GCATACCATTAC", "AGCATACCTTAC", "AGCATACCTTAC", "AGCATACCTTAA", "AGCATACCTAAC","GCAATCCATTAC", "AAATCGCCATCC", "GCATACCTTAAC", "GCATACCATTAC", "AGCATACCTTAC", "AGCATACCTTAC", "AGCATACCTTAA", "AGCATACCTAAC"))

names(aDNAStringSet) <- c("seq1", "seq2", "seq3", "seq4", "seq5", "seq6", "seq7", "seq8", "seq9", "seq10", "seq11", "seq12", "seq13", "seq14", "seq15", "seq16") 

aDNAStringSet

This creates a named DNAStringSet that looks like this:

> aDNAStringSet
  A DNAStringSet instance of length 16
     width seq                                                                                            names               
 [1]    12 GCAATCCATTAC                                                                                   seq1
 [2]    12 AAATCGCCATCC                                                                                   seq2
 [3]    12 GCATACCTTAAC                                                                                   seq3
 [4]    12 GCATACCATTAC                                                                                   seq4
 [5]    12 AGCATACCTTAC                                                                                   seq5
 ...   ... ...
[12]    12 GCATACCATTAC                                                                                   seq12
[13]    12 AGCATACCTTAC                                                                                   seq13
[14]    12 AGCATACCTTAC                                                                                   seq14
[15]    12 AGCATACCTTAA                                                                                   seq15
[16]    12 AGCATACCTAAC                                                                                   seq16

Now I make up some arbitrary groups to split the sequences into:

group <- c(rep(1,3), rep(2,5), rep(3,4), rep(4,4))
names <- c("seq1", "seq2", "seq3", "seq4", "seq5", "seq6", "seq7", "seq8", "seq9", "seq10", "seq11", "seq12", "seq13", "seq14", "seq15", "seq16")

sort <- data.frame(cbind(group, names))

and split by group:

bygroup <- split(aDNAStringSet, f = sort$group)

which looks like:

> bygroup
DNAStringSetList of length 4
[["1"]] seq1=GCAATCCATTAC seq2=AAATCGCCATCC seq3=GCATACCTTAAC
[["2"]] seq4=GCATACCATTAC seq5=AGCATACCTTAC seq6=AGCATACCTTAC seq7=AGCATACCTTAA seq8=AGCATACCTAAC
[["3"]] seq9=GCAATCCATTAC seq10=AAATCGCCATCC seq11=GCATACCTTAAC seq12=GCATACCATTAC
[["4"]] seq13=AGCATACCTTAC seq14=AGCATACCTTAC seq15=AGCATACCTTAA seq16=AGCATACCTAAC
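The grouping step can be reproduced without Bioconductor. The following is a minimal base-R sketch, assuming that a plain named character vector (`seqs`, a hypothetical stand-in for `aDNAStringSet`) is close enough for illustrating `split()`: the split factor carries one label per element, in the same order as the elements.

```r
# Hypothetical stand-in for aDNAStringSet: a named character vector.
seqs <- c(seq1 = "GCAATCCATTAC", seq2 = "AAATCGCCATCC", seq3 = "GCATACCTTAAC",
          seq4 = "GCATACCATTAC", seq5 = "AGCATACCTTAC")

# One group label per sequence, in the same order as seqs.
grp <- c(1, 1, 2, 2, 2)

# split() returns a list with one element per distinct group label.
by_grp <- split(seqs, f = grp)

names(by_grp)    # the group labels become the list names
lengths(by_grp)  # number of sequences in each group
```

The key requirement is that the factor and the object being split have the same length, which is what the error further down complains about.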

Now I adjust the alignment of the sequences in each group again:

Adjusted <- lapply(bygroup, FUN=AdjustAlignment,processors = NULL)

which looks like:

> Adjusted
$`1`
  A DNAStringSet instance of length 3
    width seq                                                                                             names               
[1]    12 GCAATCCATTAC                                                                                    seq1
[2]    12 AAATCGCCATCC                                                                                    seq2
[3]    12 GCATACCTTAAC                                                                                    seq3

$`2`
  A DNAStringSet instance of length 5
    width seq                                                                                             names               
[1]    12 GCATACCATTAC                                                                                    seq4
[2]    12 AGCATACCTTAC                                                                                    seq5
[3]    12 AGCATACCTTAC                                                                                    seq6
[4]    12 AGCATACCTTAA                                                                                    seq7
[5]    12 AGCATACCTAAC                                                                                    seq8

$`3`
  A DNAStringSet instance of length 4
    width seq                                                                                             names               
[1]    12 GCAATCCATTAC                                                                                    seq9
[2]    12 AAATCGCCATCC                                                                                    seq10
[3]    12 GCATACCTTAAC                                                                                    seq11
[4]    12 GCATACCATTAC                                                                                    seq12

$`4`
  A DNAStringSet instance of length 4
    width seq                                                                                             names               
[1]    12 AGCATACCTTAC                                                                                    seq13
[2]    12 AGCATACCTTAC                                                                                    seq14
[3]    12 AGCATACCTTAA                                                                                    seq15
[4]    12 AGCATACCTAAC                                                                                    seq16

This is followed by DistanceMatrix and IdClusters to define new clusters for further splitting.

D <- lapply(Adjusted, FUN=DistanceMatrix,processors = NULL)
Clust <- lapply(D, FUN=IdClusters, method="NJ",cutoff=c(0.15), showPlot=TRUE, type="clusters")

Clust looks like this:

> Clust
$`1`
     cluster
seq1       2
seq2       1
seq3       3

$`2`
     cluster
seq4       1
seq5       3
seq6       3
seq7       3
seq8       2

$`3`
      cluster
seq9        3
seq10       1
seq11       2
seq12       4

$`4`
      cluster
seq13       1
seq14       2
seq15       1
seq16       2
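Each element of `Clust` printed above is a data frame with a single `cluster` column and the sequence names as row names. As a minimal sketch, assuming hypothetical stand-in data frames of that shape (`clust_demo` is not part of the original code), the cluster labels of each group can be pulled out as named vectors:

```r
# Hypothetical stand-ins shaped like the printed IdClusters output.
clust_demo <- list(
  `1` = data.frame(cluster = c(2, 1, 3),
                   row.names = c("seq1", "seq2", "seq3")),
  `4` = data.frame(cluster = c(1, 2, 1, 2),
                   row.names = c("seq13", "seq14", "seq15", "seq16"))
)

# For every group, take the cluster column and keep the sequence names.
cluster_vectors <- lapply(clust_demo, function(df) {
  setNames(df$cluster, rownames(df))
})

cluster_vectors[["4"]]  # cluster label of each sequence in group 4
```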

Now I want to use lapply with split to split the Adjusted list according to Clust:

byClust <- lapply(Adjusted,FUN=split, Clust$cluster)

But I get the error:

> byClust <- lapply(Adjusted,FUN=split, Clust$cluster)
Error in normSplitFactor(f, x) : 
  split factor has length 0 but 'NROW(x)' is > 0

Both lists have the same length. What could be the problem? Any ideas?
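One likely cause can be sketched with base-R stand-ins. `Clust` is a list whose elements are named "1" to "4", so `Clust$cluster` looks for a list element called `cluster`, finds none, and returns NULL, which is a split factor of length 0. In addition, `lapply` hands the same extra argument to every call rather than pairing the two lists element by element. The example below assumes named character vectors in place of the DNAStringSets (`adjusted` and `clust` are hypothetical stand-ins) and uses `Map()` to walk both lists in parallel; it has not been run against the real DECIPHER objects.

```r
# Hypothetical stand-ins: named character vectors instead of DNAStringSets,
# and data frames shaped like the IdClusters output shown above.
adjusted <- list(
  `1` = c(seq1 = "GCAATCCATTAC", seq2 = "AAATCGCCATCC", seq3 = "GCATACCTTAAC"),
  `2` = c(seq4 = "GCATACCATTAC", seq5 = "AGCATACCTTAC", seq6 = "AGCATACCTTAC")
)
clust <- list(
  `1` = data.frame(cluster = c(2, 1, 3),
                   row.names = c("seq1", "seq2", "seq3")),
  `2` = data.frame(cluster = c(1, 3, 3),
                   row.names = c("seq4", "seq5", "seq6"))
)

# The outer list has elements named "1" and "2", none named "cluster",
# so `$cluster` on the list itself yields NULL (a zero-length split factor).
is.null(clust$cluster)

# Map() pairs each group with its own data frame, so the cluster column
# used as split factor always has the same length as the group.
by_clust <- Map(function(x, cl) split(x, cl$cluster), adjusted, clust)

names(by_clust[["2"]])  # cluster labels found in group 2
by_clust[["2"]][["3"]]  # sequences of group 2 assigned to cluster 3
```

With the real objects, the analogous call would be `Map(function(x, cl) split(x, cl$cluster), Adjusted, Clust)`, on the assumption that the rows of each `Clust` element are in the same order as the sequences in the matching `Adjusted` element.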

0 answers:

No answers yet.