I have an XStringSet object:
A DNAStringSet instance of length 151674
width seq names
[1] 253 GAACAGCATGAATGTTAAAACTGAAATGGATG...TGATGGTTAGGTTTTCAGAAAAAGCAGAAGA LGKD01000001.1 Oc...
[2] 150158 TATATATATATAGTCAATTCGAGGATGTTAGA...TCCGGATACTATTCCAGAGTTTCCTTGCAAA KQ415657.1 Octopu...
[3] 619 ATAGACATACACACAAATATTTTTATATCACA...TATATACATATTTATACATATATATATATAT LGKD01000030.1 Oc...
[4] 359 TCACCAGTGGCAGCCGCGGCTACAGCAAAAGG...CACGGGCTGTACAACGACCCTGATGACTCCG LGKD01000031.1 Oc...
[5] 239 GAAGTGGTAAAGAGTGCGATGCGCTGAAAAAA...CTCTTTTTTCAGCGCATCGCACTCTTTACCA LGKD01000032.1 Oc...
... ... ...
[151670] 2021 AAAACCTAAACATGTTAAATCAGAGATTGCAA...ATATATAAGTATATATATATATATATATATA KQ434080.1 Octopu...
[151671] 420 CCCCACCTCCACTATCAACACCACTACCACCA...GAAGAAGAAGAAGAAGAAGAAGAAGAAGAAG LGKD01700121.1 Oc...
[151672] 424 ACACACACACACACACACACACATATACATAT...GTAAATGTGTCCGTGTGTAGTAAGCATGTGT LGKD01700122.1 Oc...
[151673] 242 ATATATATATATATATATACATCAACATATAT...ATATGTAGACGTGTGTGTATATATATATATA LGKD01700123.1 Oc...
[151674] 214 CACACACACACACACACACACACACACACACA...ACTCATATGTACAACACACATTTATACGCTT LGKD01700124.1 Oc...
>
I sorted the widths in decreasing order and got this:
> sort_oc=sort(width(oc), decreasing = TRUE)
> sort_oc[1:10]
[1] 4064693 3315273 3181678 3174068 2987449 2908116 2784626 2705535 2686354 2631168
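The reason `sort()` alone can't answer this is that it returns only the sorted values. `order()` instead returns the positions that would sort them, and those positions are what link each width back to its sequence. Here is a minimal base-R illustration. The vector `w` is a hypothetical stand-in for `width(oc)`, and its values are made up:

```r
# Hypothetical widths standing in for width(oc); the values are invented.
w <- c(253L, 150158L, 619L, 359L, 239L)

# sort() gives the values only (150158 619 359 253 239); positions are lost
sort(w, decreasing = TRUE)

# order() gives the positions of those values in w: 2 3 4 1 5
idx <- order(w, decreasing = TRUE)
idx

# Indexing w by idx reproduces the sorted values, so idx keeps the link
identical(w[idx], sort(w, decreasing = TRUE))  # TRUE
```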
How can I get the sequence that corresponds to each of these sorted widths?
For example, I'd like a result like this:
width seq names
[567] 4064693 GAACAGCATGAATGTTAAAACTGAAATGGATG...TGATGGTTAGGTTTTCAGAAAAAGCAGAAGA LGKD01000001.1 Oc...
[350] 3315273 AAAACCTAAACATGTTAAATCAGAGATTGCAA...ATATATAAGTATATATATATATATATATATA KQ434080.1 Octopu...
and so on.
Answer 0 (score: 2)
Andrew's answer is very close. However, a DNAStringSet is not a data.frame, so you need the Biostrings::width function to get the widths instead of data.frame-style column subsetting:
oc[order(width(oc), decreasing = TRUE)]
This returns the same DNAStringSet object, reordered by decreasing width.
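The sketch below applies that idea end to end. It assumes Biostrings is installed from Bioconductor. The small `oc` built here is a hypothetical stand-in with invented sequences and names. It keeps the `order()` result in `idx`, for two reasons. First, `head(idx, 10)` picks the ten longest sequences. Second, `idx` holds their positions in the original object, which correspond to the `[567]`, `[350]` numbers in the expected output. Printing a subset such as `oc[top]` renumbers the rows from 1, so those original positions would otherwise be lost.

```r
# Assumes Biostrings (Bioconductor) is installed, e.g. via
#   BiocManager::install("Biostrings")
library(Biostrings)

# Hypothetical stand-in for `oc`; sequences and names are invented.
oc <- DNAStringSet(c(
  seqA = "GAACAGCATG",              # width 10
  seqB = "TATATATATATATAGTCAATTC",  # width 22
  seqC = "ATAGACATACACA",           # width 13
  seqD = "TCACC"                    # width 5
))

# Positions of the sequences in oc, longest first
idx <- order(width(oc), decreasing = TRUE)

# Whole set reordered by decreasing width; names move with the sequences
oc_sorted <- oc[idx]

# Up to 10 longest sequences, plus their positions in the original object
top <- head(idx, 10)
oc[top]
data.frame(original_index = top,
           name = names(oc)[top],
           width = width(oc)[top])
```

On the real 151674-sequence object, the same three lines (`idx`, `top`, `oc[top]`) give the ten longest sequences without copying the whole reordered set.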