How to make a list of vectors from a 2-column data frame in R?

Posted: 2019-03-07 14:27:09

Tags: r dataframe vector split

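I'm having a hard time converting a data frame into a list of vectors. Each vector should be named after the string in the first column of the data frame, and its contents should be the sequences that are already listed, vector-style, in the second column.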

I have this data frame:

> head(df)
     Intron.ID
1 AT1G79930.2 
2 ATCG00720.1 
3 AT1G02080.2 
4 AT4G32551.2 
5 AT5G66190.1 
6 AT1G51720.1 
                                                                                                    Sequence.s.
1                                                        ['GAGGTGCTTGCAAATCGTTCACATCACTGTACTGCACATCAACAGAGAAT']
2  ['GCTTCTTTGTATTTTATGTTTTTAGTCATTATAGCTTTTTTTTTGAATAA', 'TGTTTGAGCTGTACGAGATGAAATTCTCATATACAGTTCTTGGAGGGGGG']
3                                                        ['CTCACCCGGAGTTAGTCACTGTTATTGAACAAGCACTTTCAAGGATATCA']
4                                                        ['AAGTGGTGGTATGTCTCCACAGGTTCAAACTCGAAATCAGCAACTTCCTG']
5                                                        ['AAGGGTTCTTAGGTTTGAATTTGTTGACAACAATCCCTTCTTCCTGTTTC']
6  ['ATTTGGCTTCTCACATAACACTGAAGCTGTGTGACTTGTGTACAATTTTG', 'CTGAGTTAATCTAATAAGCAAGATACATTTTACTTTCGTTTTCCTCTTCC']

I need this output:

$AT1G79930.2
[1] "GAGGTGCTTGCAAATCGTTCACATCACTGTACTGCACATCAACAGAGAAT"

$ATCG00720.1
[1] "GCTTCTTTGTATTTTATGTTTTTAGTCATTATAGCTTTTTTTTTGAATAA" "TGTTTGAGCTGTACGAGATGAAATTCTCATATACAGTTCTTGGAGGGGGG"

$AT1G02080.2
[1] "CTCACCCGGAGTTAGTCACTGTTATTGAACAAGCACTTTCAAGGATATCA"

$AT4G32551.2
[1] "AAGTGGTGGTATGTCTCCACAGGTTCAAACTCGAAATCAGCAACTTCCTG"

$AT5G66190.1
[1] "AAGGGTTCTTAGGTTTGAATTTGTTGACAACAATCCCTTCTTCCTGTTTC"

$AT1G51720.1
[1] "ATTTGGCTTCTCACATAACACTGAAGCTGTGTGACTTGTGTACAATTTTG" "CTGAGTTAATCTAATAAGCAAGATACATTTTACTTTCGTTTTCCTCTTCC"

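In R terms, the desired output above is a named list whose elements are character vectors of different lengths. A minimal sketch that builds the first two elements by hand with `setNames()`, just to pin down the target structure (the variable names `ids`, `seqs` and `target` are illustrative only):

```r
# Names come from the Intron.ID column, without the trailing space
ids <- c("AT1G79930.2", "ATCG00720.1")

# One character vector per intron; the second intron has two sequences
seqs <- list(
  "GAGGTGCTTGCAAATCGTTCACATCACTGTACTGCACATCAACAGAGAAT",
  c("GCTTCTTTGTATTTTATGTTTTTAGTCATTATAGCTTTTTTTTTGAATAA",
    "TGTTTGAGCTGTACGAGATGAAATTCTCATATACAGTTCTTGGAGGGGGG")
)

# Attach the IDs as element names; printing gives the $ID / [1] "..." layout
target <- setNames(seqs, ids)
print(target)
```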
The closest I have got to this result is the following command:

> df2 <- split(df, df[1])

> head(df2)
$`AT1G01760.2 `
      Intron.ID                                             Sequence.s.
11 AT1G01760.2   ['ACCGGTTGTTCCAAGAATAACTTCGTGTAAGCCAGAATAGTTCCAACACA']

$`AT1G02080.2 `
     Intron.ID                                             Sequence.s.
3 AT1G02080.2   ['CTCACCCGGAGTTAGTCACTGTTATTGAACAAGCACTTTCAAGGATATCA']

$`AT1G04430.2 `
     Intron.ID                                             Sequence.s.
9 AT1G04430.2   ['CATTATGAACGGCATTGTCCTCCTCCCGAAAGACGGTTTAATTGTTTGAT']

$`AT1G06150.1 `
      Intron.ID                                             Sequence.s.
45 AT1G06150.1   ['TGCTAGTGGATCCGTAAGTGCCAAAAATAAATGCCTGATATGAGTCACCA']

$`AT1G17680.3 `
      Intron.ID                                             Sequence.s.
48 AT1G17680.3   ['GCAAGCACCAGCTTTCGATATAGCATACTATTACCTTTCACGTGTTTCTG']

$`AT1G18470.2 `
      Intron.ID                                             Sequence.s.
81 AT1G18470.2   ['TTCCTTCGTCAATTGACCACCAACCTAATAGCCTGGAACCATGGTGCAAG']

This is completely wrong: the sequence names are assigned incorrectly, and some sequences are missing. It's not a good solution...
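At least the part shown above looks consistent with how `split()` orders its result rather than with rows being dropped: it returns one *data frame* per factor level, and the levels of a factor built from strings are sorted, so `head(df2)` shows the alphabetically first IDs (`AT1G01760.2 `, which is row 11) instead of the first rows. The element names also keep the trailing space from the data. A small sketch with three IDs from the question; the sequence column is replaced by placeholder values `s1` to `s3`, purely for illustration:

```r
# Three IDs from the question, stored as a factor with the trailing space kept
df <- data.frame(
  Intron.ID   = factor(c("AT1G79930.2 ", "ATCG00720.1 ", "AT1G02080.2 ")),
  Sequence.s. = c("s1", "s2", "s3"),  # placeholder values, not real sequences
  stringsAsFactors = FALSE
)

parts <- split(df, df[1])
names(parts)                   # levels in sorted order, not row order
sum(vapply(parts, nrow, 1L))   # every row is still present
class(parts[[1]])              # each element is a data frame, not a vector

# Splitting the column itself, by a factor whose levels follow first
# appearance, keeps the original row order and yields character vectors
ids <- trimws(as.character(df$Intron.ID))
by_row <- split(df$Sequence.s., factor(ids, levels = unique(ids)))
names(by_row)
```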

Additional information, as requested (I added the two "..."):

> dput(head(df))
structure(list(Intron.ID = structure(c(15L, 80L, 2L, 58L, 79L, 
9L), .Label = c("AT1G01760.2 ", "AT1G02080.2 ", "AT1G04430.2 ", 
"AT1G06150.1 ", "AT1G17680.3 " ...), class = "factor"), Sequence.s. = structure(c(49L, 
59L, 39L, 3L, 2L, 15L), .Label = c(" ['AAACACAAGGGTGGGGTTGACTCTCAAACTCACAAAAAGTTACATTTTCT']", 
" ['AAGGGTTCTTAGGTTTGAATTTGTTGACAACAATCCCTTCTTCCTGTTTC']", " ['AAGTGGTGGTATGTCTCCACAGGTTCAAACTCGAAATCAGCAACTTCCTG']", 
" ['AATCCATAAAGAAAATGGAGGAGAACATTCAGAATCTGGAAGGTAAGAAC', 'GATTTATGCTTTGGCAACAAAGAGTAGTCATATTCCATACAGGAACTCAA']", 
" ['AATTGATCCAGATTGTAGATTAATTGGACTCCATCTGTATGACGGCTTGT']" ...
), class = "factor")), row.names = c(NA, 6L), class = "data.frame")
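The `dput()` output shows that both columns are factors: each row stores an integer code, and the text lives in the `.Label` (levels) vector. The labels also carry stray whitespace: a trailing space in the IDs and a leading space in the sequence strings. A sketch of decoding such a factor, using the first three labels from the `dput()` output; the codes `c(2L, 1L, 3L)` are chosen for illustration:

```r
# Rebuild a factor the way dput() writes it: integer codes plus .Label
ids <- structure(c(2L, 1L, 3L),
                 .Label = c("AT1G01760.2 ", "AT1G02080.2 ", "AT1G04430.2 "),
                 class = "factor")

as.integer(ids)             # internal codes: 2 1 3 (not useful as names)
as.character(ids)           # the labels, still with the trailing space
trimws(as.character(ids))   # clean IDs suitable for list names
```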

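So how can I do the conversion without carrying the headers "Intron.ID" and "Sequence.s." into the vectors, keeping only the sequences inside each vector (in the correct order and with the correct names) and leaving Intron.ID out of the vector contents?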
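A minimal sketch of one possible approach, assuming every `Sequence.s.` value has the form shown above (an optional leading space, then `['SEQ', 'SEQ', ...]`) and that the sequences never contain commas, quotes or brackets. The helper name `sequences_by_intron` is hypothetical. It works on the column values rather than on the data frame, so the headers never end up in the result, and because it names the parsed list directly instead of splitting by a factor, the original row order is kept:

```r
# Hypothetical helper: two-column data frame -> named list of character vectors
sequences_by_intron <- function(df) {
  # Factors -> character: use the labels, not the integer codes
  ids <- trimws(as.character(df$Intron.ID))
  txt <- as.character(df$Sequence.s.)

  # Drop brackets, quotes and spaces: " ['A', 'B']" -> "A,B"
  txt <- gsub("[\\[\\]' ]", "", txt, perl = TRUE)

  # One character vector per row, named by its intron ID
  seqs <- strsplit(txt, ",", fixed = TRUE)
  names(seqs) <- ids
  seqs
}

# Two rows in the same shape as the question's data (factor columns, stray spaces)
df <- data.frame(
  Intron.ID = factor(c("AT1G79930.2 ", "ATCG00720.1 ")),
  Sequence.s. = factor(c(
    " ['GAGGTGCTTGCAAATCGTTCACATCACTGTACTGCACATCAACAGAGAAT']",
    " ['GCTTCTTTGTATTTTATGTTTTTAGTCATTATAGCTTTTTTTTTGAATAA', 'TGTTTGAGCTGTACGAGATGAAATTCTCATATACAGTTCTTGGAGGGGGG']"
  ))
)

result <- sequences_by_intron(df)
print(result)
```

If the sequences only ever contain letters, `regmatches(txt, gregexpr("[A-Za-z]+", txt))` would be an alternative way to pull them out of each string in one step.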

Any help would be greatly appreciated!

Thank you all.

Best regards,

Fernanda Costa
