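I am having a hard time converting a data frame into a list of vectors, where each list element is named by the string in the first column of the data frame and the vector itself is what is already stored in the second column of the data frame.
I have this data frame: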
> head(df)
Intron.ID
1 AT1G79930.2
2 ATCG00720.1
3 AT1G02080.2
4 AT4G32551.2
5 AT5G66190.1
6 AT1G51720.1
Sequence.s.
1 ['GAGGTGCTTGCAAATCGTTCACATCACTGTACTGCACATCAACAGAGAAT']
2 ['GCTTCTTTGTATTTTATGTTTTTAGTCATTATAGCTTTTTTTTTGAATAA', 'TGTTTGAGCTGTACGAGATGAAATTCTCATATACAGTTCTTGGAGGGGGG']
3 ['CTCACCCGGAGTTAGTCACTGTTATTGAACAAGCACTTTCAAGGATATCA']
4 ['AAGTGGTGGTATGTCTCCACAGGTTCAAACTCGAAATCAGCAACTTCCTG']
5 ['AAGGGTTCTTAGGTTTGAATTTGTTGACAACAATCCCTTCTTCCTGTTTC']
6 ['ATTTGGCTTCTCACATAACACTGAAGCTGTGTGACTTGTGTACAATTTTG', 'CTGAGTTAATCTAATAAGCAAGATACATTTTACTTTCGTTTTCCTCTTCC']
I need this output:
$AT1G79930.2
[1] "GAGGTGCTTGCAAATCGTTCACATCACTGTACTGCACATCAACAGAGAAT"
$ATCG00720.1
[1] "GCTTCTTTGTATTTTATGTTTTTAGTCATTATAGCTTTTTTTTTGAATAA" "TGTTTGAGCTGTACGAGATGAAATTCTCATATACAGTTCTTGGAGGGGGG"
$AT1G02080.2
[1] "CTCACCCGGAGTTAGTCACTGTTATTGAACAAGCACTTTCAAGGATATCA"
$AT4G32551.2
[1] "AAGTGGTGGTATGTCTCCACAGGTTCAAACTCGAAATCAGCAACTTCCTG"
$AT5G66190.1
[1] "AAGGGTTCTTAGGTTTGAATTTGTTGACAACAATCCCTTCTTCCTGTTTC"
$AT1G51720.1
[1] "ATTTGGCTTCTCACATAACACTGAAGCTGTGTGACTTGTGTACAATTTTG" "CTGAGTTAATCTAATAAGCAAGATACATTTTACTTTCGTTTTCCTCTTCC"
The closest I got to this result was with the following command:
> df2 <- split(df, df[1])
> head(df2)
$`AT1G01760.2 `
Intron.ID Sequence.s.
11 AT1G01760.2 ['ACCGGTTGTTCCAAGAATAACTTCGTGTAAGCCAGAATAGTTCCAACACA']
$`AT1G02080.2 `
Intron.ID Sequence.s.
3 AT1G02080.2 ['CTCACCCGGAGTTAGTCACTGTTATTGAACAAGCACTTTCAAGGATATCA']
$`AT1G04430.2 `
Intron.ID Sequence.s.
9 AT1G04430.2 ['CATTATGAACGGCATTGTCCTCCTCCCGAAAGACGGTTTAATTGTTTGAT']
$`AT1G06150.1 `
Intron.ID Sequence.s.
45 AT1G06150.1 ['TGCTAGTGGATCCGTAAGTGCCAAAAATAAATGCCTGATATGAGTCACCA']
$`AT1G17680.3 `
Intron.ID Sequence.s.
48 AT1G17680.3 ['GCAAGCACCAGCTTTCGATATAGCATACTATTACCTTTCACGTGTTTCTG']
$`AT1G18470.2 `
Intron.ID Sequence.s.
81 AT1G18470.2 ['TTCCTTCGTCAATTGACCACCAACCTAATAGCCTGGAACCATGGTGCAAG']
This is completely wrong: the sequence names are not assigned correctly, and some sequences are missing. This is not a good solution...
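One possible reading of this result, offered as a guess rather than a diagnosis: `split()` applied to a whole data frame returns a list of data frames, and it orders them by the factor levels of the grouping column (alphabetical here) rather than by row order. That would explain why `head(df2)` shows different IDs than `head(df)`, and why each element still carries the column headers. The sketch below uses a small made-up data frame called `toy` (its values are hypothetical, not taken from `df`) to illustrate the behaviour.

```r
# Toy data frame (hypothetical values) mimicking the structure of df:
# factor columns, trailing space in the IDs, leading space in the sequences
toy <- data.frame(
  Intron.ID   = c("B.1 ", "A.1 ", "C.1 "),
  Sequence.s. = c(" ['GATT']", " ['ACGT', 'TTGA']", " ['CCCA']"),
  stringsAsFactors = TRUE
)

# split() on a whole data frame returns a list of data frames,
# ordered by the factor levels, not by the original row order
by_df <- split(toy, toy[1])
names(by_df)        # level order, trailing spaces kept
class(by_df[[1]])   # each element is still a data.frame

# Splitting only the sequence column, with the levels fixed to the
# order of first appearance, keeps the original row order
ids    <- as.character(toy$Intron.ID)
by_col <- split(as.character(toy$Sequence.s.),
                factor(ids, levels = unique(ids)))
names(by_col)
```

Note that `by_col` still holds the raw `['...']` strings; turning them into real character vectors is a separate step.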
Additional information, as requested (I added two "..." where I truncated the output):
> dput(head(df))
structure(list(Intron.ID = structure(c(15L, 80L, 2L, 58L, 79L,
9L), .Label = c("AT1G01760.2 ", "AT1G02080.2 ", "AT1G04430.2 ",
"AT1G06150.1 ", "AT1G17680.3 " ...), class = "factor"), Sequence.s. = structure(c(49L,
59L, 39L, 3L, 2L, 15L), .Label = c(" ['AAACACAAGGGTGGGGTTGACTCTCAAACTCACAAAAAGTTACATTTTCT']",
" ['AAGGGTTCTTAGGTTTGAATTTGTTGACAACAATCCCTTCTTCCTGTTTC']", " ['AAGTGGTGGTATGTCTCCACAGGTTCAAACTCGAAATCAGCAACTTCCTG']",
" ['AATCCATAAAGAAAATGGAGGAGAACATTCAGAATCTGGAAGGTAAGAAC', 'GATTTATGCTTTGGCAACAAAGAGTAGTCATATTCCATACAGGAACTCAA']",
" ['AATTGATCCAGATTGTAGATTAATTGGACTCCATCTGTATGACGGCTTGT']" ...
), class = "factor")), row.names = c(NA, 6L), class = "data.frame")
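Judging from the `dput()` output above, both columns appear to be factors, the IDs carry a trailing space, and each sequence entry is a single string that looks like a Python-style list. A minimal sketch of one way to turn that into a named list of character vectors follows, assuming every sequence is wrapped in single quotes. The data frame `toy` and its shortened sequences are hypothetical stand-ins for `df`; only base R functions are used.

```r
# Toy data (hypothetical, shortened values) with the same shape as dput(head(df))
toy <- data.frame(
  Intron.ID   = c("AT1G79930.2 ", "ATCG00720.1 "),
  Sequence.s. = c(" ['GAGG']", " ['GCTT', 'TGTT']"),
  stringsAsFactors = TRUE
)

# Convert the factors to character and trim the stray whitespace
ids  <- trimws(as.character(toy$Intron.ID))
seqs <- trimws(as.character(toy$Sequence.s.))

# Extract every run of characters enclosed in single quotes;
# regmatches() returns one character vector per row
hits <- regmatches(seqs, gregexpr("'[^']*'", seqs))

# Drop the surrounding quotes from each extracted sequence
seq_list <- lapply(hits, function(x) gsub("'", "", x, fixed = TRUE))

# Name the list elements by ID; the row order is preserved
names(seq_list) <- ids
seq_list
```

Because the list is built row by row instead of being grouped by factor level, the names stay aligned with their sequences and the column headers never enter the result. If an ID occurred in more than one row, this sketch would produce duplicate names rather than merging the vectors.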
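So, how can I do this conversion without carrying the headers "Intron.ID" and "Sequence.s." over into the vectors, keeping only the sequences inside the vectors (in the correct order and with the correct assignment) and leaving out the Intron.ID?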
Any help would be greatly appreciated!
Thank you all.
Best,
Fernanda Costa