excel中的多个单元格

时间:2013-06-26 10:56:11

标签: excel paste

我有一个像这样的excel文件..

Sr. No.     GENE ID  Gene Id (NCBI) Protein Id  Protein Sequences
1           Lmo0001  984365         NP_463534.1 
2           Lmo0002  984379         NP_463535.1 
3           Lmo0003  984420         NP_463536.1

此列表扩展到3000个基因。我将序列保存在这样的文本板中,这些序列适用于所有3000个基因,每个序列之间有一个空格。

  

GI | 16802049 | REF | NP_463534.1 |染色体复制起始蛋白[单核细胞增生李斯特菌EGD-e]   MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT   GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA   KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL   LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR   IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS   QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH   EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF

     

GI | 16802050 | REF | NP_463535.1 | DNA聚合酶III亚基β[单核细胞增生李斯特菌EGD-e]   MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE   SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN   VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE   LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL   LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS   FSGTMRPFVLRPKDAANPNEILQLITPVRTY

     

GI | 16802051 | REF | NP_463536.1 |假设蛋白质lmo0003 [单核细胞增生李斯特菌EGD-e]   MMKDMTTGNPTKLIFLFAMPMLIGNLFQQFYTMIDAVIVGKFVSVDALAAVGATNSVNFFMISLIIGLMS   GISVVVAQYFGFKDYDRLKDVIATATYAVVFSAIILTVAGVLLAKPLLILLRTPANILDDSTIFLTTLFI   GILPMSLYNGMAAILRALGNSITPLIFLILSSLMNIALDFLFVVYMDMGVRGAAIATVLSQTAAAIAVIY   YAYRHVPFMRIERAKFKLSTPLLKEMVRIGLPSGLQGSFISIGNMALQSLINGFGSSVVAAYTAASRIDS   LTYQPGIAFGAASSMFAGQNIGAGKIDRVREGFWSGIKVVTAISIGITILVQLFARQFLLLFVDSSETEV   INIGVSYLLIVSLFYVVVGILFVVRETLRGTGDAMVPLAMGIFELVSRLVIGFVLSLYIGYVGLWWATPV   AWITATILGVWRYKSGAWQKKAVIRRK

     

GI | 16802052 | REF | NP_463537.1 |假设蛋白质lmo0004 [单核细胞增生李斯特菌EGD-e]   MAETVKINSEFVTLGQLLQMIDVVSTGGMAKAYLSENTIYINGEQDNRRGKKLRNGDVILVPGVGKVKIE   QGK

     

GI | 16802053 | REF | NP_463538.1 |重组蛋白F [单核细胞增生李斯特菌EGD-e]   MHLESIVLRNFRNYENLELEFSPSVNVFLGENAQGKTNLLEAVLMLALAKSHRTTNDKDFIMWEKEEAKM   EGRIAKHGQSVPLELAITQKGKRAKVNHLEQKKLSQYVGNLNVVIFAPEDLSLVKGAPGIRRRFLNMEIG   QMQPIYLHNLSEYQRILQQRNQYLKMLQMKRKVDPILLDILTEQFADVAINLTKRRADFIQKLEAYAAPI   HHQISRGLETLKIEYKASITLNGDDPEVWKADLLQKMESIKQREIDRGVTLIGPHRDDSLFYINGQNVQD   FGSQGQQRTTALSIKLAEIDLIHEETGEYPVLLLDDVLSELDDYRQSHLLGAIEGKVQTFVTTTSTSGID   HETLKQATTFYVEKGTVKKS

是否可以将每个序列点中的每个序列放在每一行上,而无需手动复制和粘贴每个序列?任何方法都没问题。

P.S我很抱歉这张荒谬的桌子,但没有足够的声望点,我无法发布图片,这是我能做到的最好的。

@swapnil但是我想在第一张excel表中的蛋白质序列列下直接从记事本中复制序列。

2 个答案:

答案 0 :(得分:0)

嗯,这里不会是简单的复制/粘贴。我认为你可以做的是将所有内容复制到一个新的Excel工作表中,并使用分隔符管|对文本执行操作以获得最后一点:

 chromosomal replication initiation protein [Listeria monocytogenes EGD-e] MQSIEDIWQETLQIVKKNMSKPSYDTWMKSTTAHSLEGNTFIISAPNNFVRDWLEKSYTQFIANILQEIT GRLFDVRFIDGEQEENFEYTVIKPNPALDEDGIEIGKHMLNPRYVFDTFVIGSGNRFAHAASLAVAEAPA KAYNPLFIYGGVGLGKTHLMHAVGHYVQQHKDNAKVMYLSSEKFTNEFISSIRDNKTEEFRTKYRNVDVL LIDDIQFLAGKEGTQEEFFHTFNTLYDEQKQIIISSDRPPKEIPTLEDRLRSRFEWGLITDITPPDLETR IAILRKKAKADGLDIPNEVMLYIANQIDSNIRELEGALIRVVAYSSLVNKDITAGLAAEALKDIIPSSKS QVITISGIQEAVGEYFHVRLEDFKAKKRTKSIAFPRQIAMYLSRELTDASLPKIGDEFGGRDHTTVIHAH EKISQLLKTDQVLKNDLAEIEKNLRKAQNMF
 DNA polymerase III subunit beta [Listeria monocytogenes EGD-e] MKFVIERDRLVQAVNEVTRAISARTTIPILTGIKIVVNDEGVTLTGSDSDISIEAFIPLIENDEVIVEVE SFGGIVLQSKYFGDIVRRLPEENVEIEVTSNYQTNISSGQASFTLNGLDPMEYPKLPEVTDGKTIKIPIN VLKNIVRQTVFAVSAIEVRPVLTGVNWIIKENKLSAVATDSHRLALREIPLETDIDEEYNIVIPGKSLSE LNKLLDDASESIEMTLANNQILFKLKDLLFYSRLLEGSYPDTSRLIPTDTKSELVINSKAFLQAIDRASL LARENRNNVIKLMTLENGQVEVSSNSPEVGNVSENVFSQSFTGEEIKISFNGKYMMDALRAFEGDDIQIS FSGTMRPFVLRPKDAANPNEILQLITPVRTY
 hypothetical protein lmo0003 [Listeria monocytogenes EGD-e] MMKDMTTGNPTKLIFLFAMPMLIGNLFQQFYTMIDAVIVGKFVSVDALAAVGATNSVNFFMISLIIGLMS GISVVVAQYFGFKDYDRLKDVIATATYAVVFSAIILTVAGVLLAKPLLILLRTPANILDDSTIFLTTLFI GILPMSLYNGMAAILRALGNSITPLIFLILSSLMNIALDFLFVVYMDMGVRGAAIATVLSQTAAAIAVIY YAYRHVPFMRIERAKFKLSTPLLKEMVRIGLPSGLQGSFISIGNMALQSLINGFGSSVVAAYTAASRIDS LTYQPGIAFGAASSMFAGQNIGAGKIDRVREGFWSGIKVVTAISIGITILVQLFARQFLLLFVDSSETEV INIGVSYLLIVSLFYVVVGILFVVRETLRGTGDAMVPLAMGIFELVSRLVIGFVLSLYIGYVGLWWATPV AWITATILGVWRYKSGAWQKKAVIRRK
 hypothetical protein lmo0004 [Listeria monocytogenes EGD-e] MAETVKINSEFVTLGQLLQMIDVVSTGGMAKAYLSENTIYINGEQDNRRGKKLRNGDVILVPGVGKVKIE QGK
 recombination protein F [Listeria monocytogenes EGD-e] MHLESIVLRNFRNYENLELEFSPSVNVFLGENAQGKTNLLEAVLMLALAKSHRTTNDKDFIMWEKEEAKM EGRIAKHGQSVPLELAITQKGKRAKVNHLEQKKLSQYVGNLNVVIFAPEDLSLVKGAPGIRRRFLNMEIG QMQPIYLHNLSEYQRILQQRNQYLKMLQMKRKVDPILLDILTEQFADVAINLTKRRADFIQKLEAYAAPI HHQISRGLETLKIEYKASITLNGDDPEVWKADLLQKMESIKQREIDRGVTLIGPHRDDSLFYINGQNVQD FGSQGQQRTTALSIKLAEIDLIHEETGEYPVLLLDDVLSELDDYRQSHLLGAIEGKVQTFVTTTSTSGID HETLKQATTFYVEKGTVKKS

这应该转到E栏。然后在F栏中,您可以输入公式:

=mid(E1, find("]", E1)+2, len(E1))

这将在结束方括号]之后提取所有内容,从而返回所需的序列。

假设这些文件位于excel文件工作簿中的名称为Sheet2的工作表中(其中第一个工作表包含您现在拥有的表格)。

在第一张表中,输入公式:

=vlookup(D2, Sheet2!D:F, 3, 0)

这假设您的文本文件与表中列出的Protein Ids的顺序不同。否则,您只需将F列结果的值(复制和粘贴特殊,选择粘贴值)复制/粘贴到第一张表中,

答案 1 :(得分:0)

感谢您的回答。我实际上是使用正则表达式\ n ^ [a-z]在textpad上编辑它,然后将其复制到excel。所以这就解决了。谢谢了。我从另一个堆栈溢出问题中得到了这个建议。