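Creating a dictionary from two functions (Python)

Time: 2016-04-17 19:06:51

Tags: python dictionary

I need to create a dictionary from two functions (ZoekAccesieCode + ZoekOrganisme). ZoekAccesieCode returns strings like "Q6GZX2", and ZoekOrganisme returns strings like "Frog virus 3 (isolate Goorha)". The result of ZoekAccesieCode needs to be the key, and the result of ZoekOrganisme needs to be the value. This is my code: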

import re
file = open("ploop.txt")
text = file.read()
file.close()

def main():
    hits = VindHits()
    accesie = ZoekAccesieCode(hits)
    organisme = ZoekOrganisme(hits, accesie)
    MaakDict(accesie, organisme)

def VindHits():
    eiwitten = text.split("\n\n")[1:]
    eiwitHits = []

    for eiwit in eiwitten:
        if re.search(r"[AG].{4}GK[ST]", eiwit):
            eiwitHits.append(eiwit)
    return(eiwitHits)

def ZoekAccesieCode(hits):
    for eiwit in hits:
        accesieCode = re.findall(r">sp\|(.{6})", eiwit)[0]
    return accesieCode

def ZoekOrganisme(hits, accesie):
    for eiwit in hits:
        organisme = re.findall(r"\n.+?\[(.+?)\]", eiwit)[0]
    return organisme


def MaakDict(accesie, organisme):

main()

Some sample data from the file:

    Hits for PS00017|ATP_GTP_A (pattern) ATP/GTP-binding site motif A (P-loop) :  [occurs frequently]
   Pattern: [AG]-x(4)-G-K-[ST]
   Approximate number of expected random matches in ~ 100'000 sequences (50'000'000 residues): 3371


>sp|Q6GZX2|003R_FRG3G  (438 aa)
Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MARPLLGKTSSVRRRLESLSACSIFFFLRKFCQKMASLVFLNSPVYQMSNILLTERRQVDRAMGGSDDDGVMVVALSPSD
FKTVLGSALLAVERDMVHVVPKYLQTPGILHDMLVLLTPIFGEALSVDMSGATDVMVQQIATAGFVDVDPLHSSVSWKDN
VSCPVALLAVSNAVRTMMGQPCQVTLIIDVGTQNILRDLVNLPVEMSGDLQVMAYTKDPLGKVPAVGVSVFDSGSVQKGD
AHSVGAPDGLVSFHTHPVSSAVELNYHAGWPSNVDMSSLLTMKNLMHVVVAEEGLWTMARTLSMQRLTKVLTDAEKDVMR
AAAFNLFLPLNELRVMGTKDSNNKSLKTYFEVFETFTIGALMKHSGVTPTAFVDRRWLDNTIYHMGFIPWGRDMRFVVEY
DLDGTNPFLNTVPTLMSVKRKAKIQEMFDNMVSRMVTS
      2 - 9:          ArpllGKT


>sp|Q6GZX1|004R_FRG3G  (60 aa)
Uncharacterized protein 004R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
      33 - 40:        GyyydGKT


>sp|Q6GZW0|015R_FRG3G  (322 aa)
Uncharacterized protein 015R.  [Frog virus 3 (isolate Goorha) (FV-3)]
MEQVPIKEMRLSDLRPNNKSIDTDLGGTKLVVIGKPGSGKSTLIKALLDSKRHIIPCAVVISGSEEANGFYKGVVPDLFI
YHQFSPSIIDRIHRRQVKAKAEMGSKKSWLLVVIDDCMDNAKMFNDKEVRALFKNGRHWNVLVVIANQYVMDLTPDLRSS
VDGVFLFRENNVTYRDKTYANFASVVPKKLYPTVMETVCQNYRCMFIDNTKATDNWHDSVFWYKAPYSKSAVAPFGARSY
WKYACSKTGEEMPAVFDNVKILGDLLLKELPEAGEALVTYGGKDGPSDNEDGPSDDEDGPSDDEEGLSKDGVSEYYQSDL
DD
      34 - 41:        GkpgsGKS


>sp|P32234|128UP_DROME  (368 aa)
GTP-binding protein 128up.  [Drosophila melanogaster (Fruit fly)]
MSTILEKISAIESEMARTQKNKATSAHLGLLKAKLAKLRRELISPKGGGGGTGEAGFEVAKTGDARVGFVGFPSVGKSTL
LSNLAGVYSEVAAYEFTTLTTVPGCIKYKGAKIQLLDLPGIIEGAKDGKGRGRQVIAVARTCNLIFMVLDCLKPLGHKKL
LEHELEGFGIRLNKKPPNIYYKRKDKGGINLNSMVPQSELDTDLVKTILSEYKIHNADITLRYDATSDDLIDVIEGNRIY
IPCIYLLNKIDQISIEELDVIYKIPHCVPISAHHHWNFDDLLELMWEYLRLQRIYTKPKGQLPDYNSPVVLHNERTSIED
FCNKLHRSIAKEFKYALVWGSSVKHQPQKVGIEHVLNDEDVVQIVKKV
      71 - 78:        GfpsvGKS


>sp|P05080|194K_TRVSY  (1707 aa)
Replicase large subunit.  [Tobacco rattle virus (strain SYM)]
MANGNFKLSQLLNVDEMSAEQRSHFFDLMLTKPDCEIGQMMQRVVVDKVDDMIRERKTKDPVIVHEVLSQKEQNKLMEIY
PEFNIVFKDDKNMVHGFAAAERKLQALLLLDRVPALQEVDDIGGQWSFWVTRGEKRIHSCCPNLDIRDDQREISRQIFLT
AIGDQARSGKRQMSENELWMYDQFRKNIAAPNAVRCNNTYQGCTCRGFSDGKKKGAQYAIALHSLYDFKLKDLMATMVEK
KTKVVHAAMLFAPESMLVDEGPLPSVDGYYMKKNGKIYFGFEKDPSFSYIHDWEEYKKYLLGKPVSYQGNVFYFEPWQVR
GDTMLFSIYRIAGVPRRSLSSQEYYRRIYISRWENMVVVPIFDLVESTRELVKKDLFVEKQFMDKCLDYIARLSDQQLTI
SNVKSYLSSNNWVLFINGAAVKNKQSVDSRDLQLLAQTLLVKEQVARPVMRELREAILTETKPITSLTDVLGLISRKLWK
QFANKIAVGGFVGMVGTLIGFYPKKVLTWAKDTPNGPELCYENSHKTKVIVFLSVVYAIGGITLMRRDIRDGLVKKLCDM
FDIKRGAHVLDVENPCRYYEINDFFSSLYSASESGETVLPDLSEVKAKSDKLLQQKKEIADEFLSAKFSNYSGSSVRTSP
PSVVGSSRSGLGLLLEDSNVLTQARVGVSRKVDDEEIMEQFLSGLIDTEAEIDEVVSAFSAECERGETSGTKVLCKPLTP
PGFENVLPAVKPLVSKGKTVKRVDYFQVMGGERLPKRPVVSGDNSVDARREFLYYLDAERVAQNDEIMSLYRDYSRGVIR
TGGQNYPHGLGVWDVEMKNWCIRPVVTEHAYVFQPDKRMDDWSGYLEVAVWERGMLVNDFAVERMSDYVIVCDQTYLCNN
RLILDNLSALDLGPVNCSFELVDGVPGCGKSTMIVNSANPCVDVVLSTGRAATDDLIERFASKGFPCKLKRRVKTVDSFL
MHCVDGSLTGDVLHFDEALMAHAGMVYFCAQIAGAKRCICQGDQNQISFKPRVSQVDLRFSSLVGKFDIVTEKRETYRSP
ADVAAVLNKYYTGDVRTHNATANSMTVRKIVSKEQVSLKPGAQYITFLQSEKKELVNLLALRKVAAKVSTVHESQGETFK
DVVLVRTKPTDDSIARGREYLIVALSRHTQSLVYETVKEDDVSKEIRESAALTKAALARFFVTETVLXRFRSRFDVFRHH
EGPCAVPDSGTITDLEMWYDALFPGNSLRDSSLDGYLVATTDCNLRLDNVTIKSGNWKDKFAEKETFLKPVIRTAMPDKR
KTTQLESLLALQKRNQAAPDLQENVHATVLIEETMKKLKSVVYDVGKIRADPIVNRAQMERWWRNQSTAVQAKVVADVRE
LHEIDYSSYMYMIKSDVKPKTDLTPQFEYSALQTVVYHEKLINSLFGPIFKEINERKLDAMQPHFVFNTRMTSSDLNDRV
KFLNTEAAYDFVEIDMSKFDKSANRFHLQLQLEIYRLFGLDEWAAFLWEVSHTQTTVRDIQNGMMAHIWYQQKSGDADTY
NANSDRTLCALLSELPLEKAVMVTYGGDDSLIAFPRGTQFVDPCPKLATKWNFECKIFKYDVPMFCGKFLLKTSSCYEFV
PDPVKVLTKLGKKSIKDVQHLAEIYISLNDSNRALGNYMVVSKLSESVSDRYLYKGDSVHALCALWKHIKSFTALCTLFR
DENDKELNPAKVDWKKAQRAVSNFYDW
      904 - 911:      GvpgcGKS


>sp|P03589|1A_AMVLE  (1126 aa)
Replication protein 1a.  [Alfalfa mosaic virus (strain 425 / isolate Leiden)]
MNADAQSTDASLSMREPLSHASIQEMLRRVVEKQAADDTTAIGKVFSEAGRAYAQDALPSDKGEVLKISFSLDATQQNIL
RANFPGRRTVFSNSSSSSHCFAAAHRLLETDFVYRCFGNTVDSIIDLGGNFVSHMKVKRHNVHCCCPILDARDGARLTER
ILSLKSYVRKHPEIVGEADYCMDTFQKCSRRADYAFAIHSTSDLDVGELACSLDQKGVMKFICTMMVDADMLIHNEGEIP
NFNVRWEIDRKKDLIHFDFIDEPNLGYSHRFSLLKHYLTYNAVDLGHAAYRIERKQDFGGVMVIDLTYSLGFVPKMPHSN
GRSCAWYNRVKGQMVVHTVNEGYYHHSYQTAVRRKVLVDKKVLTRVTEVAFRQFRPNADAHSAIQSIATMLSSSTNHTII
GGVTLISGKPLSPDDYIPVATTIYYRVKKLYNAIPEMLSLLDKGERLSTDAVLKGSEGPMWYSGPTFLSALDKVNVPGDF
VAKALLSLPKRDLKSLFSRSATSHSERTPVRDESPIRCTDGVFYPIRMLLKCLGSDKFESVTITDPRSNTETTVDLYQSF
QKKIETVFSFILGKIDGPSPLISDPVYFQSLEDVYYAEWHQGNAIDASNYARTLLDDIRKQKEESLKAKAKEVEDAQKLN
RAILQVHAYLEAHPDGGKIEGLGLSSQFIAKIPELAIPTPKPLPEFEKNAETGEILRINPHSDAILEAIDYLKSTSANSI
ITLNKLGDHCQWTTKGLDVVWAGDDKRRAFIPKKNTWVGPTARSYPLAKYERAMSKDGYVTLRWDGEVLDANCVRSLSQY
EIVFVDQSCVFASAEAIIPSLEKALGLEAHFSVTIVDGVAGCGKTTNIKQIARSSGRDVDLILTSNRSSADELKETIDCS
PLTKLHYIRTCDSYLMSASAVKAQRLIFDECFLQHAGLVYAAATLAGCSEVIGFGDTEQIPFVSRNPSFVFRHHKLTGKV
ERKLITWRSPADATYCLEKYFYKNKKPVKTNSRVLRSIEVVPINSPVSVERNTNALYLCHTQAEKAVLKAQTHLKGCDNI
FTTHEAQGKTFDNVYFCRLTRTSTSLATGRDPINGPCNGLVALSRHKKTFKYFTIAHDSDDVIYNACRDAGNTDDSILAR
SYNHNF
      838 - 845:      GvagcGKT


>sp|Q9AT00|TGD3_ARATH  (345 aa)
Protein TRIGALACTOSYLDIACYLGLYCEROL 3, chloroplastic.  [Arabidopsis thaliana (Mouse-ear cress)]
MLSLSCSSSSSSLLPPSLHYHGSSSVQSIVVPRRSLISFRRKVSCCCIAPPQNLDNDATKFDSLTKSGGGMCKERGLEND
SDVLIECRDVYKSFGEKHILKGVSFKIRHGEAVGVIGPSGTGKSTILKIMAGLLAPDKGEVYIRGKKRAGLISDEEISGL
RIGLVFQSAALFDSLSVRENVGFLLYERSKMSENQISELVTQTLAAVGLKGVENRLPSELSGGMKKRVALARSLIFDTTK
EVIEPEVLLYDEPTAGLDPIASTVVEDLIRSVHMTDEDAVGKPGKIASYLVVTHQHSTIQRAVDRLLFLYEGKIVWQGMT
HEFTTSTNPIVQQFATGSLDGPIRY
      117 - 124:      GpsgtGKS

Can someone help me with the correct code?

4 answers:

Answer 0: (score: 1)

Untangling the nearly unreadable code:

    public String CREATE_QUERY = "CREATE TABLE "+ReceiptsTable.TableInformation.TABLE_NAME+"("+ReceiptsTable.TableInformation.RECEIPT_ID+" STRING,"+ReceiptsTable.TableInformation.RECEIPT_FILE+" STRING,"+ReceiptsTable.TableInformation.RECEIPT_URI+" STRING);";

Answer 1: (score: 0)

<?php
$root_directory_path = $_SERVER['DOCUMENT_ROOT'];
ob_start();
session_start();
if (!isset($_SESSION["user_login"])) {
    header("Location: index.php");
} else {
    $username = $_SESSION["user_login"];
    $pathName = $root_directory_path."myScript.php";//I am assuming here
    //the script is located inside the root directory, and not in a sub
    //directory
    require($pathName);
}
?>

Answer 2: (score: 0)

You can use zip or itertools.izip_longest, like this:

zip(accesie, organisme)

OR

itertools.izip_longest(accesie, organisme)

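The former creates only as many pairs as the shortest input list has elements. If the lists differ in length, the latter pads the shorter one with None, so some pairs will have no real match. (itertools.izip_longest is the Python 2 name; in Python 3 the function is itertools.zip_longest.)

After using either of the above, you can wrap the result in dict() to convert it.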

>>> import itertools
>>> accesie = ['accesie1', 'accesie2', 'accesie3', 'accesie4']
>>> organisme = ['organisme1', 'organisme2', 'organisme3']
>>> dict(zip(accesie, organisme))
{'accesie3': 'organisme3', 'accesie2': 'organisme2', 'accesie1': 'organisme1'}
>>> dict(itertools.izip_longest(accesie, organisme))
{'accesie3': 'organisme3', 'accesie2': 'organisme2', 'accesie1': 'organisme1', 'accesie4': None}
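
The session above is Python 2. As a rough sketch of the same comparison under Python 3 (assuming Python 3.7 or later, where dictionaries keep insertion order), using the same placeholder lists:

```python
from itertools import zip_longest  # Python 3 name for izip_longest

accesie = ['accesie1', 'accesie2', 'accesie3', 'accesie4']
organisme = ['organisme1', 'organisme2', 'organisme3']

# zip stops at the shorter list, so 'accesie4' is dropped.
truncated = dict(zip(accesie, organisme))

# zip_longest pads the shorter list with None, so 'accesie4' maps to None.
padded = dict(zip_longest(accesie, organisme))

print(truncated)
print(padded)
```

Which one fits depends on whether a key without an organism should be silently dropped or kept with a None value.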

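From reading the other comments, it looks like you are dealing with single elements rather than lists of elements. In that case, refer to @TheLazyScripter's answer: just create a dictionary from your key and value.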

the_dict = {accesie: organisme}

If you want to add to an existing dict, it would be

the_dict[accesie] = organisme
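
To illustrate the per-element assignment, here is a minimal sketch that fills the dictionary one hit at a time. It reuses the two regular expressions from the question; the function name build_dict and the shortened sample_hits records are assumptions made for this example, not part of the original code.

```python
import re

def build_dict(hits):
    """Map each accession code to its organism, one entry per hit (hypothetical helper)."""
    the_dict = {}
    for eiwit in hits:
        accesie = re.search(r">sp\|(.{6})", eiwit)
        organisme = re.search(r"\n.+?\[(.+?)\]", eiwit)
        # Skip records where either pattern is missing instead of raising.
        if accesie and organisme:
            the_dict[accesie.group(1)] = organisme.group(1)
    return the_dict

# Shortened records in the same layout as the sample data in the question.
sample_hits = [
    ">sp|Q6GZX2|003R_FRG3G  (438 aa)\n"
    "Uncharacterized protein 3R.  [Frog virus 3 (isolate Goorha) (FV-3)]\n"
    "MARPLLGKTSSVRRRLESLSACSIFFFLRKF",
    ">sp|P32234|128UP_DROME  (368 aa)\n"
    "GTP-binding protein 128up.  [Drosophila melanogaster (Fruit fly)]\n"
    "MSTILEKISAIESEMARTQKNKATSAHLGLL",
]

the_dict = build_dict(sample_hits)
print(the_dict)
```

Extracting the key and the value inside the same loop keeps each pair together, so the two can never drift out of step.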

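Answer 3: (score: 0)

This code assumes that the values in accessie can be used as keys (they are strings or unicode) and that accessie and organisme are lists of the same length; otherwise you have to slice them to the same length first.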

dict(zip(accessie, organisme))
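
Note that in the question's code each search function returns after its loop has finished, so only the value from the last hit comes back, not a list. A sketch of list-returning variants that satisfy the same-length precondition might look like the following; the snake_case function names and the sample records are assumptions for illustration.

```python
import re

def zoek_accesie_codes(hits):
    """Return one accession code per hit, in order."""
    return [re.findall(r">sp\|(.{6})", eiwit)[0] for eiwit in hits]

def zoek_organismen(hits):
    """Return one organism name per hit, in order."""
    return [re.findall(r"\n.+?\[(.+?)\]", eiwit)[0] for eiwit in hits]

def maak_dict(accesie, organisme):
    """Pair keys and values, refusing lists of unequal length."""
    if len(accesie) != len(organisme):
        raise ValueError("accesie and organisme must have the same length")
    return dict(zip(accesie, organisme))

# Shortened records in the same layout as the sample data in the question.
sample_hits = [
    ">sp|Q6GZX1|004R_FRG3G  (60 aa)\n"
    "Uncharacterized protein 004R.  [Frog virus 3 (isolate Goorha) (FV-3)]\n"
    "MNAKYDTDQGVGRMLFLGTIGLAVVVGG",
    ">sp|Q9AT00|TGD3_ARATH  (345 aa)\n"
    "Protein TRIGALACTOSYLDIACYLGLYCEROL 3, chloroplastic.  [Arabidopsis thaliana (Mouse-ear cress)]\n"
    "MLSLSCSSSSSSLLPPSLHYHGSSSVQ",
]

result = maak_dict(zoek_accesie_codes(sample_hits), zoek_organismen(sample_hits))
print(result)
```

Raising on a length mismatch is one possible choice; plain zip would instead silently drop the extra elements.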