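I need to create a dictionary from two functions (ZoekAccesieCode + ZoekOrganisme). The function ZoekAccesieCode returns strings like "Q6GZX2", and ZoekOrganisme returns strings like "Frog virus 3 (isolate Goorha)". The ZoekAccesieCode result needs to become the key and the ZoekOrganisme result the value. This is my code: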
import re

file = open("ploop.txt")
text = file.read()
file.close()

def main():
    hits = VindHits()
    accesie = ZoekAccesieCode(hits)
    organisme = ZoekOrganisme(hits, accesie)
    MaakDict(accesie, organisme)

def VindHits():
    eiwitten = text.split("\n\n")[1:]
    eiwitHits = []
    for eiwit in eiwitten:
        if re.search(r"[AG].{4}GK[ST]", eiwit):
            eiwitHits.append(eiwit)
    return(eiwitHits)

def ZoekAccesieCode(hits):
    for eiwit in hits:
        accesieCode = re.findall(r">sp\|(.{6})", eiwit)[0]
    return accesieCode

def ZoekOrganisme(hits, accesie):
    for eiwit in hits:
        organisme = re.findall(r"\n.+?\[(.+?)\]", eiwit)[0]
    return organisme

def MaakDict(accesie, organisme):
    # body missing: building the dict here is what I need help with
    pass

main()
Some example data from the file:
Hits for PS00017|ATP_GTP_A (pattern) ATP/GTP-binding site motif A (P-loop) : [occurs frequently]
Pattern: [AG]-x(4)-G-K-[ST]
Approximate number of expected random matches in ~ 100'000 sequences (50'000'000 residues): 3371
>sp|Q6GZX2|003R_FRG3G (438 aa)
Uncharacterized protein 3R. [Frog virus 3 (isolate Goorha) (FV-3)]
MARPLLGKTSSVRRRLESLSACSIFFFLRKFCQKMASLVFLNSPVYQMSNILLTERRQVDRAMGGSDDDGVMVVALSPSD
FKTVLGSALLAVERDMVHVVPKYLQTPGILHDMLVLLTPIFGEALSVDMSGATDVMVQQIATAGFVDVDPLHSSVSWKDN
VSCPVALLAVSNAVRTMMGQPCQVTLIIDVGTQNILRDLVNLPVEMSGDLQVMAYTKDPLGKVPAVGVSVFDSGSVQKGD
AHSVGAPDGLVSFHTHPVSSAVELNYHAGWPSNVDMSSLLTMKNLMHVVVAEEGLWTMARTLSMQRLTKVLTDAEKDVMR
AAAFNLFLPLNELRVMGTKDSNNKSLKTYFEVFETFTIGALMKHSGVTPTAFVDRRWLDNTIYHMGFIPWGRDMRFVVEY
DLDGTNPFLNTVPTLMSVKRKAKIQEMFDNMVSRMVTS
2 - 9: ArpllGKT
>sp|Q6GZX1|004R_FRG3G (60 aa)
Uncharacterized protein 004R. [Frog virus 3 (isolate Goorha) (FV-3)]
MNAKYDTDQGVGRMLFLGTIGLAVVVGGLMAYGYYYDGKTPSSGTSFHTASPSFSSRYRY
33 - 40: GyyydGKT
>sp|Q6GZW0|015R_FRG3G (322 aa)
Uncharacterized protein 015R. [Frog virus 3 (isolate Goorha) (FV-3)]
MEQVPIKEMRLSDLRPNNKSIDTDLGGTKLVVIGKPGSGKSTLIKALLDSKRHIIPCAVVISGSEEANGFYKGVVPDLFI
YHQFSPSIIDRIHRRQVKAKAEMGSKKSWLLVVIDDCMDNAKMFNDKEVRALFKNGRHWNVLVVIANQYVMDLTPDLRSS
VDGVFLFRENNVTYRDKTYANFASVVPKKLYPTVMETVCQNYRCMFIDNTKATDNWHDSVFWYKAPYSKSAVAPFGARSY
WKYACSKTGEEMPAVFDNVKILGDLLLKELPEAGEALVTYGGKDGPSDNEDGPSDDEDGPSDDEEGLSKDGVSEYYQSDL
DD
34 - 41: GkpgsGKS
>sp|P32234|128UP_DROME (368 aa)
GTP-binding protein 128up. [Drosophila melanogaster (Fruit fly)]
MSTILEKISAIESEMARTQKNKATSAHLGLLKAKLAKLRRELISPKGGGGGTGEAGFEVAKTGDARVGFVGFPSVGKSTL
LSNLAGVYSEVAAYEFTTLTTVPGCIKYKGAKIQLLDLPGIIEGAKDGKGRGRQVIAVARTCNLIFMVLDCLKPLGHKKL
LEHELEGFGIRLNKKPPNIYYKRKDKGGINLNSMVPQSELDTDLVKTILSEYKIHNADITLRYDATSDDLIDVIEGNRIY
IPCIYLLNKIDQISIEELDVIYKIPHCVPISAHHHWNFDDLLELMWEYLRLQRIYTKPKGQLPDYNSPVVLHNERTSIED
FCNKLHRSIAKEFKYALVWGSSVKHQPQKVGIEHVLNDEDVVQIVKKV
71 - 78: GfpsvGKS
>sp|P05080|194K_TRVSY (1707 aa)
Replicase large subunit. [Tobacco rattle virus (strain SYM)]
MANGNFKLSQLLNVDEMSAEQRSHFFDLMLTKPDCEIGQMMQRVVVDKVDDMIRERKTKDPVIVHEVLSQKEQNKLMEIY
PEFNIVFKDDKNMVHGFAAAERKLQALLLLDRVPALQEVDDIGGQWSFWVTRGEKRIHSCCPNLDIRDDQREISRQIFLT
AIGDQARSGKRQMSENELWMYDQFRKNIAAPNAVRCNNTYQGCTCRGFSDGKKKGAQYAIALHSLYDFKLKDLMATMVEK
KTKVVHAAMLFAPESMLVDEGPLPSVDGYYMKKNGKIYFGFEKDPSFSYIHDWEEYKKYLLGKPVSYQGNVFYFEPWQVR
GDTMLFSIYRIAGVPRRSLSSQEYYRRIYISRWENMVVVPIFDLVESTRELVKKDLFVEKQFMDKCLDYIARLSDQQLTI
SNVKSYLSSNNWVLFINGAAVKNKQSVDSRDLQLLAQTLLVKEQVARPVMRELREAILTETKPITSLTDVLGLISRKLWK
QFANKIAVGGFVGMVGTLIGFYPKKVLTWAKDTPNGPELCYENSHKTKVIVFLSVVYAIGGITLMRRDIRDGLVKKLCDM
FDIKRGAHVLDVENPCRYYEINDFFSSLYSASESGETVLPDLSEVKAKSDKLLQQKKEIADEFLSAKFSNYSGSSVRTSP
PSVVGSSRSGLGLLLEDSNVLTQARVGVSRKVDDEEIMEQFLSGLIDTEAEIDEVVSAFSAECERGETSGTKVLCKPLTP
PGFENVLPAVKPLVSKGKTVKRVDYFQVMGGERLPKRPVVSGDNSVDARREFLYYLDAERVAQNDEIMSLYRDYSRGVIR
TGGQNYPHGLGVWDVEMKNWCIRPVVTEHAYVFQPDKRMDDWSGYLEVAVWERGMLVNDFAVERMSDYVIVCDQTYLCNN
RLILDNLSALDLGPVNCSFELVDGVPGCGKSTMIVNSANPCVDVVLSTGRAATDDLIERFASKGFPCKLKRRVKTVDSFL
MHCVDGSLTGDVLHFDEALMAHAGMVYFCAQIAGAKRCICQGDQNQISFKPRVSQVDLRFSSLVGKFDIVTEKRETYRSP
ADVAAVLNKYYTGDVRTHNATANSMTVRKIVSKEQVSLKPGAQYITFLQSEKKELVNLLALRKVAAKVSTVHESQGETFK
DVVLVRTKPTDDSIARGREYLIVALSRHTQSLVYETVKEDDVSKEIRESAALTKAALARFFVTETVLXRFRSRFDVFRHH
EGPCAVPDSGTITDLEMWYDALFPGNSLRDSSLDGYLVATTDCNLRLDNVTIKSGNWKDKFAEKETFLKPVIRTAMPDKR
KTTQLESLLALQKRNQAAPDLQENVHATVLIEETMKKLKSVVYDVGKIRADPIVNRAQMERWWRNQSTAVQAKVVADVRE
LHEIDYSSYMYMIKSDVKPKTDLTPQFEYSALQTVVYHEKLINSLFGPIFKEINERKLDAMQPHFVFNTRMTSSDLNDRV
KFLNTEAAYDFVEIDMSKFDKSANRFHLQLQLEIYRLFGLDEWAAFLWEVSHTQTTVRDIQNGMMAHIWYQQKSGDADTY
NANSDRTLCALLSELPLEKAVMVTYGGDDSLIAFPRGTQFVDPCPKLATKWNFECKIFKYDVPMFCGKFLLKTSSCYEFV
PDPVKVLTKLGKKSIKDVQHLAEIYISLNDSNRALGNYMVVSKLSESVSDRYLYKGDSVHALCALWKHIKSFTALCTLFR
DENDKELNPAKVDWKKAQRAVSNFYDW
904 - 911: GvpgcGKS
>sp|P03589|1A_AMVLE (1126 aa)
Replication protein 1a. [Alfalfa mosaic virus (strain 425 / isolate Leiden)]
MNADAQSTDASLSMREPLSHASIQEMLRRVVEKQAADDTTAIGKVFSEAGRAYAQDALPSDKGEVLKISFSLDATQQNIL
RANFPGRRTVFSNSSSSSHCFAAAHRLLETDFVYRCFGNTVDSIIDLGGNFVSHMKVKRHNVHCCCPILDARDGARLTER
ILSLKSYVRKHPEIVGEADYCMDTFQKCSRRADYAFAIHSTSDLDVGELACSLDQKGVMKFICTMMVDADMLIHNEGEIP
NFNVRWEIDRKKDLIHFDFIDEPNLGYSHRFSLLKHYLTYNAVDLGHAAYRIERKQDFGGVMVIDLTYSLGFVPKMPHSN
GRSCAWYNRVKGQMVVHTVNEGYYHHSYQTAVRRKVLVDKKVLTRVTEVAFRQFRPNADAHSAIQSIATMLSSSTNHTII
GGVTLISGKPLSPDDYIPVATTIYYRVKKLYNAIPEMLSLLDKGERLSTDAVLKGSEGPMWYSGPTFLSALDKVNVPGDF
VAKALLSLPKRDLKSLFSRSATSHSERTPVRDESPIRCTDGVFYPIRMLLKCLGSDKFESVTITDPRSNTETTVDLYQSF
QKKIETVFSFILGKIDGPSPLISDPVYFQSLEDVYYAEWHQGNAIDASNYARTLLDDIRKQKEESLKAKAKEVEDAQKLN
RAILQVHAYLEAHPDGGKIEGLGLSSQFIAKIPELAIPTPKPLPEFEKNAETGEILRINPHSDAILEAIDYLKSTSANSI
ITLNKLGDHCQWTTKGLDVVWAGDDKRRAFIPKKNTWVGPTARSYPLAKYERAMSKDGYVTLRWDGEVLDANCVRSLSQY
EIVFVDQSCVFASAEAIIPSLEKALGLEAHFSVTIVDGVAGCGKTTNIKQIARSSGRDVDLILTSNRSSADELKETIDCS
PLTKLHYIRTCDSYLMSASAVKAQRLIFDECFLQHAGLVYAAATLAGCSEVIGFGDTEQIPFVSRNPSFVFRHHKLTGKV
ERKLITWRSPADATYCLEKYFYKNKKPVKTNSRVLRSIEVVPINSPVSVERNTNALYLCHTQAEKAVLKAQTHLKGCDNI
FTTHEAQGKTFDNVYFCRLTRTSTSLATGRDPINGPCNGLVALSRHKKTFKYFTIAHDSDDVIYNACRDAGNTDDSILAR
SYNHNF
838 - 845: GvagcGKT
>sp|Q9AT00|TGD3_ARATH (345 aa)
Protein TRIGALACTOSYLDIACYLGLYCEROL 3, chloroplastic. [Arabidopsis thaliana (Mouse-ear cress)]
MLSLSCSSSSSSLLPPSLHYHGSSSVQSIVVPRRSLISFRRKVSCCCIAPPQNLDNDATKFDSLTKSGGGMCKERGLEND
SDVLIECRDVYKSFGEKHILKGVSFKIRHGEAVGVIGPSGTGKSTILKIMAGLLAPDKGEVYIRGKKRAGLISDEEISGL
RIGLVFQSAALFDSLSVRENVGFLLYERSKMSENQISELVTQTLAAVGLKGVENRLPSELSGGMKKRVALARSLIFDTTK
EVIEPEVLLYDEPTAGLDPIASTVVEDLIRSVHMTDEDAVGKPGKIASYLVVTHQHSTIQRAVDRLLFLYEGKIVWQGMT
HEFTTSTNPIVQQFATGSLDGPIRY
117 - 124: GpsgtGKS
Can someone help me with the correct code?
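A minimal sketch of one way to restructure the program above, assuming that records in ploop.txt are separated by blank lines (as the original `split("\n\n")` implies) and keeping the question's regular expressions (so it still assumes six-character accession codes). The main change is that ZoekAccesieCode and ZoekOrganisme collect one value per hit into a list instead of keeping a single value, so MaakDict can pair them with zip. Passing `text` as a parameter instead of reading a global, and dropping the unused `accesie` argument of ZoekOrganisme, are my own choices so the functions can be tested on a string.

```python
import re

PATROON = r"[AG].{4}GK[ST]"  # P-loop pattern from the question (PS00017)

def VindHits(text):
    # Skip the file header (first chunk) and keep records that contain the motif.
    eiwitten = text.split("\n\n")[1:]
    return [eiwit for eiwit in eiwitten if re.search(PATROON, eiwit)]

def ZoekAccesieCode(hits):
    # One accession code per hit, e.g. "Q6GZX2".
    return [re.findall(r">sp\|(.{6})", eiwit)[0] for eiwit in hits]

def ZoekOrganisme(hits):
    # One organism per hit: the text between the first "[" and "]" after the header line.
    return [re.findall(r"\n.+?\[(.+?)\]", eiwit)[0] for eiwit in hits]

def MaakDict(accesie, organisme):
    # Pair the i-th accession code with the i-th organism.
    return dict(zip(accesie, organisme))

def main(pad="ploop.txt"):
    # Read the whole file, find the hits and build the accession -> organism dict.
    with open(pad) as f:
        text = f.read()
    hits = VindHits(text)
    resultaat = MaakDict(ZoekAccesieCode(hits), ZoekOrganisme(hits))
    print(resultaat)
    return resultaat
```

Calling `main()` with ploop.txt in the working directory prints the resulting dict.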
Answer 2 (score: 0)
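You can use zip or itertools.izip_longest (called itertools.zip_longest in Python 3) like this:
zip(accesie, organisme)
OR
itertools.izip_longest(accesie, organisme)
The former creates only as many pairs as the shortest input list has elements. If the lists differ in length, the latter pads the shorter one with None, so you may get pairs where a key has no matching value.
After using either of these, you can wrap the result in dict() to convert it.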
>>> import itertools
>>> accesie = ['accesie1', 'accesie2', 'accesie3', 'accesie4']
>>> organisme = ['organisme1', 'organisme2', 'organisme3']
>>> dict(zip(accesie, organisme))
{'accesie3': 'organisme3', 'accesie2': 'organisme2', 'accesie1': 'organisme1'}
>>> dict(itertools.izip_longest(accesie, organisme))
{'accesie3': 'organisme3', 'accesie2': 'organisme2', 'accesie1': 'organisme1', 'accesie4': None}
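From reading the other comments, it seems you are dealing with single elements rather than lists of elements. In that case you should refer to @TheLazyScripter's answer: just create a dict with your key and value.
the_dict = {accesie: organisme}
If you want to add to an existing dict, it would be:
the_dict[accesie] = organisme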
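If the functions keep working on one record at a time, as this answer assumes, the dict can be filled inside a loop over the hits. A minimal sketch: `zoek_accesie`, `zoek_organisme` and `maak_dict` are hypothetical single-record helpers that reuse the question's regular expressions; they are not functions from the question.

```python
import re

def zoek_accesie(eiwit):
    # Hypothetical helper: accession code from a ">sp|XXXXXX|..." header line.
    return re.findall(r">sp\|(.{6})", eiwit)[0]

def zoek_organisme(eiwit):
    # Hypothetical helper: organism name between "[" and "]" on the description line.
    return re.findall(r"\n.+?\[(.+?)\]", eiwit)[0]

def maak_dict(hits):
    the_dict = {}
    for eiwit in hits:
        # One key/value pair per record, i.e. the_dict[accesie] = organisme.
        the_dict[zoek_accesie(eiwit)] = zoek_organisme(eiwit)
    return the_dict
```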
Answer 3 (score: 0)
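The prerequisites for this code are that the values in accesie can be dict keys (they are strings or unicode, so they are hashable) and that accesie and organisme are lists of the same length. If they are not, zip silently stops at the shorter list, so slice or pad them to the same length first if every entry matters.
dict(zip(accesie, organisme))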
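A sketch of enforcing the length prerequisite explicitly instead of relying on zip's silent truncation. `maak_dict_strikt` is a hypothetical helper; on Python 3.10 and later, `zip(accesie, organisme, strict=True)` raises ValueError on a length mismatch in a similar way.

```python
def maak_dict_strikt(accesie, organisme):
    # Hypothetical helper: refuse lists of different lengths, which zip() would truncate.
    if len(accesie) != len(organisme):
        raise ValueError(
            f"{len(accesie)} accession codes but {len(organisme)} organisms"
        )
    # Keys must be hashable (strings are); a repeated key keeps only its last value.
    return dict(zip(accesie, organisme))
```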