I have a list of keywords; here is a subset of it:
'Ablation', 'Advance Directive (living will)', 'Aerobic Exercise',
'Ambulatory EKG Monitors', 'Anemia', 'Aneurysm', 'Angina (also called angina pectoris)',
'Angiogenesis', 'Angioplasty', 'Angiotensin-Converting Enzyme Inhibitors (ACE inhibitors)',
'Angiotensin II Receptor Blockers (ARBs)', 'Annulus',
I also have a dataset containing a long list of records; here is a subset of it as well:
!Annotation_platform_organism = Mus musculus
^DATASET = GDS1001
#ID_REF = Platform reference identifier
ID_REF IDENTIFIER GSM19023 GSM19024 GSM19025 GSM19026 Gene title Gene symbol Gene ID UniGene title UniGene symbol UniGene ID Nucleotide Title GI GenBank Accession Platform_CLONEID Platform_ORF Platform_SPOTID Chromosome location Chromosome annotation GO:Function GO:Process GO:Component GO:Function ID GO:Process ID GO:Component ID
100001_at Cd3g 7046.7 5672.4 743.4 592.9 CD3 antigen, gamma polypeptide Cd3g 12502 Mouse CD3-gamma (T3-gamma) gene, exon 7 192496 M18228 9 24.84 cM Chromosome 9, NC_000075.6 (44969572..44980431, complement) protein binding///protein heterodimerization activity///transmembrane signaling receptor activity cell surface receptor signaling pathway///establishment or maintenance of cell polarity///protein transport///regulation of lymphocyte apoptotic process alpha-beta T cell receptor complex///integral component of membrane///membrane GO:0005515///GO:0046982///GO:0004888 GO:0007166///GO:0007163///GO:0015031///GO:0070228 GO:0042105///GO:0016021///GO:0016020
100002_at Itih3 104.3 169.3 170.2 80.3 inter-alpha trypsin inhibitor, heavy chain 3 Itih3 16426 M.musculus mRNA for inter-alpha-inhibitor H3 chain 695635 X70393 14 A2-C1 Chromosome 14, NC_000080.6 (30908572..30923755, complement) peptidase inhibitor activity///serine-type endopeptidase inhibitor activity hyaluronan metabolic process///negative regulation of peptidase activity extracellular exosome///extracellular region GO:0030414///GO:0004867 GO:0030212///GO:0010466 GO:0070062///GO:0005576
100003_at Ryr1 404.6 328.8 309.6 398.1 ryanodine receptor 1, skeletal muscle Ryr1 20190 Mus musculus RyR1 mRNA for skeletal muscle ryanodine receptor, partial cds 1030707 D38216 7 A2-B3 Chromosome 7, NC_000073.6 (29003340..29125179, complement) calcium channel activity///calcium ion binding///calcium-release channel activity///calmodulin binding///enzyme binding///ion channel activity///protease binding///protein binding///ryanodine-sensitive calcium-release channel activity///ryanodine-sensitive calcium-release channel activity///voltage-gated calcium channel activity calcium ion transmembrane transport///calcium ion transport///calcium ion transport///cellular calcium ion homeostasis///cellular response to caffeine///ion transport///multicellular organism development///muscle contraction///muscle contraction///ossification involved in bone maturation///outflow tract morphogenesis///regulation of cytosolic calcium ion concentration///regulation of muscle contraction///release of sequestered calcium ion into cytosol///release of sequestered calcium ion into cytosol///release of sequestered calcium ion into cytosol by sarcoplasmic reticulum///response to caffeine///response to hypoxia///sarcoplasmic reticulum calcium ion transport///skeletal muscle fiber development///skin development///transmembrane transport///transport I band///T-tubule///cell cortex///cytoplasm///extracellular exosome///extrinsic component of cytoplasmic side of plasma membrane///integral component of membrane///junctional membrane complex///membrane///plasma membrane///protein complex///sarcolemma///sarcoplasmic reticulum///sarcoplasmic reticulum///sarcoplasmic reticulum membrane///sarcoplasmic reticulum membrane///smooth endoplasmic reticulum GO:0005262///GO:0005509///GO:0015278///GO:0005516///GO:0019899///GO:0005216///GO:0002020///GO:0005515///GO:0005219///GO:0005219///GO:0005245 GO:0070588///GO:0006816///GO:0006816///GO:0006874///GO:0071313///GO:0006811///GO:0007275///GO:0006936///GO:0006936///GO:0043931///GO:0003151///GO:0051480///GO:0006937///GO:0051209///GO:0051209///GO:0014808///GO:0031000///GO:0001666///GO:0070296///GO:0048741///GO:0043588///GO:0055085///GO:0006810 GO:0031674///GO:0030315///GO:0005938///GO:0005737///GO:0070062///GO:0031234///GO:0016021///GO:0030314///GO:0016020///GO:0005886///GO:0043234///GO:0042383///GO:0016529///GO:0016529///GO:0033017///GO:0033017///GO:0005790
100004_at Ints7 823.2 850 407.5 431.3 integrator complex subunit 7 Ints7 77065 UI-M-BH2.3-aob-a-12-0-UI.s1 NIH_BMAP_M_S3.3 Mus musculus cDNA clone UI-M-BH2.3-aob-a-12-0-UI 3-, mRNA sequence 6096223 AW120890 1 H6 Chromosome 1, NC_000067.6 (191575636..191623690) molecular_function DNA damage checkpoint///cellular response to DNA damage stimulus///cellular response to ionizing radiation///snRNA processing chromosome///integrator complex///nucleus GO:0003674 GO:0000077///GO:0006974///GO:0071479///GO:0016180 GO:0005694///GO:0032039///GO:0005634
100005_at Traf4 1460.6 1377.4 879.3 803 TNF receptor associated factor 4 Traf4 22032 M.musculus mRNA for CART1 protein 1041445 X92346 11 B5-C Chromosome 11, NC_000077.6 (78158423..78165569, complement) WW domain binding///identical protein binding///metal ion binding///protein kinase binding///thioesterase binding///tumor necrosis factor receptor binding///ubiquitin protein ligase binding///ubiquitin-protein transferase activity///zinc ion binding apoptotic process///multicellular organism development///positive regulation of JNK cascade///positive regulation of protein homodimerization activity///positive regulation of protein kinase activity///protein ubiquitination///regulation of apoptotic process///respiratory gaseous exchange///respiratory tube development///signal transduction bicellular tight junction///cell junction///cytoplasm///cytoskeleton///membrane///nucleus///plasma membrane GO:0050699///GO:0042802///GO:0046872///GO:0019901///GO:0031996///GO:0005164///GO:0031625///GO:0004842///GO:0008270 GO:0006915///GO:0007275///GO:0046330///GO:0090073///GO:0045860///GO:0016567///GO:0042981///GO:0007585///GO:0030323///GO:0007165 GO:0005923///GO:0030054///GO:0005737///GO:0005856///GO:0016020///GO:0005634///GO:0005886
100006_at Cdh11 164.7 98 26.6 131 cadherin 11 Cdh11 12552 Mus musculus osf-4 mRNA for OB-cadherin-1, complete cds 994774 D21253 8 50.44 cM Chromosome 8, NC_000074.6 (102632095..102785983, complement) calcium ion binding///metal ion binding cell adhesion///corticospinal tract morphogenesis///homophilic cell adhesion via plasma membrane adhesion molecules cytoplasm///extracellular exosome///integral component of membrane///membrane///plasma membrane GO:0005509///GO:0046872 GO:0007155///GO:0021957///GO:0007156 GO:0005737///GO:0070062///GO:0016021///GO:0016020///GO:0005886
100007_at Irf2bp1 2418.3 2669.4 2749.8 2717.8 interferon regulatory factor 2 binding protein 1 Irf2bp1 272359 UI-M-AL0-abs-g-06-0-UI.s1 NIH_BMAP_MCO Mus musculus cDNA clone UI-M-AL0-abs-g-06-0-UI 3-, mRNA sequence 5471786 AI837573 7 A3 Chromosome 7, NC_000073.6 (19004065..19006763) UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity///coenzyme F420-0 gamma-glutamyl ligase activity///coenzyme F420-2 alpha-glutamyl ligase activity///ligase activity///metal ion binding///protein-glutamic acid ligase activity///protein-glycine ligase activity///protein-glycine ligase activity, elongating///protein-glycine ligase activity, initiating///ribosomal S6-glutamic acid ligase activity///transcription corepressor activity///transcription factor binding///tubulin-glutamic acid ligase activity///tubulin-glycine ligase activity negative regulation of transcription from RNA polymerase II promoter///regulation of transcription, DNA-templated///transcription, DNA-templated nucleoplasm///nucleus GO:0008766///GO:0043773///GO:0043774///GO:0016874///GO:0046872///GO:0070739///GO:0070735///GO:0070737///GO:0070736///GO:0018169///GO:0003714///GO:0008134///GO:0070740///GO:0070738 GO:0000122///GO:0006355///GO:0006351 GO:0005654///GO:0005634
100009_r_at Sox2 475.6 457.7 451.8 479.7 SRY (sex determining region Y)-box 2 Sox2 20674 M.musculus SOX2 gene 1209429 X94127 3 A2-B Chromosome 3, NC_000069.6 (34649995..34652461) DNA binding///DNA binding///DNA binding///RNA polymerase II transcription factor activity, sequence-specific DNA binding///chromatin DNA binding///chromatin binding///miRNA binding///protein binding///protein heterodimerization activity///sequence-specific DNA binding///sequence-specific DNA binding///transcription factor activity, sequence-specific DNA binding///transcription factor activity, sequence-specific DNA binding///transcription factor binding///transcription regulatory region DNA binding///transcription regulatory region DNA binding///transcription regulatory region sequence-specific DNA binding///contributes_to transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding adenohypophysis development///anatomical structure formation involved in morphogenesis///cell cycle arrest///cell fate commitment///cell fate specification///cerebral cortex development///detection of mechanical stimulus involved in equilibrioception///detection of mechanical stimulus involved in sensory perception of sound///diencephalon morphogenesis///embryonic organ development///endodermal cell fate specification///epithelial tube branching involved in lung morphogenesis///forebrain neuron differentiation///inner ear morphogenesis///lens induction in camera-type eye///lung alveolus development///male genitalia development///multicellular organism development///negative regulation of Wnt signaling pathway///negative regulation of canonical Wnt signaling pathway///negative regulation of canonical Wnt signaling pathway///negative regulation of cell differentiation///negative regulation of epithelial cell proliferation///negative regulation of neuron differentiation///negative regulation of osteoblast differentiation///negative regulation of transcription from RNA polymerase II promoter///neuron fate commitment///neuronal stem cell population maintenance///olfactory placode formation///osteoblast differentiation///pigment biosynthetic process///positive regulation of MAPK cascade///positive regulation of Notch signaling pathway///positive regulation of cell differentiation///positive regulation of cell-cell adhesion///positive regulation of epithelial cell differentiation///positive regulation of neuroblast proliferation///positive regulation of neuron differentiation///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription, DNA-templated///regulation of cysteine-type endopeptidase activity involved in apoptotic process///regulation of gene expression///regulation of neurogenesis///regulation of transcription from RNA polymerase II promoter///regulation of transcription, DNA-templated///regulation of transcription, DNA-templated///response to growth factor///response to organic substance///response to retinoic acid///retina morphogenesis in camera-type eye///sensory perception of sound///somatic stem cell population maintenance///stem cell differentiation///stem cell population maintenance///tongue development///transcription, DNA-templated cytoplasm///cytoplasm///cytosol///nuclear transcription factor complex///nucleoplasm///nucleus///nucleus///nucleus///transcription factor complex GO:0003677///GO:0003677///GO:0003677///GO:0000981///GO:0031490///GO:0003682///GO:0035198///GO:0005515///GO:0046982///GO:0043565///GO:0043565///GO:0003700///GO:0003700///GO:0008134///GO:0044212///GO:0044212///GO:0000976///contributes_to GO:0001077 GO:0021984///GO:0048646///GO:0007050///GO:0045165///GO:0001708///GO:0021987///GO:0050973///GO:0050910///GO:0048852///GO:0048568///GO:0001714///GO:0060441///GO:0021879///GO:0042472///GO:0060235///GO:0048286///GO:0030539///GO:0007275///GO:0030178///GO:0090090///GO:0090090///GO:0045596///GO:0050680///GO:0045665///GO:0045668///GO:0000122///GO:0048663///GO:0097150///GO:0030910///GO:0001649///GO:0046148///GO:0043410///GO:0045747///GO:0045597///GO:0022409///GO:0030858///GO:0002052///GO:0045666///GO:0045944///GO:0045944///GO:0045944///GO:0045944///GO:0045893///GO:0043281///GO:0010468///GO:0050767///GO:0006357///GO:0006355///GO:0006355///GO:0070848///GO:0010033///GO:0032526///GO:0060042///GO:0007605///GO:0035019///GO:0048863///GO:0019827///GO:0043586///GO:0006351 GO:0005737///GO:0005737///GO:0005829///GO:0044798///GO:0005654///GO:0005634///GO:0005634///GO:0005634///GO:0005667
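I want to check whether any of the keywords appear in the dataset and, if so, view the lines that contain them and/or count how many matching lines there are. I also want to skip the lines that start with #, ^ or !. Here is my code: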
keyword_array = []
with open('filename.text') as my_keywordfile:
    keyword_array = my_keywordfile.readlines()
print(keyword_array)
file = open("GDS1001_exact.txt", "r")
for line in file.readlines():
    if line.startswith(('#', '!', '^')):
        continue;
    else:
        for keywords in keyword_array:
            if keywords in line:
                print(line)
            else:
                print(False)
All the output I get is False! Am I missing something?
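The all-False output is consistent with how the keyword file is read in the code above. readlines() returns whole lines with the trailing "\n" kept, and the keyword file apparently stores the entire list on one line (the answer below parses it that way). Under that assumption keyword_array holds a single string, the whole list plus "\n", which never occurs inside a data line. Even with one keyword per line, the kept "\n" would only let a keyword match at the very end of a data line. A minimal sketch reproducing this, with io.StringIO standing in for the real file; the keyword text is abbreviated from the question and the data line is made up for illustration:

```python
import io

# Hypothetical stand-in for the keyword file: the whole list on one line,
# ending with a trailing comma as in the question's sample.
keyword_file = io.StringIO("'Anemia', 'Angiogenesis', 'Annulus',\n")
# Made-up data line, used only for illustration.
data_line = "100999_at Xyz1 123.4 ... Anemia ...\n"

keyword_array = keyword_file.readlines()
print(keyword_array)  # a single element: the entire line, "\n" included

# Each "keyword" is really the whole list text, so this prints False.
for entry in keyword_array:
    print(entry in data_line)

# Splitting that line into individual, stripped keywords fixes the lookup.
keywords = [k.strip().strip("'") for k in keyword_array[0].split(",")]
keywords = [k for k in keywords if k]  # drop the empty piece after the trailing comma
matches = [k for k in keywords if k in data_line]
print(matches)  # ['Anemia']
```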
Answer:
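keyword_array = []
with open('filename.txt') as my_keywordfile:
    # The keyword file holds a comma-separated list such as
    # 'Ablation', 'Aerobic Exercise', ... so split it on commas and
    # strip whitespace, quotes and any surrounding [ ].
    text = my_keywordfile.read().strip().strip("[]")
    for keyword in text.split(","):
        keyword = keyword.strip().strip("'")
        if keyword:  # skip the empty piece left by a trailing comma
            keyword_array.append(keyword)

count = 0
with open("GDS1001_exact.txt", "r") as file:
    for line in file:
        # Skip the header/metadata lines that start with #, ! or ^.
        if line.startswith(('#', '!', '^')):
            continue
        for keyword in keyword_array:
            if keyword in line:
                count += 1
                print(keyword, line, end="")
                break  # count each matching line only once
print("count=", count)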
It also prints the number of matching lines!
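If you also want a count per keyword, or want a title-case keyword to match the lowercase GO terms in the data, the loop can be extended. The following is a sketch under stated assumptions, not part of the answer: it assumes the keyword file is a Python-style sequence of quoted strings (with or without the surrounding brackets) and that no keyword is broken across lines inside its quotes. It parses the list with ast.literal_eval, so a comma inside a keyword would not split it; matches case-insensitively via str.casefold(), still as plain substrings; and tallies hits with collections.Counter. The names HEADER_PREFIXES, load_keywords and count_keyword_hits are hypothetical, as are the sample keywords; the sample data lines are abbreviated from the Cdh11 and Ryr1 rows in the question.

```python
import ast
from collections import Counter

# Lines starting with these characters are SOFT header/metadata lines.
HEADER_PREFIXES = ("#", "!", "^")


def load_keywords(text):
    """Parse "'A', 'B'," or "['A', 'B']" into ['A', 'B'] (hypothetical helper)."""
    text = text.strip()
    if not text.startswith("["):
        # Wrapping in brackets lets the list span lines and end with a comma.
        text = "[" + text + "]"
    return [k.strip() for k in ast.literal_eval(text) if k.strip()]


def count_keyword_hits(lines, keywords):
    """Return (matching lines, Counter of keyword -> number of lines containing it)."""
    folded = [(k, k.casefold()) for k in keywords]  # case-insensitive comparison
    matching_lines = []
    hits = Counter()
    for line in lines:
        if line.startswith(HEADER_PREFIXES):
            continue
        low = line.casefold()
        found = [k for k, f in folded if f in low]
        if found:
            matching_lines.append(line)
            hits.update(found)
    return matching_lines, hits


# Hypothetical keywords; the data lines are abbreviated from the question's sample.
keywords = load_keywords("'Anemia', 'Calcium Ion Binding', 'Cell Adhesion',")
sample_lines = [
    "!Annotation_platform_organism = Mus musculus\n",
    "100006_at Cdh11 ... calcium ion binding///metal ion binding cell adhesion ...\n",
    "100003_at Ryr1 ... calcium ion binding ...\n",
]
matching_lines, hits = count_keyword_hits(sample_lines, keywords)
print(len(matching_lines))  # 2
print(hits.most_common())  # [('Calcium Ion Binding', 2), ('Cell Adhesion', 1)]
```

With the real files, read the keyword file's text into load_keywords and pass the open GDS1001_exact.txt file object straight to count_keyword_hits, since iterating a file yields its lines. If a keyword in the file really does contain a line break inside its quotes, ast.literal_eval raises an error instead of returning a partial list.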