Check whether a given file contains any of the specified keywords

Time: 2017-09-12 21:08:35

Tags: python

I have a list of keywords; you can see a subset of it here:

'Ablation', 'Advance Directive (living will)', 'Aerobic Exercise', 
'Ambulatory EKG Monitors', 'Anemia', 'Aneurysm', 'Angina (also called angina 
pectoris)', 'Angiogenesis', 'Angioplasty', 'Angiotensin-Converting Enzyme 
Inhibitors (ACE inhibitors)', 'Angiotensin II Receptor Blockers (ARBs)', 
'Annulus',

I also have a dataset containing a long run of data; a subset of it is shown here:

!Annotation_platform_organism = Mus musculus
^DATASET = GDS1001
#ID_REF = Platform reference identifier
ID_REF  IDENTIFIER  GSM19023    GSM19024    GSM19025    GSM19026    Gene title  Gene symbol Gene ID UniGene title   UniGene symbol  UniGene ID  Nucleotide Title    GI  GenBank Accession   Platform_CLONEID    Platform_ORF    Platform_SPOTID Chromosome location Chromosome annotation   GO:Function GO:Process  GO:Component    GO:Function ID  GO:Process ID   GO:Component ID
100001_at   Cd3g    7046.7  5672.4  743.4   592.9   CD3 antigen, gamma polypeptide  Cd3g    12502               Mouse CD3-gamma (T3-gamma) gene, exon 7 192496  M18228              9 24.84 cM  Chromosome 9, NC_000075.6 (44969572..44980431, complement)  protein binding///protein heterodimerization activity///transmembrane signaling receptor activity   cell surface receptor signaling pathway///establishment or maintenance of cell polarity///protein transport///regulation of lymphocyte apoptotic process    alpha-beta T cell receptor complex///integral component of membrane///membrane  GO:0005515///GO:0046982///GO:0004888    GO:0007166///GO:0007163///GO:0015031///GO:0070228   GO:0042105///GO:0016021///GO:0016020
100002_at   Itih3   104.3   169.3   170.2   80.3    inter-alpha trypsin inhibitor, heavy chain 3    Itih3   16426               M.musculus mRNA for inter-alpha-inhibitor H3 chain  695635  X70393              14 A2-C1    Chromosome 14, NC_000080.6 (30908572..30923755, complement) peptidase inhibitor activity///serine-type endopeptidase inhibitor activity hyaluronan metabolic process///negative regulation of peptidase activity    extracellular exosome///extracellular region    GO:0030414///GO:0004867 GO:0030212///GO:0010466 GO:0070062///GO:0005576
100003_at   Ryr1    404.6   328.8   309.6   398.1   ryanodine receptor 1, skeletal muscle   Ryr1    20190               Mus musculus RyR1 mRNA for skeletal muscle ryanodine receptor, partial cds  1030707 D38216              7 A2-B3 Chromosome 7, NC_000073.6 (29003340..29125179, complement)  calcium channel activity///calcium ion binding///calcium-release channel activity///calmodulin binding///enzyme binding///ion channel activity///protease binding///protein binding///ryanodine-sensitive calcium-release channel activity///ryanodine-sensitive calcium-release channel activity///voltage-gated calcium channel activity  calcium ion transmembrane transport///calcium ion transport///calcium ion transport///cellular calcium ion homeostasis///cellular response to caffeine///ion transport///multicellular organism development///muscle contraction///muscle contraction///ossification involved in bone maturation///outflow tract morphogenesis///regulation of cytosolic calcium ion concentration///regulation of muscle contraction///release of sequestered calcium ion into cytosol///release of sequestered calcium ion into cytosol///release of sequestered calcium ion into cytosol by sarcoplasmic reticulum///response to caffeine///response to hypoxia///sarcoplasmic reticulum calcium ion transport///skeletal muscle fiber development///skin development///transmembrane transport///transport  I band///T-tubule///cell cortex///cytoplasm///extracellular exosome///extrinsic component of cytoplasmic side of plasma membrane///integral component of membrane///junctional membrane complex///membrane///plasma membrane///protein complex///sarcolemma///sarcoplasmic reticulum///sarcoplasmic reticulum///sarcoplasmic reticulum membrane///sarcoplasmic reticulum membrane///smooth endoplasmic reticulum    GO:0005262///GO:0005509///GO:0015278///GO:0005516///GO:0019899///GO:0005216///GO:0002020///GO:0005515///GO:0005219///GO:0005219///GO:0005245    
GO:0070588///GO:0006816///GO:0006816///GO:0006874///GO:0071313///GO:0006811///GO:0007275///GO:0006936///GO:0006936///GO:0043931///GO:0003151///GO:0051480///GO:0006937///GO:0051209///GO:0051209///GO:0014808///GO:0031000///GO:0001666///GO:0070296///GO:0048741///GO:0043588///GO:0055085///GO:0006810    GO:0031674///GO:0030315///GO:0005938///GO:0005737///GO:0070062///GO:0031234///GO:0016021///GO:0030314///GO:0016020///GO:0005886///GO:0043234///GO:0042383///GO:0016529///GO:0016529///GO:0033017///GO:0033017///GO:0005790
100004_at   Ints7   823.2   850 407.5   431.3   integrator complex subunit 7    Ints7   77065               UI-M-BH2.3-aob-a-12-0-UI.s1 NIH_BMAP_M_S3.3 Mus musculus cDNA clone UI-M-BH2.3-aob-a-12-0-UI 3-, mRNA sequence  6096223 AW120890                1 H6    Chromosome 1, NC_000067.6 (191575636..191623690)    molecular_function  DNA damage checkpoint///cellular response to DNA damage stimulus///cellular response to ionizing radiation///snRNA processing   chromosome///integrator complex///nucleus   GO:0003674  GO:0000077///GO:0006974///GO:0071479///GO:0016180   GO:0005694///GO:0032039///GO:0005634
100005_at   Traf4   1460.6  1377.4  879.3   803 TNF receptor associated factor 4    Traf4   22032               M.musculus mRNA for CART1 protein   1041445 X92346              11 B5-C Chromosome 11, NC_000077.6 (78158423..78165569, complement) WW domain binding///identical protein binding///metal ion binding///protein kinase binding///thioesterase binding///tumor necrosis factor receptor binding///ubiquitin protein ligase binding///ubiquitin-protein transferase activity///zinc ion binding   apoptotic process///multicellular organism development///positive regulation of JNK cascade///positive regulation of protein homodimerization activity///positive regulation of protein kinase activity///protein ubiquitination///regulation of apoptotic process///respiratory gaseous exchange///respiratory tube development///signal transduction  bicellular tight junction///cell junction///cytoplasm///cytoskeleton///membrane///nucleus///plasma membrane GO:0050699///GO:0042802///GO:0046872///GO:0019901///GO:0031996///GO:0005164///GO:0031625///GO:0004842///GO:0008270  GO:0006915///GO:0007275///GO:0046330///GO:0090073///GO:0045860///GO:0016567///GO:0042981///GO:0007585///GO:0030323///GO:0007165 GO:0005923///GO:0030054///GO:0005737///GO:0005856///GO:0016020///GO:0005634///GO:0005886
100006_at   Cdh11   164.7   98  26.6    131 cadherin 11 Cdh11   12552               Mus musculus osf-4 mRNA for OB-cadherin-1, complete cds 994774  D21253              8 50.44 cM  Chromosome 8, NC_000074.6 (102632095..102785983, complement)    calcium ion binding///metal ion binding cell adhesion///corticospinal tract morphogenesis///homophilic cell adhesion via plasma membrane adhesion molecules cytoplasm///extracellular exosome///integral component of membrane///membrane///plasma membrane GO:0005509///GO:0046872 GO:0007155///GO:0021957///GO:0007156    GO:0005737///GO:0070062///GO:0016021///GO:0016020///GO:0005886
100007_at   Irf2bp1 2418.3  2669.4  2749.8  2717.8  interferon regulatory factor 2 binding protein 1    Irf2bp1 272359              UI-M-AL0-abs-g-06-0-UI.s1 NIH_BMAP_MCO Mus musculus cDNA clone UI-M-AL0-abs-g-06-0-UI 3-, mRNA sequence 5471786 AI837573                7 A3    Chromosome 7, NC_000073.6 (19004065..19006763)  UDP-N-acetylmuramoylalanyl-D-glutamyl-2,6-diaminopimelate-D-alanyl-D-alanine ligase activity///coenzyme F420-0 gamma-glutamyl ligase activity///coenzyme F420-2 alpha-glutamyl ligase activity///ligase activity///metal ion binding///protein-glutamic acid ligase activity///protein-glycine ligase activity///protein-glycine ligase activity, elongating///protein-glycine ligase activity, initiating///ribosomal S6-glutamic acid ligase activity///transcription corepressor activity///transcription factor binding///tubulin-glutamic acid ligase activity///tubulin-glycine ligase activity   negative regulation of transcription from RNA polymerase II promoter///regulation of transcription, DNA-templated///transcription, DNA-templated    nucleoplasm///nucleus   GO:0008766///GO:0043773///GO:0043774///GO:0016874///GO:0046872///GO:0070739///GO:0070735///GO:0070737///GO:0070736///GO:0018169///GO:0003714///GO:0008134///GO:0070740///GO:0070738 GO:0000122///GO:0006355///GO:0006351    GO:0005654///GO:0005634
100009_r_at Sox2    475.6   457.7   451.8   479.7   SRY (sex determining region Y)-box 2    Sox2    20674               M.musculus SOX2 gene    1209429 X94127              3 A2-B  Chromosome 3, NC_000069.6 (34649995..34652461)  DNA binding///DNA binding///DNA binding///RNA polymerase II transcription factor activity, sequence-specific DNA binding///chromatin DNA binding///chromatin binding///miRNA binding///protein binding///protein heterodimerization activity///sequence-specific DNA binding///sequence-specific DNA binding///transcription factor activity, sequence-specific DNA binding///transcription factor activity, sequence-specific DNA binding///transcription factor binding///transcription regulatory region DNA binding///transcription regulatory region DNA binding///transcription regulatory region sequence-specific DNA binding///contributes_to transcriptional activator activity, RNA polymerase II core promoter proximal region sequence-specific binding    adenohypophysis development///anatomical structure formation involved in morphogenesis///cell cycle arrest///cell fate commitment///cell fate specification///cerebral cortex development///detection of mechanical stimulus involved in equilibrioception///detection of mechanical stimulus involved in sensory perception of sound///diencephalon morphogenesis///embryonic organ development///endodermal cell fate specification///epithelial tube branching involved in lung morphogenesis///forebrain neuron differentiation///inner ear morphogenesis///lens induction in camera-type eye///lung alveolus development///male genitalia development///multicellular organism development///negative regulation of Wnt signaling pathway///negative regulation of canonical Wnt signaling pathway///negative regulation of canonical Wnt signaling pathway///negative regulation of cell differentiation///negative regulation of epithelial cell proliferation///negative regulation of neuron differentiation///negative regulation of osteoblast 
differentiation///negative regulation of transcription from RNA polymerase II promoter///neuron fate commitment///neuronal stem cell population maintenance///olfactory placode formation///osteoblast differentiation///pigment biosynthetic process///positive regulation of MAPK cascade///positive regulation of Notch signaling pathway///positive regulation of cell differentiation///positive regulation of cell-cell adhesion///positive regulation of epithelial cell differentiation///positive regulation of neuroblast proliferation///positive regulation of neuron differentiation///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription from RNA polymerase II promoter///positive regulation of transcription, DNA-templated///regulation of cysteine-type endopeptidase activity involved in apoptotic process///regulation of gene expression///regulation of neurogenesis///regulation of transcription from RNA polymerase II promoter///regulation of transcription, DNA-templated///regulation of transcription, DNA-templated///response to growth factor///response to organic substance///response to retinoic acid///retina morphogenesis in camera-type eye///sensory perception of sound///somatic stem cell population maintenance///stem cell differentiation///stem cell population maintenance///tongue development///transcription, DNA-templated cytoplasm///cytoplasm///cytosol///nuclear transcription factor complex///nucleoplasm///nucleus///nucleus///nucleus///transcription factor complex   GO:0003677///GO:0003677///GO:0003677///GO:0000981///GO:0031490///GO:0003682///GO:0035198///GO:0005515///GO:0046982///GO:0043565///GO:0043565///GO:0003700///GO:0003700///GO:0008134///GO:0044212///GO:0044212///GO:0000976///contributes_to GO:0001077  
GO:0021984///GO:0048646///GO:0007050///GO:0045165///GO:0001708///GO:0021987///GO:0050973///GO:0050910///GO:0048852///GO:0048568///GO:0001714///GO:0060441///GO:0021879///GO:0042472///GO:0060235///GO:0048286///GO:0030539///GO:0007275///GO:0030178///GO:0090090///GO:0090090///GO:0045596///GO:0050680///GO:0045665///GO:0045668///GO:0000122///GO:0048663///GO:0097150///GO:0030910///GO:0001649///GO:0046148///GO:0043410///GO:0045747///GO:0045597///GO:0022409///GO:0030858///GO:0002052///GO:0045666///GO:0045944///GO:0045944///GO:0045944///GO:0045944///GO:0045893///GO:0043281///GO:0010468///GO:0050767///GO:0006357///GO:0006355///GO:0006355///GO:0070848///GO:0010033///GO:0032526///GO:0060042///GO:0007605///GO:0035019///GO:0048863///GO:0019827///GO:0043586///GO:0006351    GO:0005737///GO:0005737///GO:0005829///GO:0044798///GO:0005654///GO:0005634///GO:0005634///GO:0005634///GO:0005667

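I want to check whether any of the keywords appear in the dataset and, if so, view the lines that contain them and/or count how many results come back. I also want to exclude the lines containing #, ^ or !. Here is my code: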

keyword_array = []
with open('filename.text') as my_keywordfile:
    keyword_array = my_keywordfile.readlines()   
    print(keyword_array)    
file = open("GDS1001_exact.txt", "r")
for line in file.readlines():
        if line.startswith(('#', '!', '^')):
            continue;
        else:
            for keywords in keyword_array:
               if keywords in line:
                    print(line)
            else:
                print(False)

The output I get is all False! Am I missing something?
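
Two things in the code above appear to explain the output. `readlines()` returns each raw line of the keyword file, so every entry in `keyword_array` is a whole line of quoted, comma-separated terms (newline included) rather than a single keyword. In addition, the `else` is attached to the `for` loop, so it runs whenever the loop finishes without a `break`. A minimal sketch with made-up in-memory data (the sample strings and the names `raw_entries`, `terms` and `loop_else_ran` are illustrative and do not come from the real files):

```python
# Illustrative stand-ins for the two files (hypothetical sample data).
keyword_file_text = "'Ablation', 'Anemia', 'Aneurysm',\n'Angioplasty', 'Annulus',\n"
data_line = "100001_at\tCd3g\tAnemia related entry\n"

# What readlines() yields: whole raw lines, quotes and newline included.
raw_entries = keyword_file_text.splitlines(keepends=True)

# Each "keyword" is really an entire line of the list, so it never matches.
raw_hits = [entry for entry in raw_entries if entry in data_line]

# for/else: the else branch runs because the loop never reaches `break`.
loop_else_ran = False
for entry in raw_entries:
    if entry in data_line:
        break
else:
    loop_else_ran = True

# Splitting on commas and stripping quotes gives the individual terms.
terms = [t.strip().strip("'") for t in keyword_file_text.split(",")]
terms = [t for t in terms if t]  # drop empty strings left by trailing commas
term_hits = [t for t in terms if t in data_line]

print(raw_hits)       # no raw line is a substring of the data line
print(loop_else_ran)  # the for/else branch was taken
print(term_hits)      # the cleaned term is found
```

Comparing `raw_hits` with `term_hits` shows that the matching logic itself is fine once the keyword file has been split into individual terms.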

1 Answer:

Answer 0: (score: 0)

OK, guys. With a friend's help, I got working code:

keyword_array = []
with open('filename.txt') as my_keywordfile:
    # Only the first line is read; it holds the comma-separated, quoted keywords.
    for keyword in my_keywordfile.readline().strip().strip("[]").split(","):
        keyword = keyword.strip().strip("'")
        # Skip empty strings (e.g. after a trailing comma); '' would match every line.
        if keyword:
            keyword_array.append(keyword)

count = 0
with open("GDS1001_exact.txt", "r") as file:
    for line in file:
        # Skip metadata lines that start with '#', '!' or '^'.
        if line.startswith(('#', '!', '^')):
            continue
        for keyword in keyword_array:
            if keyword in line:
                count += 1
                print(True, keyword, line)
                break  # count each matching line only once

print("count=", count)

It also gives me the number of matching lines!
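
The keyword excerpt in the question wraps some terms across lines (for example 'Angina (also called angina pectoris)'), which a single `readline()` plus `split(",")` would not handle. The following is a sketch of one possible variant, not the original author's method: it pulls the quoted terms out with a regular expression and matches case-insensitively. The helper names `parse_keywords` and `find_keyword_lines`, the sample input, and the choice of case-insensitive matching are all assumptions made for illustration; it also assumes no keyword contains an apostrophe.

```python
import re

def parse_keywords(text):
    """Extract single-quoted terms, even when a term wraps across lines."""
    terms = re.findall(r"'([^']*)'", text)
    # Collapse newlines and runs of spaces left behind by line wrapping.
    return [" ".join(t.split()) for t in terms if t.strip()]

def find_keyword_lines(keywords, lines):
    """Return (keyword, line) pairs for data lines containing any keyword."""
    lowered = [(k, k.lower()) for k in keywords]
    matches = []
    for line in lines:
        # Skip metadata lines, as in the code above.
        if line.startswith(('#', '!', '^')):
            continue
        text = line.lower()
        for keyword, needle in lowered:
            if needle in text:
                matches.append((keyword, line.rstrip("\n")))
                break  # report each line once
    return matches

# Hypothetical sample input standing in for the real files.
keyword_text = "'Ablation', 'Angina (also called angina \npectoris)', 'Anemia',"
sample_lines = [
    "!Annotation_platform_organism = Mus musculus\n",
    "#ID_REF = anemia marker\n",
    "100001_at\tCd3g\tresponse to anemia\n",
    "100002_at\tItih3\tpeptidase inhibitor activity\n",
]

keywords = parse_keywords(keyword_text)
matches = find_keyword_lines(keywords, sample_lines)
print(keywords)
print("count=", len(matches))
```

With the real files, the whole keyword file would be passed as one string (for instance via `my_keywordfile.read()`) and the open data file object as `lines`. Plain substring matching can still produce partial-word hits, so word-boundary matching may be worth considering depending on the data.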