How to extract the <a href="..."> value only once using Perl

Date: 2015-05-07 06:45:35

Tags: regex perl bioinformatics bioperl

I tried the code below:

#!usr/local/bin/perl
open(f1, "/home/httpd/cgi-bin/LDU/list1.txt");
while ( $line = <f1> ) {
    $line =~ m/(?:=")\w+/g;
    print "$line";
}

I need the output to be displayed as follows:

acinetobacter_baumannii_26016_2      
acinetobacter_baumannii_44839_10
acinetobacter_baumannii_45002_9       
acinetobacter_baumannii_45075_6
__DATA__
<A HREF="acinetobacter_baumannii_26016_2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="acinetobacter_baumannii_26016_2/">acinetobacter_baumannii_26016_2</A>. Mar 16 18:12        
<A HREF="acinetobacter_baumannii_44839_10/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="acinetobacter_baumannii_44839_10/">acinetobacter_baumannii_44839_1&gt;</A> Mar 16 18:12        
<A HREF="acinetobacter_baumannii_45002_9/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="acinetobacter_baumannii_45002_9/">acinetobacter_baumannii_45002_9</A>. Mar 16 18:11        
<A HREF="acinetobacter_baumannii_45075_6/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="acinetobacter_baumannii_45075_6/">acinetobacter_baumannii_45075_6</A>. Mar 16 18:13        
<A HREF="acinetobacter_baumannii_796380_1375/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="acinetobacter_baumannii_796380_1375/">acinetobacter_baumannii_796380_&gt;</A> Mar 16 18:13        
<A HREF="amycolatopsis_mediterranei_gca_000700945_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="amycolatopsis_mediterranei_gca_000700945_1/">amycolatopsis_mediterranei_gca_&gt;</A> Mar 16 18:11        
<A HREF="bacillus_subtilis_e1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bacillus_subtilis_e1/">bacillus_subtilis_e1</A> . . . . . . Mar 16 18:13        
<A HREF="bdellovibrio_bacteriovorus/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bdellovibrio_bacteriovorus/">bdellovibrio_bacteriovorus</A> . . . Mar 16 18:11        
<A HREF="bifidobacterium_adolescentis/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bifidobacterium_adolescentis/">bifidobacterium_adolescentis</A> . . Mar 16 18:12        
<A HREF="bifidobacterium_breve_31l/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bifidobacterium_breve_31l/">bifidobacterium_breve_31l</A>. . . . Mar 16 18:13        
<A HREF="bordetella_bronchiseptica_00_p_2796/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bordetella_bronchiseptica_00_p_2796/">bordetella_bronchiseptica_00_p_&gt;</A> Mar 16 18:12        
<A HREF="bordetella_bronchiseptica_980_2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bordetella_bronchiseptica_980_2/">bordetella_bronchiseptica_980_2</A>. Mar 16 18:12        
<A HREF="bordetella_bronchiseptica_d993/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bordetella_bronchiseptica_d993/">bordetella_bronchiseptica_d993</A> . Mar 16 18:13        
<A HREF="bordetella_bronchiseptica_mbord665/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bordetella_bronchiseptica_mbord665/">bordetella_bronchiseptica_mbord&gt;</A> Mar 16 18:11        
<A HREF="bordetella_bronchiseptica_mbord782/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="bordetella_bronchiseptica_mbord782/">bordetella_bronchiseptica_mbord&gt;</A> Mar 16 18:13        
<A HREF="borrelia_garinii_sz/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="borrelia_garinii_sz/">borrelia_garinii_sz</A>. . . . . . . Mar 16 18:12        
<A HREF="brucella_pinnipedialis/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="brucella_pinnipedialis/">brucella_pinnipedialis</A> . . . . . Mar 16 18:13        
<A HREF="burkholderia_sp_mp_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="burkholderia_sp_mp_1/">burkholderia_sp_mp_1</A> . . . . . . Mar 16 18:11        
<A HREF="campylobacter_jejuni_10227/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="campylobacter_jejuni_10227/">campylobacter_jejuni_10227</A> . . . Mar 16 18:13        
<A HREF="campylobacter_jejuni_subsp_jejuni_81_176_drh212/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="campylobacter_jejuni_subsp_jejuni_81_176_drh212/">campylobacter_jejuni_subsp_jeju&gt;</A> Mar 16 18:13        
<A HREF="candidatus_caedibacter_acanthamoebae/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="candidatus_caedibacter_acanthamoebae/">candidatus_caedibacter_acantham&gt;</A> Mar 16 18:11        
<A HREF="clostridium_botulinum_d_str_16868/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="clostridium_botulinum_d_str_16868/">clostridium_botulinum_d_str_168&gt;</A> Mar 16 18:11        
<A HREF="criblamydia_sequanensis_crib_18/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="criblamydia_sequanensis_crib_18/">criblamydia_sequanensis_crib_18</A>. Mar 16 18:11        
<A HREF="enterococcus_faecalis_atcc_29212_gca_000742975_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecalis_atcc_29212_gca_000742975_1/">enterococcus_faecalis_atcc_2921&gt;</A> Mar 16 18:12        
<A HREF="enterococcus_faecalis_ga2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecalis_ga2/">enterococcus_faecalis_ga2</A>. . . . Mar 16 18:11        
<A HREF="enterococcus_faecalis_gan13/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecalis_gan13/">enterococcus_faecalis_gan13</A>. . . Mar 16 18:12        
<A HREF="enterococcus_faecium_t110/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecium_t110/">enterococcus_faecium_t110</A>. . . . Mar 16 18:12        
<A HREF="enterococcus_faecium_uc7251/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecium_uc7251/">enterococcus_faecium_uc7251</A>. . . Mar 16 18:13        
<A HREF="enterococcus_faecium_uc8668/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecium_uc8668/">enterococcus_faecium_uc8668</A>. . . Mar 16 18:12        
<A HREF="enterococcus_faecium_vre1044/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="enterococcus_faecium_vre1044/">enterococcus_faecium_vre1044</A> . . Mar 16 18:12        
<A HREF="erythrobacter_litoralis/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="erythrobacter_litoralis/">erythrobacter_litoralis</A>. . . . . Mar 16 18:11        
<A HREF="escherichia_coli_1_110_08_s1_c1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_1_110_08_s1_c1/">escherichia_coli_1_110_08_s1_c1</A>. Mar 16 18:11        
<A HREF="escherichia_coli_2_052_05_s3_c1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_2_052_05_s3_c1/">escherichia_coli_2_052_05_s3_c1</A>. Mar 16 18:12        
<A HREF="escherichia_coli_2_177_06_s3_c2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_2_177_06_s3_c2/">escherichia_coli_2_177_06_s3_c2</A>. Mar 16 18:12        
<A HREF="escherichia_coli_2_177_06_s4_c3/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_2_177_06_s4_c3/">escherichia_coli_2_177_06_s4_c3</A>. Mar 16 18:12        
<A HREF="escherichia_coli_2_222_05_s1_c2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_2_222_05_s1_c2/">escherichia_coli_2_222_05_s1_c2</A>. Mar 16 18:13        
<A HREF="escherichia_coli_3_020_07_s4_c3/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_3_020_07_s4_c3/">escherichia_coli_3_020_07_s4_c3</A>. Mar 16 18:11        
<A HREF="escherichia_coli_3_073_06_s3_c2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_3_073_06_s3_c2/">escherichia_coli_3_073_06_s3_c2</A>. Mar 16 18:11        
<A HREF="escherichia_coli_3_105_05_s3_c3/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_3_105_05_s3_c3/">escherichia_coli_3_105_05_s3_c3</A>. Mar 16 18:13        
<A HREF="escherichia_coli_6_537_08_s3_c2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_6_537_08_s3_c2/">escherichia_coli_6_537_08_s3_c2</A>. Mar 16 18:12        
<A HREF="escherichia_coli_6_537_08_s3_c3/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_6_537_08_s3_c3/">escherichia_coli_6_537_08_s3_c3</A>. Mar 16 18:13        
<A HREF="escherichia_coli_8_415_05_s4_c1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_8_415_05_s4_c1/">escherichia_coli_8_415_05_s4_c1</A>. Mar 16 18:13        
<A HREF="escherichia_coli_bidmc_72/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_bidmc_72/">escherichia_coli_bidmc_72</A>. . . . Mar 16 18:12        
<A HREF="escherichia_coli_isc56/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_isc56/">escherichia_coli_isc56</A> . . . . . Mar 16 18:13        
<A HREF="escherichia_coli_o111_h8_str_f6627/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_o111_h8_str_f6627/">escherichia_coli_o111_h8_str_f6&gt;</A> Mar 16 18:12        
<A HREF="escherichia_coli_o121_h19_str_2011c_3108/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_o121_h19_str_2011c_3108/">escherichia_coli_o121_h19_str_2&gt;</A> Mar 16 18:11        
<A HREF="escherichia_coli_o157_h7_str_08_3527/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_o157_h7_str_08_3527/">escherichia_coli_o157_h7_str_08&gt;</A> Mar 16 18:13        
<A HREF="escherichia_coli_o157_h7_str_08_4529/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_o157_h7_str_08_4529/">escherichia_coli_o157_h7_str_08&gt;</A> Mar 16 18:12        
<A HREF="escherichia_coli_o157_h7_str_k4527/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_o157_h7_str_k4527/">escherichia_coli_o157_h7_str_k4&gt;</A> Mar 16 18:12        
<A HREF="escherichia_coli_o6_h16_str_f5656c1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_o6_h16_str_f5656c1/">escherichia_coli_o6_h16_str_f56&gt;</A> Mar 16 18:11        
<A HREF="escherichia_coli_str_st540_gca_000599685_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_str_st540_gca_000599685_1/">escherichia_coli_str_st540_gca_&gt;</A> Mar 16 18:11        
<A HREF="escherichia_coli_str_st540_gca_000599705_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_str_st540_gca_000599705_1/">escherichia_coli_str_st540_gca_&gt;</A> Mar 16 18:13        
<A HREF="escherichia_coli_uci_53/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="escherichia_coli_uci_53/">escherichia_coli_uci_53</A>. . . . . Mar 16 18:13        
<A HREF="flavobacterium_reichenbachii/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="flavobacterium_reichenbachii/">flavobacterium_reichenbachii</A> . . Mar 16 18:12        
<A HREF="gammaproteobacteria_bacterium_mfb021/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="gammaproteobacteria_bacterium_mfb021/">gammaproteobacteria_bacterium_m&gt;</A> Mar 16 18:12        
<A HREF="georgenia_sp_subg003/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="georgenia_sp_subg003/">georgenia_sp_subg003</A> . . . . . . Mar 16 18:13        
<A HREF="gilliamella_apicola_scgc_ab_598_i20/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="gilliamella_apicola_scgc_ab_598_i20/">gilliamella_apicola_scgc_ab_598&gt;</A> Mar 16 18:12        
<A HREF="haemophilus_parasuis_gca_000742795_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="haemophilus_parasuis_gca_000742795_1/">haemophilus_parasuis_gca_000742&gt;</A> Mar 16 18:12        
<A HREF="haemophilus_parasuis_hps9/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="haemophilus_parasuis_hps9/">haemophilus_parasuis_hps9</A>. . . . Mar 16 18:11        
<A HREF="halobacillus_karajensis/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="halobacillus_karajensis/">halobacillus_karajensis</A>. . . . . Mar 16 18:12        
<A HREF="halostagnicola_sp_a56/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="halostagnicola_sp_a56/">halostagnicola_sp_a56</A>. . . . . . Mar 16 18:11        
<A HREF="hyphomonas_jannaschiana_vp2/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="hyphomonas_jannaschiana_vp2/">hyphomonas_jannaschiana_vp2</A>. . . Mar 16 18:12        
<A HREF="hyphomonas_sp_25b14_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="hyphomonas_sp_25b14_1/">hyphomonas_sp_25b14_1</A>. . . . . . Mar 16 18:11        
<A HREF="klebsiella_pneumoniae_chs_43/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="klebsiella_pneumoniae_chs_43/">klebsiella_pneumoniae_chs_43</A> . . Mar 16 18:13        
<A HREF="klebsiella_pneumoniae_chs_49/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="klebsiella_pneumoniae_chs_49/">klebsiella_pneumoniae_chs_49</A> . . Mar 16 18:13        
<A HREF="lactobacillus_oryzae_jcm_18671/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="lactobacillus_oryzae_jcm_18671/">lactobacillus_oryzae_jcm_18671</A> . Mar 16 18:11        
<A HREF="listeria_monocytogenes_fsl_f6_684_gca_000525815_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_fsl_f6_684_gca_000525815_1/">listeria_monocytogenes_fsl_f6_6&gt;</A> Mar 16 18:12        
<A HREF="listeria_monocytogenes_gca_000726305_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000726305_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:11        
<A HREF="listeria_monocytogenes_gca_000726325_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000726325_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:12        
<A HREF="listeria_monocytogenes_gca_000726695_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000726695_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:11        
<A HREF="listeria_monocytogenes_gca_000727065_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000727065_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:11        
<A HREF="listeria_monocytogenes_gca_000727735_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000727735_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:12        
<A HREF="listeria_monocytogenes_gca_000728125_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000728125_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:12        
<A HREF="listeria_monocytogenes_gca_000728365_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000728365_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:12        
<A HREF="listeria_monocytogenes_gca_000728805_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000728805_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:13        
<A HREF="listeria_monocytogenes_gca_000728845_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_gca_000728845_1/">listeria_monocytogenes_gca_0007&gt;</A> Mar 16 18:13        
<A HREF="listeria_monocytogenes_lm_1880/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_lm_1880/">listeria_monocytogenes_lm_1880</A> . Mar 16 18:12        
<A HREF="listeria_monocytogenes_wslc1042/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_monocytogenes_wslc1042/">listeria_monocytogenes_wslc1042</A>. Mar 16 18:13        
<A HREF="listeria_riparia_fsl_s10_1204/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="listeria_riparia_fsl_s10_1204/">listeria_riparia_fsl_s10_1204</A>. . Mar 16 18:11        
<A HREF="morganella_sp_egd_hp17/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="morganella_sp_egd_hp17/">morganella_sp_egd_hp17</A> . . . . . Mar 16 18:11        
<A HREF="mycobacterium_africanum_gca_000666065_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_africanum_gca_000666065_1/">mycobacterium_africanum_gca_000&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_africanum_mal010074/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_africanum_mal010074/">mycobacterium_africanum_mal0100&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_africanum_mal010081/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_africanum_mal010081/">mycobacterium_africanum_mal0100&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_btb03_108/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb03_108/">mycobacterium_tuberculosis_btb0&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_btb04_416/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb04_416/">mycobacterium_tuberculosis_btb0&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_btb05_285/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb05_285/">mycobacterium_tuberculosis_btb0&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_btb07_323/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb07_323/">mycobacterium_tuberculosis_btb0&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_btb08_022/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb08_022/">mycobacterium_tuberculosis_btb0&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_btb08_309/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb08_309/">mycobacterium_tuberculosis_btb0&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_btb10_357/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb10_357/">mycobacterium_tuberculosis_btb1&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_btb11_027/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb11_027/">mycobacterium_tuberculosis_btb1&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_btb11_207/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb11_207/">mycobacterium_tuberculosis_btb1&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_btb12_001/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb12_001/">mycobacterium_tuberculosis_btb1&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_btb12_046/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_btb12_046/">mycobacterium_tuberculosis_btb1&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_gca_000736075_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_gca_000736075_1/">mycobacterium_tuberculosis_gca_&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_h2438/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_h2438/">mycobacterium_tuberculosis_h243&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_h2581/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_h2581/">mycobacterium_tuberculosis_h258&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_h3005/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_h3005/">mycobacterium_tuberculosis_h300&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_kt_0043/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_kt_0043/">mycobacterium_tuberculosis_kt_0&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_kt_0084/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_kt_0084/">mycobacterium_tuberculosis_kt_0&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_kzn_1435_gca_000669675_1/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_kzn_1435_gca_000669675_1/">mycobacterium_tuberculosis_kzn_&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_m1236/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m1236/">mycobacterium_tuberculosis_m123&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_m1274/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m1274/">mycobacterium_tuberculosis_m127&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_m1461/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m1461/">mycobacterium_tuberculosis_m146&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_m1475/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m1475/">mycobacterium_tuberculosis_m147&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_m1848/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m1848/">mycobacterium_tuberculosis_m184&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_m1893/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m1893/">mycobacterium_tuberculosis_m189&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_m2086/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m2086/">mycobacterium_tuberculosis_m208&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_m2116/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m2116/">mycobacterium_tuberculosis_m211&gt;</A> Mar 16 18:11        
<A HREF="mycobacterium_tuberculosis_m2193/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m2193/">mycobacterium_tuberculosis_m219&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_m2211/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m2211/">mycobacterium_tuberculosis_m221&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_m2435/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_m2435/">mycobacterium_tuberculosis_m243&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_mal010078/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_mal010078/">mycobacterium_tuberculosis_mal0&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_mal020120/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_mal020120/">mycobacterium_tuberculosis_mal0&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_mal020150/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_mal020150/">mycobacterium_tuberculosis_mal0&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_md14844/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_md14844/">mycobacterium_tuberculosis_md14&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_md14847/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_md14847/">mycobacterium_tuberculosis_md14&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_md17647/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_md17647/">mycobacterium_tuberculosis_md17&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_md17902/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_md17902/">mycobacterium_tuberculosis_md17&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_md17973/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_md17973/">mycobacterium_tuberculosis_md17&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_nritld54/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_nritld54/">mycobacterium_tuberculosis_nrit&gt;</A> Mar 16 18:12        
<A HREF="mycobacterium_tuberculosis_ofxr_11/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_ofxr_11/">mycobacterium_tuberculosis_ofxr&gt;</A> Mar 16 18:13        
<A HREF="mycobacterium_tuberculosis_ofxr_15/"><IMG border="0" SRC="/squid-internal-static/icons/anthony-dir.gif" ALT="[DIR] "></A> <A HREF="mycobacterium_tuberculosis_ofxr_15/">mycobacterium_tuberculosis_ofxr&gt;</A> Mar 16 18:13        

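4 Answers:

Answer 0 (score: 3)

As long as the href attribute is the first attribute in each <a> tag, this program does what you ask. It also checks whether each name has been seen before and prints it only if it is new.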

use strict;
use warnings;
use 5.010;

my %seen;
while ( <DATA> ) {
  while ( m{<a\s+href="([^"]*)/"}ig ) {
    say $1 unless $seen{$1}++;
  }
}

Output

acinetobacter_baumannii_26016_2
acinetobacter_baumannii_44839_10
acinetobacter_baumannii_45002_9
acinetobacter_baumannii_45075_6
acinetobacter_baumannii_796380_1375
amycolatopsis_mediterranei_gca_000700945_1
bacillus_subtilis_e1
bdellovibrio_bacteriovorus
bifidobacterium_adolescentis
bifidobacterium_breve_31l
bordetella_bronchiseptica_00_p_2796
bordetella_bronchiseptica_980_2
bordetella_bronchiseptica_d993
bordetella_bronchiseptica_mbord665
bordetella_bronchiseptica_mbord782
borrelia_garinii_sz
brucella_pinnipedialis
burkholderia_sp_mp_1
campylobacter_jejuni_10227
campylobacter_jejuni_subsp_jejuni_81_176_drh212
candidatus_caedibacter_acanthamoebae
clostridium_botulinum_d_str_16868
criblamydia_sequanensis_crib_18
enterococcus_faecalis_atcc_29212_gca_000742975_1
enterococcus_faecalis_ga2
enterococcus_faecalis_gan13
enterococcus_faecium_t110
enterococcus_faecium_uc7251
enterococcus_faecium_uc8668
enterococcus_faecium_vre1044
erythrobacter_litoralis
escherichia_coli_1_110_08_s1_c1
escherichia_coli_2_052_05_s3_c1
escherichia_coli_2_177_06_s3_c2
escherichia_coli_2_177_06_s4_c3
escherichia_coli_2_222_05_s1_c2
escherichia_coli_3_020_07_s4_c3
escherichia_coli_3_073_06_s3_c2
escherichia_coli_3_105_05_s3_c3
escherichia_coli_6_537_08_s3_c2
escherichia_coli_6_537_08_s3_c3
escherichia_coli_8_415_05_s4_c1
escherichia_coli_bidmc_72
escherichia_coli_isc56
escherichia_coli_o111_h8_str_f6627
escherichia_coli_o121_h19_str_2011c_3108
escherichia_coli_o157_h7_str_08_3527
escherichia_coli_o157_h7_str_08_4529
escherichia_coli_o157_h7_str_k4527
escherichia_coli_o6_h16_str_f5656c1
escherichia_coli_str_st540_gca_000599685_1
escherichia_coli_str_st540_gca_000599705_1
escherichia_coli_uci_53
flavobacterium_reichenbachii
gammaproteobacteria_bacterium_mfb021
georgenia_sp_subg003
gilliamella_apicola_scgc_ab_598_i20
haemophilus_parasuis_gca_000742795_1
haemophilus_parasuis_hps9
halobacillus_karajensis
halostagnicola_sp_a56
hyphomonas_jannaschiana_vp2
hyphomonas_sp_25b14_1
klebsiella_pneumoniae_chs_43
klebsiella_pneumoniae_chs_49
lactobacillus_oryzae_jcm_18671
listeria_monocytogenes_fsl_f6_684_gca_000525815_1
listeria_monocytogenes_gca_000726305_1
listeria_monocytogenes_gca_000726325_1
listeria_monocytogenes_gca_000726695_1
listeria_monocytogenes_gca_000727065_1
listeria_monocytogenes_gca_000727735_1
listeria_monocytogenes_gca_000728125_1
listeria_monocytogenes_gca_000728365_1
listeria_monocytogenes_gca_000728805_1
listeria_monocytogenes_gca_000728845_1
listeria_monocytogenes_lm_1880
listeria_monocytogenes_wslc1042
listeria_riparia_fsl_s10_1204
morganella_sp_egd_hp17
mycobacterium_africanum_gca_000666065_1
mycobacterium_africanum_mal010074
mycobacterium_africanum_mal010081
mycobacterium_tuberculosis_btb03_108
mycobacterium_tuberculosis_btb04_416
mycobacterium_tuberculosis_btb05_285
mycobacterium_tuberculosis_btb07_323
mycobacterium_tuberculosis_btb08_022
mycobacterium_tuberculosis_btb08_309
mycobacterium_tuberculosis_btb10_357
mycobacterium_tuberculosis_btb11_027
mycobacterium_tuberculosis_btb11_207
mycobacterium_tuberculosis_btb12_001
mycobacterium_tuberculosis_btb12_046
mycobacterium_tuberculosis_gca_000736075_1
mycobacterium_tuberculosis_h2438
mycobacterium_tuberculosis_h2581
mycobacterium_tuberculosis_h3005
mycobacterium_tuberculosis_kt_0043
mycobacterium_tuberculosis_kt_0084
mycobacterium_tuberculosis_kzn_1435_gca_000669675_1
mycobacterium_tuberculosis_m1236
mycobacterium_tuberculosis_m1274
mycobacterium_tuberculosis_m1461
mycobacterium_tuberculosis_m1475
mycobacterium_tuberculosis_m1848
mycobacterium_tuberculosis_m1893
mycobacterium_tuberculosis_m2086
mycobacterium_tuberculosis_m2116
mycobacterium_tuberculosis_m2193
mycobacterium_tuberculosis_m2211
mycobacterium_tuberculosis_m2435
mycobacterium_tuberculosis_mal010078
mycobacterium_tuberculosis_mal020120
mycobacterium_tuberculosis_mal020150
mycobacterium_tuberculosis_md14844
mycobacterium_tuberculosis_md14847
mycobacterium_tuberculosis_md17647
mycobacterium_tuberculosis_md17902
mycobacterium_tuberculosis_md17973
mycobacterium_tuberculosis_nritld54
mycobacterium_tuberculosis_ofxr_11
mycobacterium_tuberculosis_ofxr_15
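
The program above relies on href being the first attribute of each <a> tag. If that is not guaranteed, a slightly looser pattern can skip over any attributes that come before it. The following is only a sketch under that assumption: the helper name unique_hrefs and the sample lines (including the class="dir" attribute) are made up for illustration and are not part of the original listing.

```perl
use strict;
use warnings;

# Hypothetical helper: returns each distinct HREF directory name once,
# in order of first appearance.
sub unique_hrefs {
    my @lines = @_;
    my ( %seen, @names );
    for my $line (@lines) {
        # [^>]*? lazily skips attributes that precede href in the same tag;
        # the trailing slash is left out of the capture.
        while ( $line =~ m{<a\s[^>]*?\bhref\s*=\s*"([^"]*?)/?"}ig ) {
            push @names, $1 unless $seen{$1}++;
        }
    }
    return @names;
}

# Made-up sample lines in the style of the listing
my @sample = (
    '<A class="dir" HREF="bacillus_subtilis_e1/">bacillus_subtilis_e1</A>',
    '<A HREF="bacillus_subtilis_e1/"><IMG border="0" ALT="[DIR] "></A>',
    '<A HREF="borrelia_garinii_sz/">borrelia_garinii_sz</A>',
);
print "$_\n" for unique_hrefs(@sample);
```

This is still a regex over HTML, so it remains fragile for anything beyond a simple, regular directory listing like the one in the question.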

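Answer 1 (score: 2)

I would suggest using a module in the first instance, because HTML cannot be parsed correctly with regular expressions. A regex may work, but it tends to produce brittle code.

So, something like this (credit: http://www.perlmonks.org/?node_id=557357):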

use strict;
use warnings;

use WWW::Mechanize;

my $mech  = WWW::Mechanize->new();

$mech->get( 'file://C:/path/to/your_html/file.html' );

my @links = $mech->links();

foreach my $link (@links) {
    my $url = $link -> url;
    $url =~ s,/$,,g; 
    print $url,"\n";
}

For your simple data set, this should do the trick:

local $/;
my @links = <DATA> =~ m,<A HREF=\"(.*?)/?\">,g;
print join ( "\n", @links );
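
Note that every name appears in two <A HREF> tags per line in the sample data, so the global match above returns each name twice. Since the question asks for each name only once, the list can be filtered through a hash. This is a minimal sketch: the helper name unique_links and the shortened sample text are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the whole listing as one string and
# returns each captured name once, in order of first appearance.
sub unique_links {
    my ($html) = @_;
    my @links = $html =~ m,<A HREF="(.*?)/?">,g;    # same pattern as above
    my %seen;
    return grep { !$seen{$_}++ } @links;
}

# Shortened sample standing in for the __DATA__ section
my $html = <<'END';
<A HREF="georgenia_sp_subg003/"><IMG border="0" ALT="[DIR] "></A> <A HREF="georgenia_sp_subg003/">georgenia_sp_subg003</A>
<A HREF="halobacillus_karajensis/"><IMG border="0" ALT="[DIR] "></A> <A HREF="halobacillus_karajensis/">halobacillus_karajensis</A>
END

print join( "\n", unique_links($html) ), "\n";
```

The grep keeps an element only the first time its key is seen, which preserves the original order of the listing.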

Answer 2 (score: 0)

Trying the following code with your data, I get results like this:

acinetobacter_baumannii_26016_2
acinetobacter_baumannii_44839_1
acinetobacter_baumannii_45002_9
...

The code is:

open(f1,"/home/httpd/cgi-bin/LDU/list1.txt");
while($line=<f1>){
    $line=~/([0-9A-Za-z_]*)(\s*)[\.>].*/;
    print $1 . "\n";
}

Answer 3 (score: -1)

#!/usr/bin/perl
use strict;
use warnings;
open(f1,"/home/httpd/cgi-bin/LDU/list1.txt")||die("error");

while(my $line =<f1> )
{
my ($match) = ($line =~ m/(?:=")(\w+)/g);
print "$match\n";
}
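
The code above works because a match in list context returns its captures, and assigning that list to ($match) keeps only the first one, so each line yields a single name. A line with no match leaves $match undefined, which triggers a warning when printed. The sketch below shows the same idea with a lexical filehandle, a three-argument open, and a guard for non-matching lines. The helper name first_name and the in-memory sample are assumptions for illustration; in practice the filehandle would be opened on the file from the question.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the first word that follows =" on the
# line, or undef when the line contains no such attribute value.
sub first_name {
    my ($line) = @_;
    my ($match) = $line =~ m/="(\w+)/;    # list context yields the capture
    return $match;
}

# In-memory sample standing in for list1.txt
my $sample = <<'END';
<A HREF="brucella_pinnipedialis/"><IMG border="0" ALT="[DIR] "></A> <A HREF="brucella_pinnipedialis/">brucella_pinnipedialis</A> . . . . . Mar 16 18:13
no link on this line
<A HREF="burkholderia_sp_mp_1/">burkholderia_sp_mp_1</A> . . . . . . Mar 16 18:11
END

open( my $fh, '<', \$sample ) or die "Cannot open sample: $!";
while ( my $line = <$fh> ) {
    my $name = first_name($line);
    next unless defined $name;    # skip lines without an attribute value
    print "$name\n";
}
close $fh;
```

This prints one name per input line rather than one per distinct name, so it only satisfies "once" when no directory is listed on more than one line.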