In Linux, bash

Date: 2018-12-09 10:32:09

Tags: linux bash shell parsing scripting

I am trying to print two variables (Phaster_positions and GBKPositions), each under its own column heading, with the columns separated by tabs. This is what I get:

Phaster_positions             GBKPositions  Phaster_positions  GBKPositions
371860-418565                 247..381
2947108-2988239               378..1781
4663633-4680174               1884..2987
5756724-5793879               3008..3103
5794433-5829445               3128..4405
6867447-6901202               4479..5081
5102..6229
6253..8670
complement(8742..9269)
complement(9583..10563)
complement(10560..12458)
complement(12455..13402)
complement(13973..15541)
complement(15881..16051)
16440..16814
complement(16858..18234)
complement(18254..18628)
complement(18710..20266)
complement(20317..22452)
complement(22888..23454)
complement(23474..25552)
complement(25557..26504)
26735..27631
complement(27655..29334)
29603..30559
complement(30534..31982)
complement(32016..33389)
complement(33391..34734)
complement(34736..35692)
complement(35761..36267)
36431..37459
37519..38688

What I want:

Phaster_positions   GBKPositions

371860-418565       247..381
2947108-2988239     378..1781
4663633-4680174     etc
5756724-5793879     etc
5794433-5829445     etc
6867447-6901202     etc

My script:

#!/bin/bash

printf "Phaster_positions\n\n">gbk31.txt
printf "GBKPositions\n\n">gbk32.txt

PhasterPositions=`awk '$2~/[0-9]Kb/{print ($5)}' CP000155.phaster`
GBKPositions=`awk '$1~/CDS/{print ($2)}' CP000155.gbk`

echo -e "$PhasterPositions">>gbk31.txt
echo -e "$GBKPositions">>gbk32.txt

joined=`paste gbk31.txt gbk32.txt | column -s $'\t' -t`
echo -e "$joined">> gbkfinal.txt

Source file for the first variable:

gi|00000000|ref|NC_000000|  Hahella chejuensis KCTC 2396, complete genome. .7215267, gc%: 53.87%
                                  REGION         REGION_LENGTH            COMPLETENESS(score)           SPECIFIC_KEYWORD                             REGION_POSITION          TRNA_NUM                 TOTAL_PROTEIN_NUM       PHAGE_HIT_PROTEIN_NUM            HYPOTHETICAL_PROTEIN_NUM         PHAGE+HYPO_PROTEIN_PERCENTAGE    BACTERIAL_PROTEIN_NUM            ATT_SITE_SHOWUP                  PHAGE_SPECIES_NUM                MOST_COMMON_PHAGE_NAME(hit_genes_count)    FIRST_MOST_COMMON_PHAGE_NUM      FIRST_MOST_COMMON_PHAGE_PERCENTAGE   GC_PERCENTAGE                 
                                 ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
                                  1              46.7Kb                   questionable(80)              head,terminase,tail,capsid,recombinase       371860-418565            0                        38                      27                               8                                92.1%                            3                                yes                              10                               PHAGE_Pseudo_phi3_NC_030940(17),PHAGE_Aeromo_phiO18P_NC_009542(15),PHAGE_Haemop_HP1_NC_001697(11),PHAGE_Pasteu_F108_NC_008193(9),PHAGE_Vibrio_8_NC_022747(8),PHAGE_Vibrio_K139_NC_003313(8),PHAGE_Haemop_HP2_NC_003315(7),PHAGE_Phormi_MIS_PhV1A_NC_029032(3),PHAGE_Ralsto_RSY1_NC_025115(3),PHAGE_Burkho_KS14_NC_015273(2),PHAGE_Entero_186_NC_001317(2),PHAGE_Entero_N15_NC_001901(1),PHAGE_Salmon_SEN1_NC_029003(1),PHAGE_Mannhe_vB_MhM_587AP1_NC_028898(1),PHAGE_Salmon_RE_2010_NC_019488(1),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Burkho_KS5_NC_015265(1),PHAGE_Pseudo_YuA_NC_010116(1),PHAGE_Vibrio_VP882_NC_009016(1),PHAGE_Mannhe_phiMHaA1_NC_008201(1),PHAGE_Pseudo_MP1412_NC_018282(1),PHAGE_Stenot_Smp131_NC_023588(1),PHAGE_Pseudo_JG004_NC_019450(1),PHAGE_Bdello_phi1422_NC_019525(1),PHAGE_Salmon_Fels_2_NC_010463(1),PHAGE_Bacill_G_NC_023719(1),PHAGE_Pseudo_phiCTX_NC_003278(1),PHAGE_Psychr_Psymv2_NC_023734(1),PHAGE_Entero_fiAA91_ss_NC_022750(1),PHAGE_Escher_pro483_NC_028943(1),PHAGE_Burkho_KL3_NC_015266(1)   16                               44.73%                           55.35%                        
                                  2              41.1Kb                   intact(120)                   integrase,head,recombinase,capsid,tail       2947108-2988239          1                        53                      23                               28                               96.2%                            2                                yes                              18                               PHAGE_Pseudo_phi2_NC_030931(10),PHAGE_Entero_lambda_NC_001416(4),PHAGE_Pseudo_F10_NC_007805(4),PHAGE_Escher_vB_EcoM_ECO1230_10_NC_027995(3),PHAGE_Entero_N15_NC_001901(3),PHAGE_Burkho_AH2_NC_018283(3),PHAGE_Shewan_1/44_NC_025463(2),PHAGE_Achrom_phiAxp_2_NC_029106(2),PHAGE_Vibrio_VvAW1_NC_020488(2),PHAGE_Burkho_BcepNazgul_NC_005091(2),PHAGE_Entero_Arya_NC_031048(2),PHAGE_Entero_mEp460_NC_019716(2),PHAGE_Entero_HK630_NC_019723(2),PHAGE_Vibrio_X29_NC_024369(2),PHAGE_Escher_vB_EcoM_ep3_NC_025430(1),PHAGE_Salmon_phiSG_JL2_NC_010807(1),PHAGE_Rueger_DSS3_P1_NC_025428(1),PHAGE_Shigel_SfIV_NC_022749(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Shigel_Ss_VASD_NC_028685(1),PHAGE_Entero_SfV_NC_003444(1),PHAGE_Marino_P12026_NC_018269(1),PHAGE_Entero_HK629_NC_019711(1),PHAGE_Entero_phi80_NC_021190(1),PHAGE_Entero_BP_4795_NC_004813(1),PHAGE_Burkho_BcepIL02_NC_012743(1),PHAGE_Vibrio_VP882_NC_009016(1),PHAGE_Entero_VT2phi_272_NC_028656(1),PHAGE_Phage_Gifsy_1_NC_010392(1),PHAGE_Bdello_phi1422_NC_019525(1),PHAGE_Vibrio_VpKK5_NC_026610(1),PHAGE_Pectob_ZF40_NC_019522(1),PHAGE_Ralsto_RS138_NC_029107(1),PHAGE_Entero_mEp237_NC_019704(1),PHAGE_Salico_CGphi29_NC_020844(1),PHAGE_Entero_HK225_NC_019717(1),PHAGE_Bacill_Slash_NC_022774(1),PHAGE_Rhodob_RcapNL_NC_020489(1),PHAGE_Pseudo_F116_NC_006552(1),PHAGE_Escher_80001_NC_027387(1),PHAGE_Salmon_FSLSP088_NC_021780(1),PHAGE_Pseudo_PPpW_3_NC_023006(1),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(1),PHAGE_Synech_S_CBS1_NC_016164(1),PHAGE_Burkho_KS14_NC_015273(1),PHAGE_Stenot_S1_NC_011589(1),PHAGE_Escher_TL_2011c_NC_019442(1),PHAGE_Entero_186_NC_0013
17(1),PHAGE_Entero_cdtI_NC_009514(1),PHAGE_Burkho_DC1_NC_018452(1),PHAGE_Bacter_Lily_NC_028841(1),PHAGE_Burkho_BcepMigl_NC_019917(1),PHAGE_Salmon_iEPS5_NC_021783(1),PHAGE_Erwini_vB_EamP_L1_NC_019510(1),PHAGE_Escher_P13374_NC_018846(1),PHAGE_Vibrio_SIO_2_NC_016567(1)   4                                18.86%                           53.18%                        
                                  3              16.5Kb                   intact(110)                   tail,head,capsid,terminase                   4663633-4680174          0                        17                      12                               5                                100%                             0                                no                               10                               PHAGE_Salmon_ST64B_NC_004313(3),PHAGE_Entero_phiP27_NC_003356(3),PHAGE_Burkho_phi6442_NC_009235(3),PHAGE_Burkho_phiE125_NC_003309(3),PHAGE_Burkho_phi1026b_NC_005284(3),PHAGE_Entero_SfV_NC_003444(2),PHAGE_Entero_HK140_NC_019710(2),PHAGE_Salmon_118970_sal3_NC_031940(2),PHAGE_Strept_phiSASD1_NC_014229(2),PHAGE_Salmon_118970_sal3_NC_031940(2),PHAGE_Idioma_1N2_2_NC_025439(1),PHAGE_Shigel_SfIV_NC_022749(1),PHAGE_Entero_mEp235_NC_019708(1),PHAGE_Mannhe_vB_MhS_1152AP2_NC_028956(1),PHAGE_Vibrio_12B8_NC_021073(1),PHAGE_Mycoba_Lockley_NC_011021(1),PHAGE_Entero_HK022_NC_002166(1),PHAGE_Entero_mEp390_NC_019721(1),PHAGE_Entero_BP_4795_NC_004813(1),PHAGE_Marino_P12026_NC_018269(1),PHAGE_Colwel_9A_NC_018088(1),PHAGE_Vibrio_VpKK5_NC_026610(1),PHAGE_Clostr_phiCD6356_NC_015262(1),PHAGE_Entero_HK542_NC_019769(1),PHAGE_Entero_IME_EFm5_NC_028826(1),PHAGE_Geobac_E2_NC_009552(1),PHAGE_Entero_IME_EFm1_NC_024356(1),PHAGE_Burkho_KS9_NC_013055(1),PHAGE_Pseudo_Pq0_NC_029100(1),PHAGE_Rhizob_vB_RleS_L338C_NC_023502(1),PHAGE_Entero_SfI_NC_027339(1),PHAGE_Geobac_GBK2_NC_023612(1),PHAGE_Shigel_SfII_NC_021857(1),PHAGE_Rhodoc_REQ1_NC_016655(1),PHAGE_Burkho_Bcep176_NC_007497(1),PHAGE_Entero_mEpX2_NC_019705(1),PHAGE_Mycoba_MOOREtheMARYer_NC_028791(1)   3                                17.64%                           58.49%                        
                                  4              37.1Kb                   questionable(90)              tail,virion,capsid,portal,terminase          5756724-5793879          0                        30                      22                               4                                86.6%                            4                                no                               15                               PHAGE_Pseudo_JBD93_NC_030918(5),PHAGE_Pseudo_M6_NC_007809(5),PHAGE_Pseudo_YuA_NC_010116(4),PHAGE_Pseudo_PAE1_NC_028980(4),PHAGE_Pseudo_JBD24_NC_020203(4),PHAGE_Pseudo_vB_PaeS_PAO1_Ab30_NC_026601(3),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(3),PHAGE_Synech_S_CBS1_NC_016164(3),PHAGE_Vibrio_VHML_NC_004456(3),PHAGE_Vibrio_VP58.5_NC_027981(3),PHAGE_Pseudo_MP1412_NC_018282(3),PHAGE_Pseudo_DMS3_NC_008717(2),PHAGE_Stenot_vB_SmaS_DLP_2_NC_029019(2),PHAGE_Synech_S_CBS3_NC_015465(2),PHAGE_Pseudo_PaMx11_NC_028770(2),PHAGE_Rhizob_RR1_A_NC_021560(2),PHAGE_Pseudo_MP38_NC_011611(2),PHAGE_Pseudo_vB_PaeS_PAO1_Ab18_NC_026594(2),PHAGE_Pseudo_PaMx28_NC_028931(2),PHAGE_Vibrio_SIO_2_NC_016567(2),PHAGE_Rueger_DSS3_P1_NC_025428(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Cellul_phi18:3_NC_021794(1),PHAGE_Shewan_1/44_NC_025463(1),PHAGE_Achrom_phiAxp_2_NC_029106(1),PHAGE_Pseudo_vB_Pae_Kakheti25_NC_017864(1),PHAGE_Vibrio_12A10_NC_029067(1),PHAGE_Pseudo_vB_PaeS_PM105_NC_028667(1),PHAGE_Pseudo_vB_PaeS_SCH_Ab26_NC_024381(1),PHAGE_Cellul_phi46:3_NC_021792(1),PHAGE_Vibrio_12B3_NC_021067(1),PHAGE_Ralsto_RS138_NC_029107(1),PHAGE_Salmon_SSU5_NC_018843(1),PHAGE_Vibrio_12B12_NC_021070(1),PHAGE_Cellul_phi39:1_NC_021804(1),PHAGE_Pseudo_phiMK_NC_031110(1),PHAGE_Pseudo_73_NC_007806(1),PHAGE_Pseudo_PaMx74_NC_028809(1),PHAGE_Pseudo_MP22_NC_009818(1),PHAGE_Rhizob_vB_RleS_L338C_NC_023502(1),PHAGE_Pseudo_PaMx42_NC_028879(1),PHAGE_Burkho_phi6442_NC_009235(1),PHAGE_Stenot_S1_NC_011589(1),PHAGE_Pseudo_B3_NC_006548(1),PHAGE_Pseudo_D3112_NC_005178(1),PHAGE_Bacter_Lily_NC_028841(1),PHAGE_Burkho_phiE125_NC_003309(1
),PHAGE_Vibrio_X29_NC_024369(1),PHAGE_Burkho_AH2_NC_018283(1)   3                                16.66%                           56.85%                        
                                  5              35Kb                     incomplete(50)                capsid,integrase                             5794433-5829445          0                        18                      10                               3                                72.2%                            5                                yes                              9                                PHAGE_Entero_JenP1_NC_029028(2),PHAGE_Entero_CAjan_NC_028776(2),PHAGE_Entero_JenP2_NC_028997(2),PHAGE_Psychr_pOW20_A_NC_020841(1),PHAGE_Idioma_1N2_2_NC_025439(1),PHAGE_Burkho_BcepGomr_NC_009447(1),PHAGE_Strept_MM1_NC_003050(1),PHAGE_Strept_EJ_1_NC_005294(1),PHAGE_Mycoba_Milly_NC_026598(1),PHAGE_Entero_JenK1_NC_029021(1),PHAGE_Mycoba_Cheetobro_NC_028979(1),PHAGE_Strept_phiARI0746_NC_031907(1),PHAGE_Salico_CGphi29_NC_020844(1),PHAGE_Gordon_Wizard_NC_030913(1),PHAGE_Entero_phiFL3A_NC_013648(1),PHAGE_Mycoba_Phelemich_NC_022063(1),PHAGE_Deep_s_D6E_NC_019544(1),PHAGE_Verruc_P8625_NC_029047(1),PHAGE_Pseudo_PPpW_3_NC_023006(1),PHAGE_Bacill_TP21_L_NC_011645(1),PHAGE_Aurant_AmM_1_NC_027334(1),PHAGE_Bacill_BM5_NC_029069(1),PHAGE_Burkho_phiE12_2_NC_009236(1),PHAGE_Bacill_phi105_NC_004167(1),PHAGE_Bacill_BMBtp2_NC_019912(1),PHAGE_Escher_slur01_NC_028831(1),PHAGE_Mycoba_ZoeJ_NC_024147(1),PHAGE_Mycoba_Acadian_NC_023701(1),PHAGE_Thermo_THSA_485A_NC_018264(1),PHAGE_Entero_phiFL1A_NC_013646(1),PHAGE_Lactob_Lj771_NC_010179(1),PHAGE_Mycoba_Baee_NC_028742(1)   2                                11.11%                           49.25%                        
                                  6              33.7Kb                   questionable(80)              recombinase,capsid,terminase,tail,head       6867447-6901202          0                        37                      26                               7                                89.1%                            4                                yes                              7                                PHAGE_Pseudo_phi3_NC_030940(19),PHAGE_Aeromo_phiO18P_NC_009542(17),PHAGE_Haemop_HP1_NC_001697(10),PHAGE_Pasteu_F108_NC_008193(9),PHAGE_Vibrio_8_NC_022747(9),PHAGE_Vibrio_K139_NC_003313(9),PHAGE_Haemop_HP2_NC_003315(8),PHAGE_Ralsto_RSY1_NC_025115(3),PHAGE_Burkho_KS14_NC_015273(2),PHAGE_Burkho_KS5_NC_015265(2),PHAGE_Salmon_Fels_2_NC_010463(2),PHAGE_Ralsto_RSA1_NC_009382(1),PHAGE_Phormi_MIS_PhV1A_NC_029032(1),PHAGE_Entero_N15_NC_001901(1),PHAGE_Salmon_RE_2010_NC_019488(1),PHAGE_Vibrio_vB_VpaM_MAR_NC_019722(1),PHAGE_Halomo_phiHAP_1_NC_010342(1),PHAGE_Klebsi_phiKO2_NC_005857(1),PHAGE_Vibrio_VP882_NC_009016(1),PHAGE_Bdello_phi1422_NC_019525(1),PHAGE_Entero_186_NC_001317(1),PHAGE_Pseudo_phiCTX_NC_003278(1),PHAGE_Entero_fiAA91_ss_NC_022750(1),PHAGE_Haemop_SuMu_NC_019455(1),PHAGE_Burkho_KL3_NC_015266(1)   18                               51.35%                           55.42%                        

Source file for the second variable (this is a large file):

     source          1..7215267
                     /organism="Hahella chejuensis KCTC 2396"
                     /mol_type="genomic DNA"
                     /strain="KCTC 2396"
                     /db_xref="taxon:349521"
     gene            247..381
                     /locus_tag="HCH_00001"
     CDS             247..381
                     /locus_tag="HCH_00001"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="ABC26924.1"
                     /translation="MGFGHRVLFSLKNINIRFSLYIESRRLKFAQKKSKHVRILEVWK
                     "
     gene            378..1781
                     /gene="dnaA"
                     /locus_tag="HCH_00002"
     CDS             378..1781
                     /gene="dnaA"
                     /locus_tag="HCH_00002"
                     /note="TIGRFAMsMatches:TIGR00362"
                     /codon_start=1
                     /transl_table=11
                     /product="chromosomal replication initiator protein DnaA"
                     /protein_id="ABC26925.1"
                     /translation="MTSELWHQCLGYLEDELPAQQFNTWLRPLQAKGSEEELLLFAPN
                     RFVLDWVNEKYIGRINEILSELTSQKAPRISLKIGSITGNSKGQQASKDSAVGATRTT
                     APSRPVIADVAPSGERNVTVEGAIKHESYLNPTFTFETFVEGKSNQLARAAAMQVADN
                     PGSAYNPLFLYGGVGLGKTHLMQAVGNAIFKKNPNAKILYLHSERFVADMVKALQLNA
                     FNEFKRLYRSVDALLIDDIQFFARKERSQEEFFHTFNALLEGGQQMILTCDRYPKEID
                     HMEERLKSRFGWGLTVMVEPPELETRVAILMKKAEQANVHLSSESAFFIAQKIRSNVR
                     ELEGALKLVIANAHFTGQEITPAFIRECLKDLLALHEKQVSIDNIQRTVAEYYKIRIA
                     DILSKRRTRSITRPRQMAMALAKELTNHSLPEIGEAFGGRDHTTVLHACKVMIELQQS
                     DPTLRDDYQNFMRMLTS"
     gene            1884..2987
                     /gene="dnaN"
                     /locus_tag="HCH_00003"
     CDS             1884..2987
                     /gene="dnaN"
                     /locus_tag="HCH_00003"
                     /EC_number="2.7.7.7"
                     /note="TIGRFAMsMatches:TIGR00663"
                     /codon_start=1
                     /transl_table=11
                     /product="DNA polymerase III, beta subunit"
                     /protein_id="ABC26926.1"
                     /translation="MKLTITREALVTSLQMISGVVEKRQTMPVLANVLLDARDGKLVI
                     TGTNMEVELVAEISDVNIEHESRITVPAKKFTDICRALPEGAAIGIELKDGRLNVRYG
                     SSHFILSTLPAEHFPNVEEEPESVKVTLPQRELKRLIDATAFAMAQQDVRYYLNGMLM
                     ELDEQGLRTVATDGHRLALANVSLQTGVSEKRQPIVPRKGILELGRLLNDTDESCTLV
                     FGDNHVRASVGHFTFTSKLIDGKFPDYQRVIPRSGDKVMLADRVLLKGVLSRASILSH
                     ESIRGVRLQFEEGLLKVFANNPDQEEAEDSLEVEYPHEALQIGFNVGYLIDVLNALDD
                     EQVKVTLSNANSSALVEGVDTRDAVYVVMPMRL"
     gene            3008..3103
                     /locus_tag="HCH_00004"
     CDS             3008..3103
                     /locus_tag="HCH_00004"
                     /codon_start=1
                     /transl_table=11
                     /product="hypothetical protein"
                     /protein_id="ABC26927.1"
                     /translation="MNLFELERSRRVARSGMTLGKDVSPLNADRV"
     gene            3128..4405
                     /gene="aarF"
                     /locus_tag="HCH_00005"
     CDS             3128..4405
                     /gene="aarF"
                     /locus_tag="HCH_00005"
                     /note="Predicted unusual protein kinase; COG0661"
                     /codon_start=1
                     /transl_table=11
                     /product="ABC1 family protein kinase"
                     /protein_id="ABC26928.1"
                     /translation="MGKIVNAVKGAARIGQTAAVISKVGLGWLKGNRAPAPRLLRQTF
                     EELGATYIKLGQFIASSPTFFPADYVEEFQLCLDKTKPLPYSQIEKILKEEFKRPLQS
                     IYSHIDTKPLASASIAQVHAARLVTGEDVVIKVQKPGVRNVLLTDLNFLYVAARVVEY
                     LAPKLSWTSLSGIVEEIQRTMMEECDFYQEAANLKEFREFLVSSGNDQAVVPTVYEQA
                     STMRVLTMERFYGVPLTDLETIRKYCSDPEKTLITAMNTWFASLTQCDFFHADVHAGN
                     LMVLEDGRIGFIDFGIVGRIGAGTWQAVSDFITAIMMGNFHGMADAMSRIGITKSQLS
                     VDDLAADIADVYKKMDAMTPDMPPIYYDQQTGDDEVNNILMDLVRIGEQHGLHFPREF
                     ALLLKQFLYFDRYVHVLAPELDMFMDERLSLIQ"

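3 answers:

Answer 0 (score: 0)

You need to separate and name all the operations you want to perform on the data, then find the equivalent UNIX commands. Most UNIX tools work line by line rather than column by column, so it pays to learn to think in lines ;) Don't treat bash as an ordinary programming language; it is glue. Put the tools together, and just don't assign potentially large data to variables.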

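You want to extract one column from each of two files (awk), then paste the two extracted columns side by side into the output (paste), separated by \t (paste uses \t by default), with a header line on top. You can create two intermediate files, or you can use the shell's process substitution.

paste \
    <( <CP000155.phaster awk '$2~/[0-9]Kb/{print ($5)}' ) \
    <( <CP000155.gbk awk '$1~/CDS/{print ($2)}' ) |
    ( echo -e 'Phaster_positions\tGBKPositions'; cat ) \
    > gbk3.txt

Edit: having seen the source data, this will probably produce the desired file format, but the data will not be correct. You need to make sure that the third line from the first awk corresponds exactly to the third line from the second awk. Your data probably needs to be combined by awk using a unique identifier.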
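The caveat about row correspondence can be made concrete with a small sketch. The thread never says how the two columns relate, so the rule used here is purely an assumption: a CDS is reported next to a region only when its coordinates lie entirely inside that region. The sample values and the names `regions`, `cds` and `matched` are hypothetical, and `join(...)` or partial (`<`, `>`) locations are not handled.

```shell
#!/bin/bash
# Hypothetical sample columns; in the thread they come from the two awk extractions.
regions=$(mktemp)
cds=$(mktemp)
printf '%s\n' '100-500' '900-1200' > "$regions"
printf '%s\n' '120..300' 'complement(450..700)' '950..1100' > "$cds"

matched=$(awk -v OFS='\t' '
NR == FNR {                        # first file: regions written as start-end
    split($1, r, "-")
    lo[++n]  = r[1] + 0            # force numeric start
    hi[n]    = r[2] + 0            # force numeric end
    label[n] = $1
    next
}
{                                  # second file: CDS locations
    loc = $1
    gsub(/[^0-9.]/, "", loc)       # strip the complement( ) wrapper
    split(loc, c, "[.][.]")        # c[1] is the start, c[2] is the end
    for (k = 1; k <= n; k++)
        if (c[1] + 0 >= lo[k] && c[2] + 0 <= hi[k])   # assumed rule: full containment
            print label[k], $1
}' "$regions" "$cds")

printf 'Phaster_positions\tGBKPositions\n%s\n' "$matched"
rm -f "$regions" "$cds"
```

With this rule each output row pairs values that actually belong together, instead of pairing line N of one list with line N of the other. If the intended relationship is different (for example, any overlap rather than full containment), only the `if` condition needs to change.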

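Answer 1 (score: 0)

I did not go through the files very thoroughly; I basically took the pieces from your code, merged them into a single awk script, and added hashing (arrays) and output: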

$ awk -v OFS="\t" '        # tab as output delimiter
NR==FNR {                  # process the first file
    if($2~/[0-9]Kb/)       # keep only the region rows
        a[++i]=$5          # hash $5 to a
    next                   # never fall through to the CDS rule
}
$1~/CDS/ {                 # process the second file (with a condition)
    b[++j]=$2              # hash $2 to b
}
END {
    print "Phaster_positions","GBKPositions"
    if(i>=j)               # was there more is or js
        n=i                # take the bigger value and use it...
    else 
        n=j
    for(i=1;i<=n;i++)      # ... here
        print a[i],b[i]    # output side by side
}' first second

Output:

Phaster_positions       GBKPositions
371860-418565   247..381
2947108-2988239 378..1781
4663633-4680174 1884..2987
5756724-5793879 3008..3103
5794433-5829445 3128..4405
6867447-6901202

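Does this make sense? It stores the matching values from file 1 in hash a and those from file 2 in hash b. If there is a lot of data, you may run out of memory. If that is the case, report back and we will come up with another solution.

Update

This version stores only file 1 in a and deletes each entry of a as the corresponding line of file 2 is output: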

awk '
BEGIN {
    OFS="\t"               # the output field separator
    print "Phaster_positions","GBKPositions"  # output the header
}
NR==FNR {                  # process the first file
    if($2~/[0-9]Kb/)       # keep only the region rows
        a[++i]=$5          # hash $5 to a
    next                   # never fall through to the CDS rule
}
$1~/CDS/ {                 # process the second file (with a condition)
    print ((++j in a)?a[j]:"") OFS $2  # output from a if exists and $2
    delete a[j]            # delete after output
}
END {
    for(j=1;j<=i;j++)      # stupid loop
        if(j in a)         # if there are any left in a
        print a[j] OFS     # output them
}' first second

Not battle-tested, and the for loop in END is silly.
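One possible way to tidy that END loop, offered as a sketch rather than a tested replacement: entries of a are deleted in index order as file 2 is read, so only indices above j can be left over and the loop can start at j + 1. To keep the example self-contained, the two inputs below are hypothetical, already-extracted single columns, and the names `first`, `second` and `table` are made up for the demonstration.

```shell
#!/bin/bash
# Hypothetical, already-extracted sample columns standing in for the real files.
first=$(mktemp)
second=$(mktemp)
printf '%s\n' '371860-418565' '2947108-2988239' '4663633-4680174' > "$first"
printf '%s\n' '247..381' '378..1781' > "$second"

table=$(awk '
BEGIN {
    OFS = "\t"
    print "Phaster_positions", "GBKPositions"
}
NR == FNR { a[++i] = $1; next }      # buffer only the first file
{                                    # stream the second file
    ++j
    left = (j in a) ? a[j] : ""      # empty cell once the first column runs out
    print left, $1
    delete a[j]                      # free the entry as soon as it is printed
}
END {
    for (k = j + 1; k <= i; k++)     # only indices above j can still be in a
        print a[k], ""
}' "$first" "$second")

printf '%s\n' "$table"
rm -f "$first" "$second"
```

Memory use is still bounded by the size of the first column only; the second file is never held in an array.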

Answer 2 (score: -1)

printf "Phaster_positions\tGBKPositions\n\n">gbk3.txt

PhasterPositions=`awk '$2~/[0-9]Kb/{print ($5)}' CP000155.phaster`
GBKPositions=`awk '$1~/CDS/{print ($2)}' CP000155.gbk`

printf "$PhasterPositions\t$GBKPositions">>gbk3.txt

See if this works.
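Note that a single printf with both variables in the format string writes the whole first list, then one tab, then the whole second list, so the values still end up stacked rather than side by side. If the data really has to stay in shell variables, a minimal sketch (assuming bash, since it relies on process substitution) is to feed each variable to paste as its own stream. The sample values below are hypothetical stand-ins for the two awk results, and `rows` is a name introduced only for this example.

```shell
#!/bin/bash
# Hypothetical sample values standing in for the two awk results.
PhasterPositions=$'371860-418565\n2947108-2988239'
GBKPositions=$'247..381\n378..1781\n1884..2987'

# Each variable becomes its own line-oriented stream; paste joins line N of
# one stream with line N of the other, separated by a tab.
rows=$(paste <(printf '%s\n' "$PhasterPositions") <(printf '%s\n' "$GBKPositions"))

printf 'Phaster_positions\tGBKPositions\n%s\n' "$rows"
```

Passing the data as an argument to `%s`, rather than as the format string itself, also keeps any `%` or backslash in the data from being interpreted by printf. When one list is shorter, paste leaves its cell empty on the remaining rows.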