Fill blank fields in certain columns with the word from the row above, in UNIX

Date: 2017-09-05 16:32:08

Tags: bash unix awk

This is a shortened version of my table, input.tsv:

rs928302        YES     TMPRSS3 rf      G       V       53      NM_001256317.1  NP_001243246.1
                                rf      G       V       53      NM_024022.2     NP_076927.1
                                rf      G       V       53      NM_032405.1     NP_115781.1
rs1046210       YES     BACE2   rf      C       D       364     NM_012105.4     NP_036237.2
                                rf      C       D       364     NM_138992.2     NP_620477.1
                                rf      C       D       269     XM_017028314.1  XP_016883803.1
rs1064579       YES     IFNGR2  rf      T       V       272     NM_001329128.1  NP_001316057.1
                                rf      T       V       253     NM_005534.3     NP_005525.2
                                rf      T       V       272     XM_005260969.2  XP_005261026.1
                                rf      T       V       278     XM_011529553.1  XP_011527855.1
                                rf      T       V       255     XM_011529554.2  XP_011527856.1

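I want to fill the blank fields in the first, second, and third columns with the word from the nearest row above, and do this through to the end of the file. When a different word appears, the rows below it should be filled with that new word, and so on. So the output should be: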

rs928302        YES     TMPRSS3 rf      G       V       53      NM_001256317.1  NP_001243246.1
rs928302        YES     TMPRSS3 rf      G       V       53      NM_024022.2     NP_076927.1
rs928302        YES     TMPRSS3 rf      G       V       53      NM_032405.1     NP_115781.1
rs1046210       YES     BACE2   rf      C       D       364     NM_012105.4     NP_036237.2
rs1046210       YES     BACE2   rf      C       D       364     NM_138992.2     NP_620477.1
rs1046210       YES     BACE2   rf      C       D       269     XM_017028314.1  XP_016883803.1
rs1064579       YES     IFNGR2  rf      T       V       272     NM_001329128.1  NP_001316057.1
rs1064579       YES     IFNGR2  rf      T       V       253     NM_005534.3     NP_005525.2
rs1064579       YES     IFNGR2  rf      T       V       272     XM_005260969.2  XP_005261026.1
rs1064579       YES     IFNGR2  rf      T       V       278     XM_011529553.1  XP_011527855.1
rs1064579       YES     IFNGR2  rf      T       V       255     XM_011529554.2  XP_011527856.1

How can I do this in a Unix environment? Thanks in advance.

2 answers:

Answer 0 (score: 1)

Answer 1 (score: 1)

awk solution:

awk 'NF==9{ f1=$1; f2=$2; f3=$3 }           # full row: remember the first three fields
     NF==6{ sub(/^[[:space:]]+/,"");        # continuation row: strip the leading whitespace
            $0=f1 OFS f2 OFS f3 OFS $0 }1' OFS='\t' input.tsv

Output:

rs928302    YES TMPRSS3 rf      G       V       53      NM_001256317.1  NP_001243246.1
rs928302    YES TMPRSS3 rf      G       V       53      NM_024022.2     NP_076927.1
rs928302    YES TMPRSS3 rf      G       V       53      NM_032405.1     NP_115781.1
rs1046210   YES BACE2   rf      C       D       364     NM_012105.4     NP_036237.2
rs1046210   YES BACE2   rf      C       D       364     NM_138992.2     NP_620477.1
rs1046210   YES BACE2   rf      C       D       269     XM_017028314.1  XP_016883803.1
rs1064579   YES IFNGR2  rf      T       V       272     NM_001329128.1  NP_001316057.1
rs1064579   YES IFNGR2  rf      T       V       253     NM_005534.3     NP_005525.2
rs1064579   YES IFNGR2  rf      T       V       272     XM_005260969.2  XP_005261026.1
rs1064579   YES IFNGR2  rf      T       V       278     XM_011529553.1  XP_011527855.1
rs1064579   YES IFNGR2  rf      T       V       255     XM_011529554.2  XP_011527856.1
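
The solution above tells full rows from continuation rows by counting whitespace-separated fields (9 versus 6), so it depends on the other columns never being empty. A possible variant, sketched here under the assumption that input.tsv really is tab-delimited and that the blank cells are truly empty fields (no stray spaces), checks each of the first three fields directly instead. The function name fill_down is hypothetical and only there for illustration.

```shell
# fill_down is a hypothetical helper name, not part of the original answer.
# Assumption: the input is tab-delimited and blank cells are empty strings.
fill_down() {
    awk 'BEGIN { FS = OFS = "\t" }
    NF {
        # Only the first three columns are filled down.
        for (i = 1; i <= 3; i++) {
            if ($i == "") { $i = last[i] }   # blank: reuse the value seen above
            else          { last[i] = $i }   # non-blank: remember it for later rows
        }
    }
    { print }' "$@"
}

# Tiny sample: one full row followed by one continuation row.
printf 'rs928302\tYES\tTMPRSS3\trf\n\t\t\trf\n' | fill_down
```

For the two-row sample, both printed rows start with rs928302, YES and TMPRSS3. On the real file it would be called as fill_down input.tsv. Because each column is tracked separately, a row that leaves only some of the three columns blank is still filled correctly, which the field-count approach does not cover.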