Insert an alphanumeric string after a certain word in a text file (passwords / sed / awk)

Date: 2015-01-04 11:17:06

Tags: bash shell awk sed passwords

I have a text file with 690 entries similar to the one shown in the P.S. (the entry in the P.S. is an example taken from http://www.ncbi.nlm.nih.gov/nuccore/AB753792.1). In my text file, the entries are separated by "//".

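In my case, "ACCESSION   " (the word followed by 3 spaces) is not followed by an uppercase alphanumeric string (like "AB753792" in the P.S.). I am running Mac OS X Yosemite with the default Bash and would like to fill these 690 blanks with unique uppercase alphanumeric strings, generated for example with: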

openssl rand -hex 4 | tr '[:lower:]' '[:upper:]'    

(Edit 5.1.15: I have changed the command above; it was different in the first version of this post.)

I can see how sed/awk could solve this, but I can't figure out how sed could insert a unique 8-character uppercase alphanumeric string after each "ACCESSION   ".

Any help would be much appreciated.

Kind regards,

P.S.

LOCUS       AB753792                 712 bp    DNA     linear   INV 26-JUN-2013
DEFINITION  Acutuncus antarcticus mitochondrial gene for cytochrome c oxidase
            subunit 1, partial cds.
ACCESSION   AB753792
VERSION     AB753792.1  GI:478246768
KEYWORDS    .
SOURCE      mitochondrion Acutuncus antarcticus
  ORGANISM  Acutuncus antarcticus
            Eukaryota; Metazoa; Ecdysozoa; Tardigrada; Eutardigrada; Parachela;
            Hypsibiidae; Acutuncus.
REFERENCE   1
  AUTHORS   Kagoshima,H., Imura,S. and Suzuki,A.C.
  TITLE     Molecular and morphological analysis of an Antarctic tardigrade,
            Acutuncus antarcticus
  JOURNAL   J. Limnol. 72 (s1), 15-23 (2013)
REFERENCE   2  (bases 1 to 712)
  AUTHORS   Kagoshima,H. and Suzuki,A.C.
  TITLE     Direct Submission
  JOURNAL   Submitted (07-OCT-2012) Contact:Hiroshi Kagoshima Transdisciplinary
            Research Integration Center/Nationlal Institute of Genetics; 1111
            Yata, Mishima, Shizuoka 411-8540, Japan
FEATURES             Location/Qualifiers
     source          1..712
                     /organism="Acutuncus antarcticus"
                     /organelle="mitochondrion"
                     /mol_type="genomic DNA"
                     /isolation_source="moss sample (Bryum pseudotriquetrum,
                     Bryum argenteum, and Ceratodon purpureus)"
                     /db_xref="taxon:467037"
                     /country="Antarctica: East antarctica, soya coast,
                     Skarvsnes and Langhovde"
     CDS             <1..712
                     /codon_start=2
                     /transl_table=5
                     /product="cytochrome c oxidase subunit 1"
                     /protein_id="BAN14781.1"
                     /db_xref="GI:478246769"
                     /translation="GQQNHKDIGTLYFIFGVWAATVGTSLSMIIRSELSQPGSLFSDE
                     QLYNVTVTSHAFVMIFFFVMPILIGGFGNWLVPLMISAPDMAFPRMNNLSFWLLPPSF
                     MLITMSSMAEQGAGTGWTVYPPLAHYFAHSGPAVDLTIFSLHVAGASSILGAVNFIST
                     IMNMRAPSISLEQMPLFVWSVLLTAILLLLALPVLAGAITMLLLDRNFNTSFFDPAGG
                     GDPILYQHLFWFFGHPEV"
ORIGIN
        1 tggtcaacaa aatcataaag atattggtac actttatttt atttttggag tatgagctgc
       61 tacagtagga acatctctta gtatgattat ccggtcagaa cttagacaac caggatcact
      121 cttctcagat gaacaacttt acaacgttac agtaacaaga catgcatttg tcataatttt
      181 cttttttgta atacccatcc ttattggagg atttggaaat tgactagtac ctttaatgat
      241 ttcagcacca gatatagctt tcccccgaat aaataacctg agattctgac tactaccccc
      301 atcttttata ttaattacta taagaagtat agcagaacaa ggagccggga cagggtgaac
      361 agtttacccc cctttagctc actattttgc acactcagga ccagctgtcg atttaactat
      421 tttttctctg catgtagcag gagcatcgtc gattttagga gccgtaaact tcatttctac
      481 aattatgaat atgcgagctc catcaattag tttagaacaa atgccactat ttgtatgatc
      541 agtactactt acagccattt tacttctact agctctgcca gtattagcag gagccatcac
      601 aatgctttta ttagaccgaa attttaacac atcgtttttt gatcctgctg gtgggggaga
      661 tccaattctc tatcaacatt tattttgatt ttttggtcac cctgaagttt aa
//

3 Answers:

Answer 0 (score: 2)

You can use gawk:

gawk '/ACCESSION[ \t]*$/{l=$0;sub(/[ \t]*$/,"",l);cmd="openssl rand -base64 32 | tr a-z A-Z";cmd |& getline a;close(cmd);print l"   "a;next}{print}' /path/to/input > /path/to/output

It is easier to read as a multi-line script:

#!/usr/bin/gawk -f

# If a line with an empty ACCESSION field appears,
# the following block gets executed
/ACCESSION[ \t]*$/ {
    # Back up the current line and strip its trailing whitespace
    line=$0
    sub(/[ \t]*$/, "", line)
    # Prepare the openssl command
    cmd="openssl rand -base64 32 | tr '[a-z]' '[A-Z]'"
    # Execute the openssl command and store the result in random
    cmd |& getline random;
    close(cmd);
    # Print the line, three spaces and the random string
    printf "%s   %s\n", line, random;
    # Step forward to the next line of input (don't execute
    # the following block)
    next
}

# Print all other lines unmodified
{print}

Note that you need GNU awk (gawk), because the script uses a coprocess (the |& operator), which is only available in the GNU version of awk.
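For what it's worth, reading a command's output with a plain pipe (cmd | getline) is part of POSIX awk, so the same approach can be written without the coprocess and should then also run with the awk that ships with OS X. A minimal sketch, assuming empty lines consist of "ACCESSION" followed only by spaces or tabs, that openssl is on the PATH, and using the question's openssl rand -hex 4 in place of -base64 32; fill_accessions is a hypothetical name:

```shell
#!/bin/bash
# fill_accessions FILE (hypothetical name): print FILE to stdout with every
# empty "ACCESSION" line completed by an 8-character uppercase hex string.
fill_accessions() {
    awk '
    # Matches "ACCESSION" followed only by optional spaces/tabs.
    /^ACCESSION[ \t]*$/ {
        # A plain pipe into getline is POSIX awk, so gawk is not required.
        cmd = "openssl rand -hex 4 | tr a-f A-F"
        cmd | getline id
        # Close the pipe so the next match runs openssl again (new value).
        close(cmd)
        printf "ACCESSION   %s\n", id
        next
    }
    # Every other line is copied unchanged.
    { print }
    ' "$1"
}
```

Usage: fill_accessions input.txt > output.txt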

Answer 1 (score: 1)

You can try the following script:
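#!/bin/bash
# One sed pass per empty ACCESSION line (7 in this example)
for i in {1..7}; do 
    # 8-character uppercase hex string
    var=$(openssl rand -hex 4 | tr '[:lower:]' '[:upper:]');
    # Fill only the first "ACCESSION   " line, then loop (:tag; n; b tag) to copy the rest unchanged
    sed  -i.bak '/^ACCESSION   $/{s#ACCESSION   #&'"${var}"'#g;:tag;n;b tag}' "$1"
done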

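Note that I use {1..7} to loop seven times because my test file contains 7 lines consisting of ACCESSION followed by exactly three spaces and then the end of the line; adjust the count to match your file.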
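Rather than hardcoding the count, the number of passes can be taken from the file itself. A minimal sketch, assuming the empty lines are exactly "ACCESSION" plus three spaces (as in this answer) and that openssl is on the PATH; fill_with_sed is a hypothetical name. It keeps the one-sed-pass-per-line approach and uses the multi-line sed form, which should also suit the BSD sed on OS X:

```shell
#!/bin/bash
# fill_with_sed FILE (hypothetical name): run the sed pass from this answer once per
# empty ACCESSION line, taking the count from the file instead of hardcoding {1..7}.
fill_with_sed() {
    local file=$1 n i=0 var
    # grep -c prints the number of matching lines (0 if none).
    n=$(grep -c '^ACCESSION   $' "$file")
    while [ "$i" -lt "$n" ]; do
        var=$(openssl rand -hex 4 | tr '[:lower:]' '[:upper:]')
        # Each pass fills only the first remaining empty line; the
        # :tag / n / b tag loop then copies the rest of the file unchanged.
        sed -i.bak '
        /^ACCESSION   $/{
        s#ACCESSION   #&'"${var}"'#
        :tag
        n
        b tag
        }' "$file"
        i=$((i + 1))
    done
}
```

Usage: fill_with_sed input.txt (the file is edited in place; input.txt.bak holds the state before the last pass). Note that the file is rewritten once per empty line, which is fine for 690 entries but grows quadratically with the number of entries.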

For example:

ACCESSION   
VERSION
ACCESSION   
VERSION
ACCESSION   
VERSION    
ACCESSION   
VERSION    
ACCESSION   
VERSION    
ACCESSION   
VERSION    
ACCESSION   

Output:

ACCESSION   E4197EB1
VERSION
ACCESSION   EFA0CEFF
VERSION
ACCESSION   9499CA54
VERSION    
ACCESSION   2AD2690D
VERSION    
ACCESSION   3598659F
VERSION    
ACCESSION   25608153
VERSION    
ACCESSION   1B43896B
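Random 8-character hex strings are very likely, but not guaranteed, to be distinct. A small check along these lines (a sketch; check_accessions is a hypothetical name) reports any ACCESSION line that is still empty and any value that appears more than once, and returns success only if there are neither:

```shell
#!/bin/bash
# check_accessions FILE (hypothetical name): report ACCESSION lines that are
# still empty and ACCESSION values that occur more than once.
check_accessions() {
    local file=$1 empty dups
    # Count "ACCESSION" lines with nothing but spaces after the keyword.
    empty=$(grep -c '^ACCESSION *$' "$file")
    # Second field of every filled ACCESSION line; uniq -d keeps only repeated values.
    dups=$(awk '$1 == "ACCESSION" && NF >= 2 { print $2 }' "$file" | sort | uniq -d)
    echo "empty ACCESSION lines: $empty"
    if [ -n "$dups" ]; then
        printf '%s\n' "$dups" | sed 's/^/duplicate: /'
    fi
    # Exit status 0 only when nothing is empty and nothing is duplicated.
    [ "$empty" -eq 0 ] && [ -z "$dups" ]
}
```

Usage: check_accessions output.txt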

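Edit: Since you are using Mac OS X, whose BSD sed does not accept a ; after a label, you can try this alternative with each sed command on its own line: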

#!/bin/bash
for i in {1..7}; do 
    var=$(openssl rand -hex 4 | tr '[:lower:]' '[:upper:]');
    sed  -i.bak '
    /^ACCESSION   $/{
    s#ACCESSION   #&'"${var}"'#g
    :tag
    n
    b tag
    }' "$1"
done

Answer 2 (score: 0)

Thank you very much for your help. I used @hek2mgl's solution, because I couldn't get the sed command to work.

Thanks for the comments in the example code. I modified it as follows:

#!/usr/local/bin/gawk -f
# Every line containing ACCESSION (in my file these are all empty)
# is replaced by the following block
/ACCESSION/ {
    # Prepare the openssl command
    cmd="openssl rand -hex 4 | tr '[:lower:]' '[:upper:]'"
    # Execute the openssl command and store the result in random
    cmd |& getline random;
    close(cmd);
    # Print a new ACCESSION line with the random string
    printf "ACCESSION   %s\n",random;
    # Step forward to the next line of input (don't execute
    # the following block)
    next
}

# Print all other lines unmodified
{print}
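The scripts in this thread draw each ID independently, so uniqueness is only probabilistic: for 690 draws out of 2^32 possible 8-digit hex values, the chance of at least one repeat is roughly 0.006% (birthday approximation). Note also that a real accession such as AB753792 is itself a valid 8-digit hex string. If a hard guarantee is wanted, one option (a sketch, not the method used above) is to read the file twice: first to record the accessions already present, then to fill the empty lines, re-drawing any ID that has already been seen. It assumes openssl is on the PATH; fill_unique is a hypothetical name, and it uses a plain pipe rather than |&, so it does not need gawk.

```shell
#!/bin/bash
# fill_unique FILE (hypothetical name): print FILE with every empty ACCESSION
# line filled, re-drawing any ID that is already in use.
fill_unique() {
    awk '
    # Pass 1 (NR == FNR): remember every accession already present in the file.
    NR == FNR {
        if ($1 == "ACCESSION" && NF >= 2) seen[$2] = 1
        next
    }
    # Pass 2: fill empty ACCESSION lines with IDs not seen so far.
    /^ACCESSION[ \t]*$/ {
        cmd = "openssl rand -hex 4 | tr a-f A-F"
        do {
            if ((cmd | getline id) <= 0) {
                print "could not read from: " cmd > "/dev/stderr"
                exit 1
            }
            close(cmd)
        } while (id in seen)
        seen[id] = 1
        printf "ACCESSION   %s\n", id
        next
    }
    # All other lines are printed unchanged.
    { print }
    ' "$1" "$1"
}
```

Usage: fill_unique input.txt > output.txt. Because the file is read twice, it has to be a regular file rather than a pipe.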