Removing lines with a stream editor

Time: 2016-03-12 12:29:27

Tags: awk sed

I am having trouble removing certain lines from a text file.

This is the content of the file:

# SHA512 HASH
96896c1b0b52047fac3cdcfea7b15c3eca7fcc53ee3294000493d4421df61e7947cdcaed783edc95e8ba51fbed164f383fc09afdb73587e590e08eef08086a4d  stage3-amd64-nomultilib-20160310.tar.bz2
# WHIRLPOOL HASH
e5e15b81753c6f1dd1886c2567b0012bfd822746d8ddce32ddf6e41f64074b4cb9c49dce787ea4cb160ce1234e0a8ba1d3a66b3904a2fb5500c435dd0fc69fea  stage3-amd64-nomultilib-20160310.tar.bz2
# SHA512 HASH
35735f8c7533bf6cda384a015e3eaac61b89e832f181c49332b04c07cbd3dfe7a61d5c5dce7c1e4155880b2a4e690839efcd914f04523b2a0e1e903749be6192  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS# WHIRLPOOL HASH
c04c4d0f677c0e035262632e4fd03d71a786019b94a0ca0565a6c1af51a9103315e3da030d7c0f071ee729543f9b5d591757e43fad6ee66ff5dff88968eb8d2c  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS

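My task is to remove the lines that verify .CONTENTS, remove the WHIRLPOOL hashes, and verify the remaining SHA-512 digests. As far as I understand, I need to put a # at the start of every line I want to disable, and I think this can be done with sed or awk.

The desired output is: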

# SHA512 HASH
96896c1b0b52047fac3cdcfea7b15c3eca7fcc53ee3294000493d4421df61e7947cdcaed783edc95e8ba51fbed164f383fc09afdb73587e590e08eef08086a4d  stage3-amd64-nomultilib-20160310.tar.bz2
# WHIRLPOOL HASH
#e5e15b81753c6f1dd1886c2567b0012bfd822746d8ddce32ddf6e41f64074b4cb9c49dce787ea4cb160ce1234e0a8ba1d3a66b3904a2fb5500c435dd0fc69fea  stage3-amd64-nomultilib-20160310.tar.bz2
# SHA512 HASH
#35735f8c7533bf6cda384a015e3eaac61b89e832f181c49332b04c07cbd3dfe7a61d5c5dce7c1e4155880b2a4e690839efcd914f04523b2a0e1e903749be6192  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS# WHIRLPOOL HASH
c04c4d0f677c0e035262632e4fd03d71a786019b94a0ca0565a6c1af51a9103315e3da030d7c0f071ee729543f9b5d591757e43fad6ee66ff5dff88968eb8d2c  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS

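Thanks.

1 answer:

Answer 0: (score: 0)

I don't know how to do this in sed or awk, but here is a quick and dirty Python script. The script does not change your original file. It creates final.yourfilename, which you can then rename. You will have to adapt the script to your use case.

tests.py (save it in the same directory as the SHA file)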

file = 'tests.txt' # this is your original file. Replace it with your filename
phrase_of_interest = '.CONTENTS' # we want to look for this phrase
shas_to_comment = [] # list holds information after .CONTENTS like # WHIRLPOOL HASH

# first pass - comment out the line containing .CONTENTS# WHATEVER and remember # WHATEVER in list
with open('tmp.' + file, 'w') as n:
    with open(file, 'r') as f:
        for line in f:          
            if phrase_of_interest + '#' in line:
                n.write('#' + line)
                # Take everything from '.CONTENTS#' to the end of the line, drop
                # '.CONTENTS' and strip the trailing newline. What remains is the
                # fused header, e.g. '# WHIRLPOOL HASH'.
                start = line.index(phrase_of_interest + '#')
                header = line[start:].replace(phrase_of_interest, '').strip()
                # shas_to_comment will contain '# WHATEVER1', '# WHATEVER2' etc.
                shas_to_comment.append(header)
            else:
                n.write(line)

# second pass - now that we know which shas to comment, when we find a comment like # WHIRLPOOL HASH,
# we will set comment_next_line flag to true. The next line will be commented out and comment_next_line flag will be reset
comment_next_line = False
with open('final.' + file, 'w') as n:
    with open('tmp.' + file, 'r') as f:
        for line in f:
            if comment_next_line:
                n.write('#' + line)
            else:
                n.write(line)

            comment_next_line = line.strip() in shas_to_comment

# your final result will be in final.tests.txt file

With your file, the result is:

python tests.py

# SHA512 HASH
96896c1b0b52047fac3cdcfea7b15c3eca7fcc53ee3294000493d4421df61e7947cdcaed783edc95e8ba51fbed164f383fc09afdb73587e590e08eef08086a4d  stage3-amd64-nomultilib-20160310.tar.bz2
# WHIRLPOOL HASH
#e5e15b81753c6f1dd1886c2567b0012bfd822746d8ddce32ddf6e41f64074b4cb9c49dce787ea4cb160ce1234e0a8ba1d3a66b3904a2fb5500c435dd0fc69fea  stage3-amd64-nomultilib-20160310.tar.bz2
# SHA512 HASH
#35735f8c7533bf6cda384a015e3eaac61b89e832f181c49332b04c07cbd3dfe7a61d5c5dce7c1e4155880b2a4e690839efcd914f04523b2a0e1e903749be6192  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS# WHIRLPOOL HASH
c04c4d0f677c0e035262632e4fd03d71a786019b94a0ca0565a6c1af51a9103315e3da030d7c0f071ee729543f9b5d591757e43fad6ee66ff5dff88968eb8d2c  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS
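
The two passes above can also be folded into one function that works on a list of lines in memory, which avoids the intermediate tmp. file. This is only a sketch under the same assumptions as the script: a header such as `# WHIRLPOOL HASH` is fused onto the end of a `.CONTENTS` line, and the line that follows a matching standalone header is the one to disable. The name `comment_out` and the shortened digests in `sample` are made up for illustration.

```python
def comment_out(lines, marker='.CONTENTS'):
    """Return a copy of lines with the unwanted checksum lines prefixed by '#'.

    Hypothetical helper: mirrors the two-pass script, but in memory.
    """
    fused = marker + '#'
    # Pass 1: collect headers fused onto marker lines, e.g. '# WHIRLPOOL HASH'.
    headers = {
        line[line.index(fused) + len(marker):].strip()
        for line in lines
        if fused in line
    }

    result = []
    comment_next = False
    # Pass 2: disable fused lines and the line after each collected header.
    for line in lines:
        if fused in line or comment_next:
            result.append('#' + line)
        else:
            result.append(line)
        # Only a standalone header line arms the flag for the following line.
        comment_next = line.strip() in headers
    return result


# Placeholder digests ('aaaa', 'bbbb', ...) stand in for the real 128-character ones.
sample = [
    '# SHA512 HASH\n',
    'aaaa  stage3.tar.bz2\n',
    '# WHIRLPOOL HASH\n',
    'bbbb  stage3.tar.bz2\n',
    '# SHA512 HASH\n',
    'cccc  stage3.tar.bz2.CONTENTS# WHIRLPOOL HASH\n',
    'dddd  stage3.tar.bz2.CONTENTS\n',
]
print(''.join(comment_out(sample)), end='')
```

To use it on a real file, the lines could be read with `open(name).readlines()` and the result written back with `writelines()`, assuming the file is small enough to hold in memory.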

Using a bash script

tests.sh

#!/bin/bash

file='tests.txt' # this is your original file
shas_to_comment=()

# first pass - remember the header fused onto each .CONTENTS line, e.g. "# WHIRLPOOL HASH"
# IFS= and -r make read keep the line exactly as it is in the file
while IFS= read -r line; do
    if [[ $line == *".CONTENTS#"* ]]; then
        shas_to_comment+=("${line#*.CONTENTS}")
    fi
done < "$file"

# second pass - comment out the fused lines and the line that follows each remembered header
comment_next_line=0
while IFS= read -r line; do
    if [[ $comment_next_line -eq 1 || $line == *".CONTENTS#"* ]]; then
        echo "#$line"
        comment_next_line=0
        continue
    fi

    # quoted, so the two spaces between digest and file name are preserved
    echo "$line"

    # the line is printed once above; the loop only decides whether to arm the flag
    for item in "${shas_to_comment[@]}"; do
        if [[ $line == "$item" ]]; then
            comment_next_line=1
        fi
    done
done < "$file"

With your file, the result is:

bash tests.sh

# SHA512 HASH
96896c1b0b52047fac3cdcfea7b15c3eca7fcc53ee3294000493d4421df61e7947cdcaed783edc95e8ba51fbed164f383fc09afdb73587e590e08eef08086a4d  stage3-amd64-nomultilib-20160310.tar.bz2
# WHIRLPOOL HASH
#e5e15b81753c6f1dd1886c2567b0012bfd822746d8ddce32ddf6e41f64074b4cb9c49dce787ea4cb160ce1234e0a8ba1d3a66b3904a2fb5500c435dd0fc69fea  stage3-amd64-nomultilib-20160310.tar.bz2
# SHA512 HASH
#35735f8c7533bf6cda384a015e3eaac61b89e832f181c49332b04c07cbd3dfe7a61d5c5dce7c1e4155880b2a4e690839efcd914f04523b2a0e1e903749be6192  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS# WHIRLPOOL HASH
c04c4d0f677c0e035262632e4fd03d71a786019b94a0ca0565a6c1af51a9103315e3da030d7c0f071ee729543f9b5d591757e43fad6ee66ff5dff88968eb8d2c  stage3-amd64-nomultilib-20160310.tar.bz2.CONTENTS
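
The question also asks for the remaining SHA-512 digests to be verified. Once the unwanted lines are commented out, each active line has the form "digest, whitespace, file name", so it can be checked with Python's hashlib. The following is a rough sketch, not part of the original answer. It assumes the listed files sit in a single directory and that an entry belongs to the most recent `# ... HASH` header, including a header fused onto the end of a commented line. The names `verify_sha512`, `listing` and `directory` are hypothetical.

```python
import hashlib
import re
from pathlib import Path

# Matches a header such as '# SHA512 HASH' at the end of a line.
HEADER = re.compile(r'#\s*(\w+) HASH\s*$')


def verify_sha512(listing, directory='.'):
    """Check the active entries that sit under a '# SHA512 HASH' header.

    Returns {filename: bool}; True means the computed digest matched.
    A missing file is reported as False.
    """
    results = {}
    algorithm = None
    for line in listing.splitlines():
        header = HEADER.search(line)
        if header:
            # Also catches a header fused onto the end of a data line.
            algorithm = header.group(1)
        if line.startswith('#') or not line.strip():
            continue  # header, commented-out entry, or blank line
        if algorithm != 'SHA512':
            continue  # e.g. an entry that belongs to a WHIRLPOOL header
        parts = line.split(None, 1)
        if len(parts) != 2:
            continue  # not in 'digest  filename' form
        expected, name = parts[0].lower(), parts[1].strip()
        path = Path(directory) / name
        if not path.is_file():
            results[name] = False
            continue
        digest = hashlib.sha512()
        with open(path, 'rb') as f:
            # Read in 1 MiB chunks so large tarballs are not loaded at once.
            for chunk in iter(lambda: f.read(1 << 20), b''):
                digest.update(chunk)
        results[name] = digest.hexdigest() == expected
    return results
```

A call such as `verify_sha512(Path('final.tests.txt').read_text())` would then return one entry per file checked. With the result shown above, the last line stays active but follows the fused `# WHIRLPOOL HASH` header, so this sketch skips it rather than comparing it against a SHA-512 digest.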