Question

How can I write a bash script that does the following?

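Recursively search a directory for all files ending in .html or .htm.
Use sed to search for <body> and delete all lines before it, including the <body> line,
and to search for </body> and delete all lines after it, including the </body> line.
The changes should not be made in the same file but written to a separate file, such as index-temp.html.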

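I wrote the following, but I don't know how to handle everything after the block, or how to safely write the result to another file instead of the same one. Do I have to use an if?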

#!/bin/bash
input=$1
# not finished: the sed command and its arguments still need to go here
find "$input" -type f -name "*.htm" -exec sed

Answer 1

@Tom Fenech said:

xmllint --html --xpath '//body/node()' index.htm* > index-temp.html

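Both <body> and <BODY> are found (the HTML parser is case-insensitive).
*.htm?(l) matches only htm / html files, but it works only with extglob enabled (the default on Debian).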
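A small sketch (bash only; the helper name matches_htm is hypothetical) showing which names the extglob pattern *.htm?(l) accepts. ?(l) means "zero or one l", so .htm and .html match while .htmll and unrelated names do not:

```shell
#!/bin/bash
# extglob must be switched on before the pattern below is parsed.
shopt -s extglob

# Hypothetical helper: succeeds if $1 ends in .htm or .html.
matches_htm() {
  [[ $1 == *.htm?(l) ]]
}

for name in index.htm index.html index.htmll notes.txt; do
  if matches_htm "$name"; then
    echo "match:    $name"
  else
    echo "no match: $name"
  fi
done
```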

Expanded by @tripleee:

# GNU find: -iregex matches .htm/.html case-insensitively
find "$input" -type f -iregex '.*\.html?' \
  -exec sh -c 'for f; do
      xmllint --html --xpath "//body/node()" "$f" >"${f%.htm*}-temp.html"
    done' _ {} +
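To see how ${f%.htm*} builds the output name, here is a sketch with a hypothetical helper temp_name. The % operator removes the shortest suffix matching .htm*, so both .htm and .html are stripped before -temp.html is appended, and the directory part is kept:

```shell
#!/bin/sh
# Hypothetical helper: derive the output name the same way the sh -c loop does.
temp_name() {
  printf '%s\n' "${1%.htm*}-temp.html"
}

temp_name site/index.htm    # site/index-temp.html
temp_name site/about.html   # site/about-temp.html
```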

Answer 2

For a single file, the sed command is:

sed '1,/<body>/d;/<\/body>/,$d' index.html > index-temp.html
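A quick way to try the command on sample input, wrapped in a hypothetical extract_body function that reads standard input. This sketch assumes a bare <body> tag (one with attributes, such as <body class="x">, would not match /<body>/) and that <body> is not on line 1, because a 1,/re/ range only starts looking for its end on line 2:

```shell
#!/bin/sh
# Hypothetical wrapper around the single-file sed program:
#   1,/<body>/d     delete from line 1 through the first <body> line
#   /<\/body>/,$d   delete from </body> through the last line
extract_body() {
  sed '1,/<body>/d;/<\/body>/,$d'
}

printf '%s\n' '<html>' '<head><title>t</title></head>' '<body>' \
  '<p>one</p>' '<p>two</p>' '</body>' '</html>' | extract_body
# prints:
# <p>one</p>
# <p>two</p>
```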

The syntax is

sed 'ROWa,ROWz d'

where ROWa is the line number to start at and ROWz the line number to end at, inclusive, counting from 1. $ can be used for the last line.
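A minimal illustration of line-number ranges on numbered input (seq is used only to generate sample lines):

```shell
#!/bin/sh
# Delete lines 2 through 4, inclusive, counting from 1.
seq 1 6 | sed '2,4d'    # prints 1, 5, 6 (one per line)

# $ addresses the last line: delete from line 3 to the end.
seq 1 6 | sed '3,$d'    # prints 1, 2
```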

You can also use patterns:

sed '/PATa/,/PATz/ d'

which deletes from pattern PATa to pattern PATz. Patterns and line numbers can also be mixed.
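A short sketch of a pure pattern range and a mixed line-number/pattern range; the marker lines START and END are made-up sample data:

```shell
#!/bin/sh
# Pattern range: from the first line matching START through the next line matching END.
printf '%s\n' a START b END c | sed '/START/,/END/d'   # prints a, c

# Mixed range: from line 1 through the first line matching START.
printf '%s\n' a START b END c | sed '1,/START/d'       # prints b, END, c
```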

Now the find part:

find "$input" -type f \( -name "*.htm" -o -name "*.html" \) -exec sed -i.temp '0,/<body>/d;/<\/body>/,$d' {} ";"

This modifies each .htm/.html file in place, but first creates a backup of the original (e.g. index.html.temp).
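A self-contained sketch that runs the in-place command on a throwaway directory so the backup behaviour can be inspected. It assumes GNU sed, which provides -i with a suffix and the 0,/re/ address; unlike 1,/re/, the 0,/re/ form also works when <body> is on the very first line. The directory layout and sample page are made up:

```shell
#!/bin/bash
# Throwaway directory with one sample page in a subdirectory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
printf '%s\n' '<html>' '<body>' '<p>hi</p>' '</body>' '</html>' > "$demo_dir/sub/index.html"

# Edit in place; GNU sed saves the original as index.html.temp.
find "$demo_dir" -type f \( -name '*.htm' -o -name '*.html' \) \
  -exec sed -i.temp '0,/<body>/d;/<\/body>/,$d' {} \;

cat "$demo_dir/sub/index.html"        # <p>hi</p>
cat "$demo_dir/sub/index.html.temp"   # the untouched original page
# rm -rf "$demo_dir" when done
```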

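Perhaps this is more convenient for you. Otherwise you would have to rename all those files, which takes another script: neither sed nor find can perform redirection by themselves, so a shell call using basename is needed. That is the other approach: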

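#!/bin/bash
#
# justbody.sh
#
infile=$1
# dirname + basename keep the output next to the input (basename alone would drop the directory)
outfile="$(dirname "$infile")/$(basename "$infile" .htm)-temp.htm"
# 0,/<body>/ (GNU sed) deletes up to and including <body>; /<\/body>/,$ deletes from </body> to the end
sed '0,/<body>/d;/<\/body>/,$d' "$infile" > "$outfile"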

Now call it with:

find "$input" -type f -name "*.htm" ! -name "*-temp.htm" -exec ./justbody.sh {} ";"
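If a separate script file is unwanted, the redirection can also happen in an inline sh -c, the same trick used with xmllint in the first answer. This is a sketch, not the answer's own method: it assumes GNU sed (for 0,/re/), the function name extract_all is hypothetical, and it writes index-temp.html next to each index.htm or index.html while skipping files already named *-temp.htm*:

```shell
#!/bin/bash
# Hypothetical wrapper: $1 is the directory to search.
extract_all() {
  find "$1" -type f \( -name '*.htm' -o -name '*.html' \) ! -name '*-temp.htm*' \
    -exec sh -c 'for f; do
        # \$ keeps the shell from expanding $d; sed receives /<\/body>/,$d
        sed "0,/<body>/d;/<\/body>/,\$d" "$f" > "${f%.htm*}-temp.html"
      done' _ {} +
}
```

Usage: extract_all "$input". Using + instead of ";" lets find pass many files to one sh process, which is why the loop over "$@" (for f) is needed.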
