Question

I have an HTML file that contains several nested tables:

    <table>
      <tr>
        <td>
            <table>
               <tr>
                 <td>
                      <table>
                       ...
                       </table>
                 </td>
               </tr>
           </table>
        </td>
      </tr>
    </table>

I would like to add a class to each table:

      <table class="table1">
      <tr>
        <td>
            <table class="table2">
               <tr>
                 <td>
                      <table class="table3">
                       ...
                       </table>
                 </td>
               </tr>
           </table>

Based on extensive searching, I cobbled together the following bash script, but it does not work at all:

#!/bin/bash
strng="<table"
index=1
for entry in `grep -n $strng $1`
do
line=`echo $entry | awk -F":" '{print$1}'`
sed -e "$line s/$strng/$strng class=\"table$index\"/" -i $1
index=$(($index + 1))
done

Any suggestions would be appreciated.

Answer 1

If you need a general solution, you should use an HTML-specific tool. If you know that your HTML is limited to the format you have shown, try:

awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html

Example

$ awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html
    <table class="table1">
      <tr>
        <td>
            <table class="table2">
               <tr>
                 <td>
                      <table class="table3">
                       ...
                       </table>
                 </td>
               </tr>
           </table>
        </td>
      </tr>
    </table>

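How it works

/<\/table>/{i--}

For any line that contains </table>, we decrement the variable i.

/<table>/{sub(/<table>/, "<table class=\"table"++i"\">")}

For any line that contains <table>, we increment the variable i and replace <table> with a version that carries the class value. Together with the decrement above, this makes i track the current nesting depth.

1

This is awk's cryptic shorthand for print-the-line.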
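
Because i goes up on <table> and down on </table>, the number in the class is the nesting depth rather than a running count. The sketch below is only an illustration on made-up input (two sibling tables inside one outer table, not the asker's file); it runs the same awk program and shows that both siblings receive the same class.

```shell
#!/bin/bash
# Hypothetical input: two sibling tables nested inside one outer table.
html='<table>
<tr><td>
<table>
</table>
<table>
</table>
</td></tr>
</table>'

# Same awk program as above; grep keeps only the opening <table ...> lines.
result=$(printf '%s\n' "$html" |
  awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' |
  grep '<table')

printf '%s\n' "$result"
```

If you need a distinct number for every table regardless of depth, drop the /<\/table>/{i--} rule so that i only ever increases.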

Changing the file in place

If you want to change the file in place and you have GNU awk (gawk) 4.1.0 or later, then use:

awk -i inplace '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html

For other awks:

awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' file.html >tmp && mv tmp file.html
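
A fixed name such as tmp can collide with an existing file in the current directory. One possible variation, sketched here on the assumption that mktemp is available, lets mktemp pick a unique temporary name. The variable names page and tmp and the sample input are made up for the illustration.

```shell
#!/bin/bash
# "page" stands in for your real file; here it is a throwaway sample.
page=$(mktemp)
printf '%s\n' '<table>' '<tr><td>' '<table>' '</table>' '</td></tr>' '</table>' > "$page"

# Write to a unique temporary file, and replace the original only if awk succeeded.
tmp=$(mktemp) &&
awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' "$page" > "$tmp" &&
mv "$tmp" "$page"

result=$(cat "$page")
printf '%s\n' "$result"
rm -f "$page"
```

One caveat: after the mv, the file carries the permissions of the temporary file, which mktemp creates readable and writable by the owner only.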

As a bash script

#!/bin/bash
# Usage: script.sh infile outfile
awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' "$1" >"$2"

Note that the file names $1 and $2 are inside double quotes. This prevents surprises if a name contains spaces or other shell-active characters.
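
To see the quoting at work, here is a small sketch that wraps the same command in a function (add_classes is a name invented for this example) and calls it with file names that contain a space. The scratch directory and file names are assumptions made for the demonstration.

```shell
#!/bin/bash
# Scratch directory holding a hypothetical input file whose name has a space.
dir=$(mktemp -d)
printf '%s\n' '<table>' '</table>' > "$dir/my page.html"

# Hypothetical wrapper: $1 and $2 play the same role as in the script above.
add_classes() {
  awk '/<\/table>/{i--} /<table>/{sub(/<table>/, "<table class=\"table"++i"\">")} 1' "$1" >"$2"
}

# Quoted, each name reaches awk and the redirection as a single word.
add_classes "$dir/my page.html" "$dir/out page.html"

result=$(cat "$dir/out page.html")
printf '%s\n' "$result"
rm -rf "$dir"
```

Without the quotes around $1, the shell would split the first name into two words and awk would look for two separate files.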

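As a matter of style rather than substance, some people prefer to spread awk code over multiple lines. This can make the code easier to understand, or easier to modify when you want to add new features. So, if you prefer, the script above can also be written as: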

#!/bin/bash
# Usage: script.sh infile outfile
awk '
    /<\/table>/{ i-- }

    /<table>/{ sub(/<table>/, "<table class=\"table"++i"\">") }

    1
    ' "$1" >"$2"

Answer 2

Just as a quick fix:

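#!/bin/bash
temp_file="$( mktemp )"
# Tag every <table with a placeholder class.
sed 's/\(<table\)/\1 class="$_field_$"/g' "$1" > "$temp_file"
index=0
# Replace one placeholder per pass, so each table gets its own number.
while grep -e '[$]_field_[$]' "$temp_file" >/dev/null
 do
    # 0,/re/ is a GNU sed address that ends at the first matching line, so
    # only the first remaining placeholder in the file is replaced per pass.
    sed -i "0,/[$]_field_[$]/s//$index/" "$temp_file"
    ((++index))
 done
cp "$temp_file" "$1"
rm -f "$temp_file"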

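It must be mentioned, though, that tools like sed or awk should not be used to process XML attributes. Use a dedicated tool, as suggested in this answer.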

Answer 3

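To address why your script does not work: for X in Y splits on whitespace, not just on newlines as you were expecting.

Set $IFS to a newline and it should work as expected.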

#!/bin/bash
IFS='
'
strng="<table"
index=1
for entry in `grep -n $strng $1`
do
  line=`echo $entry | awk -F":" '{print$1}'`
  sed -e "$line s/$strng/$strng class=\"table$index\"/" -i $1
  index=$(($index + 1))
done

Otherwise, your grep command returns:

1:    <table> 
4:            <table>
7:                      <table>

and this is processed in 6 iterations:

1:
<table>
4:
<table>
7:
<table>

You can see a trace of the commands by adding set -x at the top of the script.
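
The difference in iteration counts can be checked with a small experiment. This is only a sketch: the matches variable is a hard-coded stand-in for the grep -n output shown above, and the loops do nothing but count.

```shell
#!/bin/bash
# Stand-in for the grep -n output shown above.
matches='1:    <table>
4:            <table>
7:                      <table>'

# Default IFS (space, tab, newline): every whitespace-separated word is an entry.
count=0
for entry in $matches; do count=$((count + 1)); done
default_count=$count

# IFS set to a newline only: every line is an entry.
old_ifs=$IFS
IFS='
'
count=0
for entry in $matches; do count=$((count + 1)); done
IFS=$old_ifs
newline_count=$count

echo "default IFS: $default_count iterations"
echo "newline IFS: $newline_count iterations"
```

The first loop runs 6 times and the second runs 3 times, once per matching line.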

Bash script to add classes to an HTML file
