Question

I have the following HTML:

<html>
  <head></head>
  <body>
     <span class="hello-style" id="hello123">
        hello world
     </span>
     <span class="value-style">
        1000
     </span>
     <span class="value-style">
        2000
     </span>
     <span class="value-style">
        3000
     </span>
  </body>
</html>

I want to match every value that follows <span class="value-style">, whatever that content may be, so the output for the example above should be:
1000
2000
3000

This should at least remove all the non-numeric characters, but it doesn't: curl 127.0.0.1/index.html | sed 's/[a-zA-Z]/""/'

Edit

curl 127.0.0.1/index.html | tr -d '\n' | sed '...'

Answer 1

awk to the rescue!

$ awk '/<\/span/{f=0} f; /<span class="value-style"/{f=1}' file

    1000
    2000
    3000

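This extracts the lines between the two patterns: the opening <span class="value-style"> tag and the closing </span> tag.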
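For anyone who wants to see how the flag works, here is a commented sketch of the same idea. It is not a different method, just the one-liner spread out, with an extra gsub that trims the leading whitespace (the trimming and the names html, values and inside are my additions, not part of the answer). It assumes the tags and the values sit on separate lines, as in the question.

```shell
# The markup from the question, kept inline so the example is self-contained.
html='<html>
  <head></head>
  <body>
     <span class="hello-style" id="hello123">
        hello world
     </span>
     <span class="value-style">
        1000
     </span>
     <span class="value-style">
        2000
     </span>
     <span class="value-style">
        3000
     </span>
  </body>
</html>'

values=$(printf '%s\n' "$html" | awk '
  # A closing tag switches printing off before the line itself is considered.
  /<\/span/ { inside = 0 }
  # While the flag is set, strip surrounding blanks and print the line.
  inside { gsub(/^[ \t]+|[ \t]+$/, ""); print }
  # An opening value-style tag switches printing on, starting with the next line.
  /<span class="value-style"/ { inside = 1 }
')

printf '%s\n' "$values"
```

The order of the three rules is what keeps the tag lines themselves out of the output. Note that this would likely stop working if the newlines are removed first (as with tr -d '\n' in the question's edit), since everything then ends up on one line.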

Answer 2

You shouldn't parse HTML/XML content with tools like awk or sed.
The right way is to use an XML/HTML parser such as xmlstarlet:

xmlstarlet sel -t -v '//span[@class="value-style"]' -n index.html | grep -o '[^[:space:]]*'

Output:

1000
2000
3000

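//span[@class="value-style"] - XPath expression that selects only the span elements whose class attribute equals "value-style"
grep -o '[^[:space:]]*' - extracts the non-whitespace values from the output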
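As a possible variation, the whitespace can also be trimmed inside the XPath itself with normalize-space(), which makes the grep step unnecessary. This is only a sketch: it assumes xmlstarlet is installed and that the page is well-formed XML, and it writes the question's markup to a temporary file instead of reading index.html (the variable names tmp and values are mine).

```shell
# Recreate the markup from the question in a temporary file.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<html>
  <head></head>
  <body>
     <span class="hello-style" id="hello123">
        hello world
     </span>
     <span class="value-style">
        1000
     </span>
     <span class="value-style">
        2000
     </span>
     <span class="value-style">
        3000
     </span>
  </body>
</html>
EOF

values=""
if command -v xmlstarlet >/dev/null 2>&1; then
  # -m iterates over each matching span; -v prints its text with
  # surrounding whitespace removed; -n ends each value with a newline.
  values=$(xmlstarlet sel -t -m '//span[@class="value-style"]' \
             -v 'normalize-space(.)' -n "$tmp")
  printf '%s\n' "$values"
else
  echo "xmlstarlet is not installed; skipping" >&2
fi

rm -f "$tmp"
```

Real-world HTML is often not well-formed XML, so for a page fetched with curl this may need a cleanup step first.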

Regex: matching HTML
