Question

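I am trying to parse an HTML fragment to retrieve some data I need. I searched here on SO but could not find a solution for the case where you filter by one thing and retrieve another.

I don't need a solution that collects every HTML tag or scrapes the whole page. I just want to improve the Regex I already have working.

The page is about 200 lines long, and the data I want is in a hidden field (the line breaks are only there for readability):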

<!-- ...long list of html tags and hidden fields... -->

<input type="hidden" 
   name="javax.faces.ViewState" 
   id="javax.faces.ViewState" 
   value="valueIwant" 
   autocomplete="off" />

<!-- ...more html... -->

I need to retrieve the value attribute from the element whose name or id is javax.faces.ViewState.

I ended up with this code:

string value = Regex.Match(html, "<input[^>]*name=\"(javax.faces.ViewState)\"[^>]*>").Value;

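This finds the right tag, but it returns the tag in full. What I really want is to improve this Regex so that it returns only the content of the value attribute.

I don't want to solve this by calling Substring, because I don't know how long that content will be.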

Answer 1

Match match = Regex.Match(html, "<input[^>]*name=\"javax.faces.ViewState\"[^>]*value=\"([^\"]*)\"");
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
}

1) This will not work if the order of name and value changes. 2) This will not work if the HTML source uses single quotes instead of double quotes.
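One possible way around both caveats, offered as a sketch rather than a definitive fix, is to split the work into two steps: first match the whole input tag whose name is javax.faces.ViewState, then pull the value attribute out of that tag only. The class name ViewStateExtractor and the method name Extract are made up for this illustration. It assumes the attribute values are quoted and contain no > character.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class ViewStateExtractor
{
    // Step 1: the whole <input ...> tag whose name attribute is
    // javax.faces.ViewState, in single or double quotes.
    // \1 requires the closing quote to match the opening one.
    private static readonly Regex TagPattern = new Regex(
        @"<input\b[^>]*(?<![\w-])name\s*=\s*(['""])javax\.faces\.ViewState\1[^>]*>",
        RegexOptions.IgnoreCase);

    // Step 2: the value attribute inside that tag, wherever it appears.
    // The lookbehind keeps attributes such as data-value from matching.
    private static readonly Regex ValuePattern = new Regex(
        @"(?<![\w-])value\s*=\s*(['""])(?<v>.*?)\1",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Returns the attribute content, or null when the field is not found.
    public static string Extract(string html)
    {
        Match tag = TagPattern.Match(html);
        if (!tag.Success)
        {
            return null;
        }

        Match value = ValuePattern.Match(tag.Value);
        return value.Success ? value.Groups["v"].Value : null;
    }
}
```

Calling ViewStateExtractor.Extract(html) on the fragment from the question should return valueIwant, and the result does not depend on whether value comes before or after name. Regex is still fragile for HTML in general, so an HTML parser may be the safer choice if the markup varies more than this.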

How can I get the value attribute from a hidden field using a regular expression?
