Question

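I am trying to scrape data from a web page. I have downloaded the page into a string variable.

I would like to know how to grab the value between two tags. I have included a snippet of the downloaded string; the value I want is 895.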

<div class="split2r right">


                    <strong>Avg. asking rent in M4:</strong> 
                    <strong class="price big">&pound;895 pcm</strong><br>
                    <strong>No. of properties to rent in M4:</strong> <strong><a data-ga-category="Area stats" data-ga-action="properties_to_rent" data-ga-label="/tracking/home-values/results/" href="/to-rent/property/manchester/isaac-way/m4-7ed/">225</a></strong>

            </div>

A code example would be great.

Answer 1

Parsing HTML is actually quite easy with the HtmlAgilityPack library.

The first step is to add a reference to the HtmlAgilityPack library. Then you can start parsing the HTML:

const string Html = "<strong>Avg. price:</strong> <strong class=\"price big\">&pound;895 pcm</strong><br><strong>this is the price of zed headphones</strong>";

var doc = new HtmlDocument();
doc.LoadHtml(Html);

The next step is to find the element you are looking for, in this case the <strong> element whose class is set to price big:

var priceNode = doc.DocumentNode.SelectSingleNode("//strong[@class='price big']");

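The last step is to retrieve the actual number from the node's InnerText property. A regular expression is probably the best way to do that, and if we assume the number we want is the only number in the node's inner text, it can be very simple: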

var priceMatch = Regex.Match(priceNode.InnerText, @"(\d+)");

Console.WriteLine(priceMatch); // Will output 895
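
The steps above can be combined into one helper. This is only a minimal sketch, assuming the HtmlAgilityPack package is referenced and that the price sits in a <strong class="price big"> element as in the question. The names AgilityPriceReader and ReadPrice are made up for illustration. It adds two guards that the snippets above leave out: SelectSingleNode returns null when nothing matches, and the regex may find no digits.

```csharp
using System.Text.RegularExpressions;
using HtmlAgilityPack;

// Hypothetical helper, not part of HtmlAgilityPack.
public static class AgilityPriceReader
{
    // Returns the first run of digits inside <strong class="price big">,
    // or null when the element or the digits are missing.
    public static int? ReadPrice(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // SelectSingleNode returns null when no node matches the XPath.
        HtmlNode priceNode = doc.DocumentNode.SelectSingleNode("//strong[@class='price big']");
        if (priceNode == null)
        {
            return null;
        }

        // InnerText may still contain the "&pound;" entity; the digit
        // pattern skips over it either way.
        Match m = Regex.Match(priceNode.InnerText, @"\d+");
        if (m.Success && int.TryParse(m.Value, out int price))
        {
            return price;
        }
        return null;
    }
}
```

Returning int? lets the caller tell "no price found" apart from a real value without catching exceptions.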

Answer 2

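// Requires: using System.Collections.Generic; using System.Text.RegularExpressions;
private void button1_Click(object sender, EventArgs e)
{
    string input = @"<strong class=""price big"">&pound;895 pcm</strong><br>";

    // Capture one to five digits between "&pound;" and " pcm".
    // The pattern is a verbatim string (@) so \d reaches the regex engine,
    // and {1,5} is the range quantifier (at least one digit, so the conversion cannot see an empty string).
    MatchCollection mc = Regex.Matches(input, @">&pound;(\d{1,5}) pcm");

    var prices = new List<int>();
    foreach (Match m in mc)
    {
        // Groups[1] holds only the captured digits, e.g. "895".
        prices.Add(Convert.ToInt32(m.Groups[1].Value));
    }
}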

Answer 3

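Assuming your string is called "source" and every extract is formatted like the example (so the price is the only run of digits in it), you can strip out everything that is not a digit: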

var value = Regex.Replace(source, @"\D", string.Empty);
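
Stripping every non-digit only works when the price is the only number in the string. The full snippet in the question also contains "M4", "split2r" and 225, so their digits would all run together. Below is a minimal sketch of a narrower alternative that uses only System.Text.RegularExpressions. It assumes the price always follows &pound; inside <strong class="price big">, as in the question. The names PriceExtractor and TryGetPrice are made up for illustration.

```csharp
using System.Text.RegularExpressions;

// Hypothetical helper for illustration only.
public static class PriceExtractor
{
    // Matches the digits that follow "&pound;" right after the opening
    // <strong class="price big"> tag. Assumes markup shaped like the
    // question's snippet; a real HTML parser is more robust.
    private static readonly Regex PricePattern = new Regex(
        @"<strong\s+class=""price big"">\s*&pound;(\d+)",
        RegexOptions.IgnoreCase);

    // Returns true and sets price when the pattern is found; otherwise
    // returns false and leaves price at 0.
    public static bool TryGetPrice(string source, out int price)
    {
        price = 0;
        Match m = PricePattern.Match(source ?? string.Empty);
        return m.Success && int.TryParse(m.Groups[1].Value, out price);
    }
}
```

A caller would write something like PriceExtractor.TryGetPrice(source, out int price) and use price only when the call returns true.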

How to get the value between two tags from a string
