Question

I want to extract a label's value from a website. I looked at the HTML source in Chrome and found this line:

<strong><span id="lbName">George</span></strong>

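The label name lbName is unique within that request. But how do I extract the name "George" from this line? I have looked at regular expressions, but so far I have only found how to test whether a string contains a pattern that I already know.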

    public static void GetName()
    {
        HttpWebRequest myRequest = (HttpWebRequest)WebRequest.Create("Http://MyWebsite.com");
        myRequest.Method = "GET";
        WebResponse myResponse = myRequest.GetResponse();
        StreamReader sr = new StreamReader(myResponse.GetResponseStream(), System.Text.Encoding.UTF8);
        string result = sr.ReadToEnd();
        sr.Close();
        myResponse.Close();

        string sPattern = "lbName";
        // extract the value of lbName ?

    }

Answer 1

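There is a library called Html Agility Pack. Use that. I'll add that if you are always looking at the same page, and you know the page won't change its format, you can simply use the IndexOf method and search for <span id="lbName">. Something like this: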

const string searchFor = "<span id=\"lbName\">"; // open marker
const string endSearchFor = "</span>"; // close marker

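string result = "letters" + searchFor + "text" + endSearchFor; // Sample input; substitute the downloaded page text here

// Find the opening marker.
int ix1 = result.IndexOf(searchFor, StringComparison.Ordinal);
if (ix1 == -1)
{
    throw new Exception("Opening marker not found.");
}

// Skip past the opening marker to the start of the value.
ix1 += searchFor.Length;

// Find the closing marker, searching only after the opening one.
int ix2 = result.IndexOf(endSearchFor, ix1, StringComparison.Ordinal);
if (ix2 == -1)
{
    throw new Exception("Closing marker not found.");
}

// The value is everything between the two markers.
string text = result.Substring(ix1, ix2 - ix1);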
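
For the Html Agility Pack route, a minimal sketch might look like the following. It assumes the HtmlAgilityPack NuGet package is installed; the class name LabelReader and the method GetTextById are hypothetical names chosen for illustration, and the id is assumed to contain no single quote, since it is pasted into an XPath expression.

```csharp
using System;
using HtmlAgilityPack; // Assumption: the "HtmlAgilityPack" NuGet package is referenced

public static class LabelReader // hypothetical helper, not part of the library
{
    // Returns the text of the first element with the given id,
    // or null if the page contains no such element.
    public static string GetTextById(string html, string id)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        // XPath: any element whose id attribute equals the requested id.
        // Assumes id contains no single quote.
        HtmlNode node = doc.DocumentNode.SelectSingleNode("//*[@id='" + id + "']");
        if (node == null)
        {
            return null;
        }

        // Decode HTML entities such as &amp; and trim surrounding whitespace.
        return HtmlEntity.DeEntitize(node.InnerText).Trim();
    }
}
```

Called as LabelReader.GetTextById(result, "lbName") on the page text downloaded in the question, this should yield the span's text. Unlike the IndexOf approach, it does not depend on the exact spelling or attribute order of the opening tag.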

Answer 2

The following regular expression should work:

(?<=<strong><span id="lbName">).*?(?=</span></strong>)
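
As a rough sketch of how a lookaround pattern of this kind could be applied in C#: in .NET a lookbehind is written (?<=...) and a lookahead (?=...), whereas square brackets would define a character class instead. The class name LabelRegex and the method ExtractName are hypothetical, and the pattern assumes the markup is exactly as shown in the question.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LabelRegex // hypothetical helper for illustration
{
    // (?<=...) requires the opening tags immediately before the match,
    // (?=...) requires the closing tags immediately after it,
    // and .*? matches lazily so the match stops at the first closing pair.
    private static readonly Regex NamePattern = new Regex(
        "(?<=<strong><span id=\"lbName\">).*?(?=</span></strong>)",
        RegexOptions.Singleline); // let '.' match newlines as well

    // Returns the text between the tags, or null if the pattern does not match.
    public static string ExtractName(string html)
    {
        Match m = NamePattern.Match(html);
        return m.Success ? m.Value : null;
    }
}
```

Because the lookarounds are not part of the match, m.Value contains only the text between the tags. Any change to the page's markup, even extra whitespace inside the tag, would make the pattern miss, which is why an HTML parser is usually the safer choice.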

How do I retrieve the value of a label in a website?
