Question

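I have the source code of a web page, and I want to get the word that comes after vi-buybox-watchcount">.

The number 152 comes right after vi-buybox-watchcount">. I want to extract it.

The only way I know to do this is with Split. But I can't split on '>', because the source code contains many '>' characters followed by numbers.

So I tried to split it as follows, but it gives an error: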

for (int i = 0; i < Convert.ToInt32(idlist.Length); i++)
{
    string url = "http://www.ebay.com/itm/" + idlist[i];
    HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
    HttpWebResponse response = (HttpWebResponse)request.GetResponse();
    StreamReader sr = new StreamReader(response.GetResponseStream());
    // richTextBox2.Text += sr.ReadToEnd();
    string a = sr.ReadToEnd();
    sr.Close();
    string source = null;
    source = string.Join(Environment.NewLine,
        a.Split('vi-buybox-watchcount">') // this is getting errors
            .Where(m => m.All(char.IsDigit)));
}

Please suggest a way to extract this number.
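For reference, the Split call in the question does not compile because single quotes delimit a char literal, which can hold only one character. A minimal sketch of the string-separator overload of String.Split is below, written as C# 9+ top-level statements. The sample HTML is a hypothetical stand-in for the downloaded page, and instead of keeping pieces that are entirely digits (none are, since each piece continues past the number), it keeps the leading run of digits of each piece after the marker.

```csharp
using System;
using System.Linq;

// Hypothetical sample standing in for the downloaded page source.
string a = "<div><span class=\"vi-buybox-watchcount\">152</span> watching</div>";

// Double quotes make the separator a string; Split(string[], StringSplitOptions)
// accepts multi-character separators, unlike a char literal.
string[] parts = a.Split(new[] { "vi-buybox-watchcount\">" }, StringSplitOptions.None);

// parts[0] is the text before the first marker; every later piece starts
// right after a marker, so keep only its leading digits.
string source = string.Join(Environment.NewLine,
    parts.Skip(1)
         .Select(p => new string(p.TakeWhile(c => char.IsDigit(c)).ToArray()))
         .Where(m => m.Length > 0));

Console.WriteLine(source); // 152
```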

Answer 1

Something like this:

string strHTML = "..................<span class=\"vi-buybox-watchcount\">152</span>";

string strFind = "vi-buybox-watchcount\">";
// Position just past the marker, i.e. the first character of the value
int startIndex = strHTML.IndexOf(strFind) + strFind.Length;
// The value ends at the next tag
int endIndex = strHTML.IndexOf("<", startIndex);
string reqValue = strHTML.Substring(startIndex, endIndex - startIndex); // "152"

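IndexOf finds the starting position of the string you are looking for, so add that string's length to get the start of the value. The difference between that position and the position of the next "<" (found by the second IndexOf) is the length to extract.

You may want to add error-checking code in case the string is not found: IndexOf returns -1 if it isn't found.

If there are multiple occurrences, you can use a loop and the second overload of IndexOf, passing the last-found endIndex (initialized to zero) as the second argument.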
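A minimal sketch of that error checking, assuming a missing value should simply yield null. ExtractAfter is a hypothetical helper name, not a library method; the code is written as C# 9+ top-level statements.

```csharp
using System;

// Hypothetical helper: returns the text between the marker and the next '<',
// or null when either the marker or the closing '<' is missing.
static string? ExtractAfter(string html, string marker)
{
    int markerIndex = html.IndexOf(marker, StringComparison.Ordinal);
    if (markerIndex < 0) return null;             // IndexOf returned -1: marker absent

    int startIndex = markerIndex + marker.Length; // first character of the value
    int endIndex = html.IndexOf('<', startIndex);
    if (endIndex < 0) return null;                // nothing terminates the value

    return html.Substring(startIndex, endIndex - startIndex);
}

string strHTML = "..................<span class=\"vi-buybox-watchcount\">152</span>";
string strFind = "vi-buybox-watchcount\">";

Console.WriteLine(ExtractAfter(strHTML, strFind) ?? "(not found)");            // 152
Console.WriteLine(ExtractAfter(strHTML, "no-such-class\">") ?? "(not found)"); // (not found)
```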
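The loop described above might look like the following sketch. FindAll is a hypothetical helper name, and the two-span input is made up to show more than one occurrence. Each search starts at the previous endIndex, which is always past the previous marker, so the loop always makes progress.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: collects every value that follows the marker, left to right.
static List<string> FindAll(string html, string marker)
{
    var values = new List<string>();
    int endIndex = 0; // where the previous value ended; starts at zero

    while (true)
    {
        // Second overload of IndexOf: search starting at endIndex
        int markerIndex = html.IndexOf(marker, endIndex, StringComparison.Ordinal);
        if (markerIndex < 0) break;                   // no more occurrences

        int startIndex = markerIndex + marker.Length;
        endIndex = html.IndexOf('<', startIndex);
        if (endIndex < 0) break;                      // value not terminated by a tag

        values.Add(html.Substring(startIndex, endIndex - startIndex));
    }
    return values;
}

string strHTML = "<span class=\"vi-buybox-watchcount\">152</span>"
               + "<span class=\"vi-buybox-watchcount\">7</span>";
string strFind = "vi-buybox-watchcount\">";

Console.WriteLine(string.Join(", ", FindAll(strHTML, strFind))); // 152, 7
```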

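A possible LINQ-only solution could be:

strHTML.Split(new string[]{strFind}, StringSplitOptions.RemoveEmptyEntries)
    .Where(x => char.IsDigit(x[0]))                // pieces that start with a digit follow a marker
    .Select(y => y.Substring(0,y.IndexOf("<")));   // cut at the next tag (throws if there is no '<')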

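Or, if you only want the numeric values:

strHTML.Split(new string[]{strFind}, StringSplitOptions.RemoveEmptyEntries)
    .Skip(1)                                      // drop the text before the first marker
    .Select(y => y.Substring(0,y.IndexOf("<")))   // cut at the next tag
    .Where(m => m.All(char.IsDigit));             // keep purely numeric values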

Answer 2

How about using a regular expression?
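The answer only names the technique; a minimal sketch with System.Text.RegularExpressions follows, assuming the count is a plain run of digits immediately after the marker, as in the question. The sample HTML is hypothetical, and the code is written as C# 9+ top-level statements.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sample input; in the question this is the downloaded page source.
string html = "<span class=\"vi-buybox-watchcount\">152</span> ... "
            + "<span class=\"vi-buybox-watchcount\">7</span>";

// The literal marker ("" is an escaped quote in a verbatim string),
// followed by a capture group for the digits after it.
var watchCount = new Regex(@"vi-buybox-watchcount"">(\d+)");

// First occurrence only: Success is false when the marker is absent.
Match first = watchCount.Match(html);
Console.WriteLine(first.Success ? first.Groups[1].Value : "(not found)"); // 152

// Every occurrence, as a list of strings.
var all = watchCount.Matches(html)
                    .Cast<Match>()
                    .Select(m => m.Groups[1].Value)
                    .ToList();
Console.WriteLine(string.Join(", ", all)); // 152, 7
```

Unlike splitting on the marker, the pattern does not need a separate step to cut the value off at the next tag, because the capture group stops at the first non-digit.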

scanLeft
