Question

Goal:
Find the sentence "From today's featured article" on the website "http://en.wikipedia.org/wiki/Main_Page" using web scraping with C# code.

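Problem:
You can retrieve the website's source code as a string value. I believe you can find the sentence "From today's featured article" by looping over the string with substring comparisons. I have a feeling that this is an inefficient approach.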

Is there a better solution for locating the sentence "From today's featured article" in the string input?

Information:
* I'm using C# code in Visual Studio 2013 Community.
* The source code is not working. Only the first three lines are working.

WebClient w = new WebClient();
string s = w.DownloadString("http://en.wikipedia.org/wiki/Main_Page");
string svar = RegexUtil.MatchKey(input);

static class RegexUtil
{
    static Regex _regex = new Regex(@"$ddd$");
    /// <summary>
    /// This returns the key that is matched within the input.
    /// </summary>
    static public string MatchKey(string input)
    {
        //Match match = Regex.Match(input, @"From today's featured article", RegexOptions.IgnoreCase);

        Match match = _regex.Match(input);
        //  Match match = regex.Match("Dot 55 Perls");


        if (match.Success)
        {
            return match.Groups[1].Value;
        }
        else
        {
            return null;
        }
    }
}

Answer 1

If you just want to find where that string occurs, all you need to do is:

int pos = html.IndexOf("From today's featured article");

Note, however, that this can also find the string inside quoted attribute values or tags, not only in the visible text.
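A minimal sketch of the IndexOf approach, handling the case where the phrase is missing. The names PhraseFinder and FindPhrase are hypothetical, the sample html string merely stands in for the downloaded page, and the case-insensitive comparison is an assumption carried over from the commented-out RegexOptions.IgnoreCase line in the question:

```csharp
using System;

static class PhraseFinder
{
    // Returns the zero-based index of the first occurrence of phrase in html,
    // or -1 if it does not occur. The comparison ignores case.
    public static int FindPhrase(string html, string phrase)
    {
        return html.IndexOf(phrase, StringComparison.OrdinalIgnoreCase);
    }
}

static class Program
{
    static void Main()
    {
        // Stand-in for the string returned by WebClient.DownloadString.
        string html = "<h2>From today's featured article</h2>";

        int pos = PhraseFinder.FindPhrase(html, "From today's featured article");
        if (pos >= 0)
        {
            Console.WriteLine("Found at index " + pos);
        }
        else
        {
            Console.WriteLine("Not found");
        }
    }
}
```

IndexOf scans the string once, so there is no need for a hand-written loop over substrings.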

To search only the visible text, you would need to parse the HTML to remove all the tags, and then search the text between them.
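As a rough illustration of that idea, the sketch below strips tags with regular expressions and then searches what is left. This is only an approximation under stated assumptions: regex-based stripping is not a real HTML parser (comments or attribute values containing ">" will confuse it), and the names VisibleText, Extract and ContainsPhrase are hypothetical. A dedicated HTML parsing library would be more robust.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

static class VisibleText
{
    // Matches <script> and <style> elements together with their contents.
    static readonly Regex ScriptOrStyle = new Regex(
        @"<(script|style)\b[^>]*>.*?</\1\s*>",
        RegexOptions.IgnoreCase | RegexOptions.Singleline);

    // Matches any remaining tag, for example <h2> or <a title="...">.
    static readonly Regex Tag = new Regex(@"<[^>]+>");

    // Matches runs of whitespace so they can be collapsed to one space.
    static readonly Regex Whitespace = new Regex(@"\s+");

    // Approximates the visible text of an HTML string.
    public static string Extract(string html)
    {
        string text = ScriptOrStyle.Replace(html, " ");
        text = Tag.Replace(text, " ");
        text = WebUtility.HtmlDecode(text); // turns &#39; into ' and so on
        return Whitespace.Replace(text, " ").Trim();
    }

    // True if phrase occurs in the approximated visible text (ignoring case).
    public static bool ContainsPhrase(string html, string phrase)
    {
        return Extract(html).IndexOf(phrase, StringComparison.OrdinalIgnoreCase) >= 0;
    }
}

static class Program
{
    static void Main()
    {
        string phrase = "From today's featured article";

        // The phrase appears only inside an attribute here, not in visible text.
        string inAttribute = "<a title=\"From today's featured article\">link</a>";
        // The phrase appears in visible text here, with an encoded apostrophe.
        string inText = "<h2>From today&#39;s featured article</h2>";

        Console.WriteLine(VisibleText.ContainsPhrase(inAttribute, phrase));
        Console.WriteLine(VisibleText.ContainsPhrase(inText, phrase));
    }
}
```

A plain IndexOf on the raw HTML would report a match for both sample strings, whereas the stripped version only matches the second one.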
