Getting data from a website, reading and parsing the content - improving the method

Time: 2013-10-16 12:15:39

Tags: c# parsing web feedback

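I created a method that looks up synonyms for a word using thesaurus.com, and I am looking for comments and feedback. In what ways can I improve it in terms of speed, security, reliability (as far as "reliability" goes when depending on a third-party website for lookups), and so on?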

    /// <summary>
    /// This method relies heavily on thesaurus.com for synonym lookups. It is not completely reliable, but is deemed reliable enough in cases where you don't have your own thesaurus.
    /// </summary>
    public static string[] GetSynonyms(string word)
    {
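        // Escape the word so that spaces and reserved characters do not break the query string.
        string url = string.Format("http://thesaurus.com/search?q={0}", Uri.EscapeDataString(word));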

        HttpWebRequest request = (HttpWebRequest)WebRequest.Create(url);
        HttpWebResponse response = (HttpWebResponse)request.GetResponse();
        if (response.StatusCode == HttpStatusCode.OK)
        {
            List<string> synonyms = new List<string>();
            StringBuilder data = new StringBuilder();
            string line;

            using (StreamReader reader = new StreamReader(response.GetResponseStream()))
            {

                // We know that the synonyms are in the upper part of the HTML stream, so we do not need to read the entire stream.
                while((line = reader.ReadLine()) != null) {
                    var index = line.IndexOf("<span class=\"text\">");

                    // IndexOf returns -1 when the marker is absent; 0 is a valid match at the start of the line.
                    if (index >= 0)
                    {
                        index = index + "<span class=\"text\">".Length;
                        synonyms.Add(line.Substring(index).Replace("</span>", ""));
                    }

                    //break when we come to the Antonyms section of the page
                    if (line.Contains("container-info antonyms"))
                    {
                        break;
                    }
                }
            }
            return synonyms.ToArray<string>();
        }
        else
        {
            return null;
        }
    }

Edit: For example, it currently takes about 3.5 seconds to look up the synonyms for the word "old".
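One possible direction for the reliability part, offered only as a sketch: the line-scanning logic can be pulled out of the HTTP code so it can be exercised against a saved copy of the page without touching the network. The class name `SynonymParser` and the method `ExtractSynonyms` are hypothetical names introduced here for illustration. The two markers are copied from the method above and are assumed to still match the markup thesaurus.com serves.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper, not part of the original method.
public static class SynonymParser
{
    // Markers taken from the method above; the site may change its markup at any time.
    private const string OpenTag = "<span class=\"text\">";
    private const string AntonymsMarker = "container-info antonyms";

    // Reads lines from any TextReader (a network stream or a saved page)
    // and collects the text that follows the opening span tag.
    public static List<string> ExtractSynonyms(TextReader reader)
    {
        var synonyms = new List<string>();
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            int index = line.IndexOf(OpenTag, StringComparison.Ordinal);

            // -1 means "not found"; 0 is a valid match at the start of the line.
            if (index >= 0)
            {
                string text = line.Substring(index + OpenTag.Length)
                                  .Replace("</span>", "")
                                  .Trim();
                if (text.Length > 0)
                {
                    synonyms.Add(text);
                }
            }

            // Stop at the antonyms section, as the original loop does.
            if (line.Contains(AntonymsMarker))
            {
                break;
            }
        }

        return synonyms;
    }
}
```

With this split, `GetSynonyms` would only need to open the response inside `using` blocks (so that the `HttpWebResponse` is disposed as well as the reader) and pass a `StreamReader` over `response.GetResponseStream()` to `ExtractSynonyms`. It does not address the 3.5 seconds by itself, since that time is most likely dominated by the network round trip rather than by the parsing.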

1 answer:

Answer 0: (score: 1)

The best way to improve it is to use something suited to the job instead of parsing HTML, i.e. a local or web-service API, such as