What is the most efficient way to get XML from an API and store it locally?

Time: 2014-05-01 21:35:17

Tags: c#

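I am trying to find the fastest way to read XML from the Merriam-Webster dictionary and store it in a local file for later use. Below, I tried to implement a module that does a few things: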

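  1. Read 2000 words from a local directory
  2. Use the API
  3. Look up each word in the Merriam-Webster dictionary
  4. Store the definitions in a local XML file for later use.

I'm not sure whether producing XML is the best way to store this data, but it seemed like the simplest thing to do. At first, I thought I would do this in separate steps: (1) look up each word and store the word and its definitions in a data structure, then (2) dump all the data to XML. However, this caused a problem because it stored too many things on the run-time (call) stack.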

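So in this version I tried to speed things up by looking up each word and then saving it to the XML file one word at a time. However, this approach is also slow: it takes about 10 minutes for every 500-600 words.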

    public void load_module() // stores words/definitions into xml file
        { // 1. Pick up word from text file     2. Look up word's definition    3. Store in Xml 
            string workdirect = Directory.GetCurrentDirectory();
            workdirect = workdirect.Substring(0, workdirect.LastIndexOf("bin"));
            workdirect += "words1.txt";
            using (StreamReader read = new StreamReader(workdirect)) // 1. Pick up word from text file 
            {
                while (!read.EndOfStream)
                {
                    string line = read.ReadLine(); 
                    var definitions = load(line.ToLower());    // 2. Retrieve Words Definitions
    
                    store_xml(line, definitions);
                    wordlist.Add(line);
                }
            }
        }
    
        public List<string> load(string word)
        {
            XmlDocument doc = new XmlDocument();
    
            List<string> definitions = new List<string>();
            XmlNodeList node = null;
    
            doc.Load("http://www.dictionaryapi.com/api/v1/references/collegiate/xml/"+word+"?key=*****************"); // Asterisks hide the actual API key
    
            if (doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def") == null)
            {
                return definitions;
            }
            node = doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def").SelectNodes("dt");
    
            // TO DO : implement definitions if there is no node "def" in first node entry "entry_list"
    
            foreach (XmlNode item in node)
            {
                definitions.Add(item.InnerXml.ToString().ToLower());
            }
    
    
            return definitions;
    
        }
    
        public void store_xml(string word, List<string> definitions)
        {
            string local = Directory.GetCurrentDirectory();
            string name = "dictionary_word.xml";
            local = local.Substring(0, local.LastIndexOf("bin"));
            bool exists = File.Exists(local + name);
    
            if (exists)
            {
                XmlDocument doc = new XmlDocument();
                doc.Load(local + name);
                XmlElement wordindoc = doc.CreateElement("Word");
                wordindoc.SetAttribute("xmlns", word);
                XmlElement defs = doc.CreateElement("Definitions");
                foreach (var item in definitions)
                {
                    XmlElement def = doc.CreateElement("Definition");
                    def.InnerText = item;
                    defs.AppendChild(def);
                }
                wordindoc.AppendChild(defs);
                doc.DocumentElement.AppendChild(wordindoc);
                doc.Save(local+name);
            }
            else
            {
                using (XmlWriter writer = XmlWriter.Create(@local + name))
                {
                    writer.WriteStartDocument();
    
                    writer.WriteStartElement("Dictionary");
    
                    writer.WriteStartElement("Word", word);
    
                    writer.WriteStartElement("Definitions");
                    foreach (var def in definitions)
                    {
                        writer.WriteElementString("Definition", def);
                    }
                    writer.WriteEndElement();
                    writer.WriteEndElement();
    
                    writer.WriteEndElement();
                    writer.WriteEndDocument();
                }
            }           
        }
    }
    

1 Answer:

Answer 0: (score: 0)

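When working with a large amount of data that needs to be exported to XML, I usually keep the data in memory as a collection of custom objects rather than as an XmlDocument: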

    // Named WordDefinition because a C# member cannot share the name of its enclosing type.
    public class WordDefinition
    {
        public string Word { get; set; }
        public string Definition { get; set; }
    }
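As a rough sketch of how such a collection might be filled, the helper below turns one API response into plain objects. It assumes the `entry_list`/`entry`/`def`/`dt` layout that the question's code reads; the `DefinitionParser` class and its `Parse` method are hypothetical names, and it keeps the text content of each `dt` rather than the `InnerXml` used in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Repeated here so the sketch compiles on its own.
public class WordDefinition
{
    public string Word { get; set; }
    public string Definition { get; set; }
}

// Hypothetical helper, not part of any library.
public static class DefinitionParser
{
    // Converts one response document into a list of WordDefinition objects.
    // Assumption: the root is <entry_list> and definitions live in entry/def/dt.
    public static List<WordDefinition> Parse(string word, string responseXml)
    {
        XDocument doc = XDocument.Parse(responseXml);

        // Null-conditional access yields null instead of throwing
        // when the response has no <entry> or no <def> element.
        XElement def = doc.Root?.Element("entry")?.Element("def");
        if (def == null)
        {
            return new List<WordDefinition>();
        }

        return def.Elements("dt")
                  .Select(dt => new WordDefinition
                  {
                      Word = word,
                      Definition = dt.Value.Trim().ToLowerInvariant()
                  })
                  .ToList();
    }
}
```

Because the helper takes the response as a string, it can be exercised without a network call; the download itself can stay wherever it is done today.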

I would then use an XmlWriter to write the collection to an XML file:

    XmlWriterSettings settings = new XmlWriterSettings();
    settings.Indent = true;
    settings.IndentChars = "    ";
    settings.Encoding = Encoding.UTF8;
    // Verbatim string (@"...") so the backslashes in the path are not treated as escape sequences.
    using (XmlWriter writer = XmlWriter.Create(@"C:\output\output.xml", settings))
    {
        writer.WriteStartDocument();
        // TODO - use XmlWriter functions to write out each word and definition
        writer.Flush();
    }
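One way the TODO could be filled in is sketched below. The element names `Dictionary`, `Word`, and `Definition` are borrowed from the question's code; the `name` attribute, the `DictionaryExporter` class, and its `Write` method are assumptions made for illustration.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Xml;

// Repeated here so the sketch compiles on its own.
public class WordDefinition
{
    public string Word { get; set; }
    public string Definition { get; set; }
}

// Hypothetical helper, not part of any library.
public static class DictionaryExporter
{
    // Writes every entry to a single XML file in one pass.
    public static void Write(string path, IEnumerable<WordDefinition> definitions)
    {
        var settings = new XmlWriterSettings
        {
            Indent = true,
            IndentChars = "    ",
            Encoding = Encoding.UTF8
        };

        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Dictionary");

            foreach (WordDefinition entry in definitions)
            {
                writer.WriteStartElement("Word");
                // Assumption: the word is stored in a "name" attribute.
                writer.WriteAttributeString("name", entry.Word);
                writer.WriteElementString("Definition", entry.Definition);
                writer.WriteEndElement(); // </Word>
            }

            writer.WriteEndElement(); // </Dictionary>
            writer.WriteEndDocument();
        }
    }
}
```

The file is opened and written once, so nothing has to be reloaded and saved again for each word.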

If you still run out of memory, you can write the XML out in batches (for example, every 500 definitions).
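One possible reading of "in batches" is sketched here: keep a single XmlWriter open, consume the entries lazily, and flush the writer after every `batchSize` entries so that buffered output does not accumulate. The `BatchedExporter` class, its `Write` method, and the `name` attribute are hypothetical; only the figure of 500 comes from the suggestion above.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

// Repeated here so the sketch compiles on its own.
public class WordDefinition
{
    public string Word { get; set; }
    public string Definition { get; set; }
}

// Hypothetical helper, not part of any library.
public static class BatchedExporter
{
    // Streams entries to the file and flushes after each full batch.
    // Returns the number of entries written.
    public static int Write(string path, IEnumerable<WordDefinition> definitions, int batchSize = 500)
    {
        if (batchSize <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(batchSize));
        }

        int written = 0;
        var settings = new XmlWriterSettings { Indent = true };

        using (XmlWriter writer = XmlWriter.Create(path, settings))
        {
            writer.WriteStartDocument();
            writer.WriteStartElement("Dictionary");

            // The sequence is enumerated lazily, so the caller can yield
            // entries one at a time instead of building a full list first.
            foreach (WordDefinition entry in definitions)
            {
                writer.WriteStartElement("Word");
                writer.WriteAttributeString("name", entry.Word);
                writer.WriteElementString("Definition", entry.Definition);
                writer.WriteEndElement(); // </Word>

                written++;
                if (written % batchSize == 0)
                {
                    writer.Flush(); // push the completed batch to disk
                }
            }

            writer.WriteEndElement(); // </Dictionary>
            writer.WriteEndDocument();
        }

        return written;
    }
}
```

If the caller supplies the entries from an iterator method (one that uses `yield return`), only the entry currently being written needs to be held in memory.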

I found the Microsoft article Improving XML Performance to be a very useful reference, particularly the section on design considerations.