我试图找到从merriam webster字典中读取XML的最快方法,并将其存储到本地文件以供以后使用。下面,我尝试实现一个模块,它做了一些事情:
我不确定制作XML是否是存储此数据的最佳方式,但这似乎是最简单的事情。起初,我以为我会在不同的步骤中做到这一点。 (1.查找单词,将单词和定义存储到数据结构中.2。将所有数据转储为XML。)然而,这会产生一个问题,因为它存储在运行时(调用)堆栈上的东西太多了。
因此,在这种情况下,我尝试通过查找每个单词然后逐个将其保存到xml来加快速度。然而,这也是一种缓慢的方法。它每500-600个单词大约需要10分钟。
public void load_module() // stores words/definitions into xml file
{ // 1. Pick up word from text file 2. Look up word's definition 3. Store in Xml
string workdirect = Directory.GetCurrentDirectory();
workdirect = workdirect.Substring(0, workdirect.LastIndexOf("bin"));
workdirect += "words1.txt";
using (StreamReader read = new StreamReader(workdirect)) // 1. Pick up word from text file
{
while (!read.EndOfStream)
{
string line = read.ReadLine();
var definitions = load(line.ToLower()); // 2. Retrieve Words Definitions
store_xml(line, definitions);
wordlist.Add(line);
}
}
}
public List<string> load(string word)
{
XmlDocument doc = new XmlDocument();
List<string> definitions = new List<string>();
XmlNodeList node = null;
doc.Load("http://www.dictionaryapi.com/api/v1/references/collegiate/xml/"+word+"?key=*****************"); // Asteriks to hide the actual API key
if (doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def") == null)
{
return definitions;
}
node = doc.SelectSingleNode("entry_list").SelectSingleNode("entry").SelectSingleNode("def").SelectNodes("dt");
// TO DO : implement definitions if there is no node "def" in first node entry "entry_list"
foreach (XmlNode item in node)
{
definitions.Add(item.InnerXml.ToString().ToLower());
}
return definitions;
}
public void store_xml(string word, List<string> definitions)
{
string local = Directory.GetCurrentDirectory();
string name = "dictionary_word.xml";
local = local.Substring(0, local.LastIndexOf("bin"));
bool exists = File.Exists(local + name);
if (exists)
{
XmlDocument doc = new XmlDocument();
doc.Load(local + name);
XmlElement wordindoc = doc.CreateElement("Word");
wordindoc.SetAttribute("xmlns", word);
XmlElement defs = doc.CreateElement("Definitions");
foreach (var item in definitions)
{
XmlElement def = doc.CreateElement("Definition");
def.InnerText = item;
defs.AppendChild(def);
}
wordindoc.AppendChild(defs);
doc.DocumentElement.AppendChild(wordindoc);
doc.Save(local+name);
}
else
{
using (XmlWriter writer = XmlWriter.Create(@local + name))
{
writer.WriteStartDocument();
writer.WriteStartElement("Dictionary");
writer.WriteStartElement("Word", word);
writer.WriteStartElement("Definitions");
foreach (var def in definitions)
{
writer.WriteElementString("Definition", def);
}
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndElement();
writer.WriteEndDocument();
}
}
}
}
答案 0 :(得分:0)
当处理需要导出到XML的大量数据时,我通常将数据作为自定义对象的集合而不是XMLDocument保存在内存中:
public class Definition
{
public string Word { get; set; }
public string Definition { get; set; }
}
然后我会使用XMLWriter将集合写入XML文件:
XmlWriterSettings settings = new XmlWriterSettings();
settings.Indent = true;
settings.IndentChars = (" ");
settings.Encoding = Encoding.UTF8;
using (XmlWriter writer = XmlWriter.Create("C:\output\output.xml", settings))
{
writer.WriteStartDocument();
// TODO - use XMLWriter functions to write out each word and definition
writer.Flush();
}
如果您的内存仍然不足,您可以批量写出XML(例如,每500个定义)。
我在Improving XML Performance上发现Microsoft文章是一个非常有用的参考,特别是有关设计注意事项的部分。