Convert a specific HTML structure into a specific XML structure

Date: 2014-11-04 15:08:26

Tags: c# html xml recursion html-agility-pack

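I have been struggling with this problem for a while.

I want to convert HTML to XML. The structure is shown below.

I use HtmlAgilityPack to turn the HTML into a valid XML structure. After that step, my HTML looks like this: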

<div class="menuItem1" video="" preview="">
    Menu 1
    <div class="subMenu1">
        <div class="menuItem2" video="" preview="">
            Menu 2
            <div class="subMenu2">
                <div class="menuItem3" video="" preview="">
                    Menu 3
                    <div class="subMenu3">
                        <div class="" video="" preview="">Menu 4</div>
                    </div>
                    <div class="treeExpand"></div>
                </div>
                <div class="menuItem3" video="" preview="">Menu 3</div>
                <div class="menuItem3" video="" preview="">Menu 3</div>
            </div>
            <div class="treeExpand"></div>
        </div>
    </div>
    <div class="treeExpand"></div>
</div>
<div class="menuItem1" video="" preview="">
    Menu 1
    <div class="subMenu1">
        <div class="menuItem2" video="" preview="">
            Menu 2
            <div class="subMenu2">
                <div class="menuItem3" video="" preview="">
                    Menu 3
                    <div class="subMenu3">
                        <div class="" video="" preview="">Menu 4</div>
                    </div>
                    <div class="treeExpand"></div>
                </div>
                <div class="menuItem3" video="" preview="">Menu 3</div>
                <div class="menuItem3" video="" preview="">Menu 3</div>
            </div>
            <div class="treeExpand"></div>
        </div>
    </div>
    <div class="treeExpand"></div>
</div>

This is exactly what I want. Now I can convert it into an XElement with this C# code:

XDocument doc = XDocument.Parse(THE_HTML_STRING_AS_SHOWN_ABOVE);
XDocument docw = new XDocument(new XElement("Navigation", doc.Root));
XElement root = docw.Root;
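
Note: XDocument.Parse accepts only one root element, so a string that really contains two top-level div elements, as in the sample above, would be rejected. A minimal sketch of one way around this, assuming the fragment is otherwise well-formed XML, is to wrap the string in the Navigation element before parsing. FragmentParser and ParseFragment are hypothetical names used only for illustration:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

public static class FragmentParser
{
    // Hypothetical helper: wraps a multi-root fragment in a single
    // <Navigation> element so that it parses as one XML tree.
    public static XElement ParseFragment(string htmlFragment)
    {
        return XElement.Parse("<Navigation>" + htmlFragment + "</Navigation>");
    }

    public static void Main()
    {
        // Shortened stand-in for the two top-level divs in the sample.
        string fragment =
            "<div class=\"menuItem1\" video=\"\" preview=\"\">Menu 1</div>" +
            "<div class=\"menuItem1\" video=\"\" preview=\"\">Menu 1</div>";

        XElement root = ParseFragment(fragment);

        // Both top-level divs are now children of the Navigation root.
        Console.WriteLine(root.Name);
        Console.WriteLine(root.Elements().Count());
    }
}
```

With this approach the top-level divs become direct children of root, so the recursive method can be called on root as before.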

I created a method that I pass root to:

GenerateXmlFromHtml(root);

The code of this method:

private string GenerateXmlFromHtml(XElement elem)
{
    StringBuilder sbNavigationXml = new StringBuilder();
    try
    {
        //HTML will always have a video and preview, according to the generation of the html structure.

        string text = string.Empty;
        string videopath = string.Empty;
        string previewpath = string.Empty;
        XText textNode;

        foreach (XElement element in elem.Elements())
        {
            element.Name = "MenuItem"; //Change element name.

            string htmlClass;
            try { htmlClass = element.Attribute("class").Value; }
            catch { htmlClass = ""; }

            if (!string.IsNullOrEmpty(htmlClass))
            {
                if (htmlClass.Contains("subMenu"))
                {
                    element.AddBeforeSelf(element.Elements());
                    element.Remove();
                    GenerateXmlFromHtml(element);
                }
                else if (htmlClass.Contains("menuItem"))
                {
                    textNode = element.Nodes().OfType<XText>().FirstOrDefault();
                    text = textNode.Value;
                    videopath = element.Attribute("video").Value;
                    previewpath = element.Attribute("preview").Value;

                    if (element.HasElements)
                    {
                        sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\">");
                        sbNavigationXml.AppendLine(GenerateXmlFromHtml(element));
                        sbNavigationXml.AppendLine("</MenuItem>");
                    }
                    else
                    {
                        sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\" />");
                    }
                }
                else if (htmlClass.Contains("treeExpand"))
                {
                    element.AddBeforeSelf(element.Elements());
                    element.Remove();
                    GenerateXmlFromHtml(element);
                }
            }
            else
            {
                element.AddBeforeSelf(element.Elements());
                element.Remove();
                GenerateXmlFromHtml(element);
            }
        }
    }
    catch (Exception)
    {
        throw;
    }
    return sbNavigationXml.ToString();
}

In the end, I want this to produce the following XML output:

<Navigation>
  <MenuItem Text="Menu 1" VideoPath="" PreviewPath="">
    <MenuItem Text="Menu 2">
      <MenuItem Text="Menu 3">
        <MenuItem Text="Menu 4" VideoPath="" PreviewPath="" />
      </MenuItem>
      <MenuItem Text="Menu 3" />
      <MenuItem Text="Menu 3" />
    </MenuItem>
  </MenuItem>
  <MenuItem Text="Menu 1" VideoPath="" PreviewPath="">
    <MenuItem Text="Menu 2">
      <MenuItem Text="Menu 3">
        <MenuItem Text="Menu 4" VideoPath="" PreviewPath="" />
      </MenuItem>
      <MenuItem Text="Menu 3" />
      <MenuItem Text="Menu 3" />
    </MenuItem>
  </MenuItem>
</Navigation>

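In other words, the subMenu divs should be dropped, as should the treeExpand divs, and then I want to generate the XML. At the moment I am still failing miserably. Please ask if anything is unclear. Any help is appreciated!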

====================================================================================================

Edit: the fixed recursive method, for anyone who wants to see it:

private string GenerateXmlFromHtml(XElement elem)
{
    //HTML will always have a video and preview, according to the generation of the html structure.
    StringBuilder sbNavigationXml = new StringBuilder();
    string text = string.Empty;
    string videopath = string.Empty;
    string previewpath = string.Empty;
    XText textNode;

    try
    {
        foreach (XElement element in elem.Elements())
        {
            //element.Name = "MenuItem"; //Change element name.
            //The explicit cast returns null when the attribute is missing, so no try/catch is needed.
            string htmlClass = (string)element.Attribute("class") ?? "";

            if (!string.IsNullOrEmpty(htmlClass))
            {
                if (htmlClass.Contains("subMenu"))
                {
                    if (element.HasElements)
                    {
                        sbNavigationXml.AppendLine(GenerateXmlFromHtml(element));
                    }
                }
                else if (htmlClass.Contains("menuItem"))
                {
                    textNode = element.Nodes().OfType<XText>().FirstOrDefault(); //Get the element's first text node (the menu label).
                    text = textNode != null ? textNode.Value.Trim() : string.Empty; //Trim the surrounding whitespace; guard against a missing text node.
                    videopath = element.Attribute("video").Value; //Get the video attribute value.
                    previewpath = element.Attribute("preview").Value; //Get the preview attribute value.

                    if (element.HasElements)
                    {
                        sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\">");
                        sbNavigationXml.AppendLine(GenerateXmlFromHtml(element));
                        sbNavigationXml.AppendLine("</MenuItem>");
                    }
                    else
                    {
                        sbNavigationXml.AppendLine("<MenuItem Text=\"" + text + "\" VideoPath=\"" + videopath + "\" PreviewPath=\"" + previewpath + "\" />");
                    }
                }
                else if (htmlClass.Contains("treeExpand"))
                {
                    //DO NOTHING
                }
            }
            else
            {
                if (element.HasElements)
                {
                    sbNavigationXml.AppendLine(GenerateXmlFromHtml(element));
                }
            }
        }
    }
    catch (Exception)
    {
        throw;
    }
    return sbNavigationXml.ToString();
}

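1 answer:

Answer 0 (score: 1)

Try separating the input and the output into different documents.

Then walk the input and write it, in the format you want, to your output XmlDocument (a separate variable).

Something like this: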

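using System.Xml;

class Converter
{
    public XmlDocument Convert(XmlDocument inputDocument)
    {
        XmlDocument result = new XmlDocument();
        // A new XmlDocument has no DocumentElement yet, so create the output root first.
        XmlElement root = result.CreateElement("Navigation");
        result.AppendChild(root);
        ConvertNode(inputDocument.DocumentElement, root, result);
        return result;
    }

    public void ConvertNode(XmlNode inputNode, XmlNode outputNode, XmlDocument outputDoc)
    {
        // By default, converted children are attached to the current output node.
        XmlNode target = outputNode;

        // Check the element's class; text nodes have no attribute collection at all.
        XmlAttribute classAttribute = inputNode.Attributes == null ? null : inputNode.Attributes["class"];
        string htmlClass = classAttribute == null ? "" : classAttribute.Value;

        if (!string.IsNullOrWhiteSpace(htmlClass))
        {
            if (htmlClass.Contains("menuItem"))
            {
                target = outputDoc.CreateElement("MenuItem");
                outputNode.AppendChild(target);
            }

            // Check other wanted nodes etc.
        }

        // Recurse even when no output node was created, so that wrapper divs
        // such as subMenu are skipped while their children are still visited.
        foreach (XmlNode node in inputNode.ChildNodes)
        {
            ConvertNode(node, target, outputDoc);
        }
    }
}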
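
The same idea of keeping input and output separate can also be sketched with LINQ to XML, which the question already uses. This is only an illustration under stated assumptions, not the asker's or answerer's code: MenuConverter, ConvertChildren and Convert are hypothetical names, and the sketch assumes that any div with its own text that is neither a subMenu nor a treeExpand is a menu entry. That assumption matters for the class-less "Menu 4" div in the sample, which the class checks in the question's method would otherwise skip. Building XAttribute objects instead of concatenating strings also lets the framework escape special characters in the values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class MenuConverter
{
    // Hypothetical helper: yields the MenuItem elements produced by the
    // children of 'parent'. Wrapper divs contribute their nested items.
    public static IEnumerable<XElement> ConvertChildren(XElement parent)
    {
        foreach (XElement child in parent.Elements())
        {
            string htmlClass = (string)child.Attribute("class") ?? "";

            // treeExpand divs carry no menu data, so they are dropped.
            if (htmlClass.Contains("treeExpand"))
                continue;

            XText textNode = child.Nodes().OfType<XText>().FirstOrDefault();
            string text = textNode == null ? "" : textNode.Value.Trim();

            if (htmlClass.Contains("subMenu") || text.Length == 0)
            {
                // Wrapper: skip the element itself, keep its converted children.
                foreach (XElement nested in ConvertChildren(child))
                    yield return nested;
            }
            else
            {
                // Assumption: any other div with its own text is a menu entry.
                yield return new XElement("MenuItem",
                    new XAttribute("Text", text),
                    new XAttribute("VideoPath", (string)child.Attribute("video") ?? ""),
                    new XAttribute("PreviewPath", (string)child.Attribute("preview") ?? ""),
                    ConvertChildren(child));
            }
        }
    }

    public static XElement Convert(XElement inputRoot)
    {
        return new XElement("Navigation", ConvertChildren(inputRoot));
    }

    public static void Main()
    {
        // Shortened stand-in for the sample HTML, wrapped in a single root.
        XElement input = XElement.Parse(
            "<root><div class=\"menuItem1\" video=\"\" preview=\"\">Menu 1" +
            "<div class=\"subMenu1\"><div class=\"\" video=\"\" preview=\"\">Menu 2</div></div>" +
            "<div class=\"treeExpand\"></div></div></root>");

        Console.WriteLine(Convert(input));
    }
}
```

Unlike the expected output in the question, this sketch always writes all three attributes on every MenuItem; dropping empty ones would need an extra check.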