Question

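I have some XML that I'm stuck with.

I'm not sure how to retrieve data from this format. I considered generating an integer and looping with string concatenation to build the element names, but I'm hoping someone has done something similar and found a smarter solution.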

XML

<TRANSACTION>
  <!-- Not ideal, but fairly straight forward. -->
  <itema></itema>
  <itemb></itemb>
  <itemtypea></itemtypea>
  <itemtypeb></itemtypeb>
  <itemid></itemid>
  <itemlabeltypea></itemlabeltypea>
  <itemlabeltypeb></itemlabeltypeb>
  <savenewitema></savenewitema>
  <savenewitemb></savenewitemb>
  <!-- One to Many Inserts: Insert0, Insert1, etc. -->
  <Insert0></Insert0>
  <InsertinItem0></InsertinItem0>
  <!-- One to Many Deletes: Again, seriously? -->
  <Delete0></Delete0>
  <DeletefromItem0></DeletefromItem0>
  <!-- One to Many Updates: Why? -->
  <Update0></Update0>
  <UpdateinItem0></UpdateinItem0>
</TRANSACTION>

Linq to XML

// Create data object from XML.
var data = (from item in xmlDoc.Descendants("TRANSACTION")
            select new
            {
                // Pseudo code, this will undesirably retrieve Delete0 and DeletefromItem0 as separate records.
                // Perhaps a join is necessary and I need to filter out DeletefromItem0 from the left hand table?
                // Are there any obvious solutions I may have missed?
                DeleteFromItems = from e in item.Elements().Where(x => x.Name.LocalName.StartsWith("Delete"))
                                  select new
                                  {
                                      ItemId = default(int), // Would ideally contain DeletefromItem0.
                                      UniqueId = e.Value
                                  },
                InsertIntoItems = from e in item.Elements().Where(x => x.Name.LocalName.StartsWith("Insert"))
                                  select new
                                  {
                                      ItemId = default(int),
                                      UniqueId = e.Value
                                  },
                ItemId = item.Element("itemid").Value,
                PrimaryItem = new
                {
                    Id = Int32.Parse(item.Element("itema").Value),
                    IsNew = Boolean.Parse(item.Element("savenewitema").Value),
                    LabelType = item.Element("itemlabeltypea").Value,
                    Type = item.Element("itemtypea").Value
                },
                SecondaryItem = new
                {
                    Id = Int32.Parse(item.Element("itemb").Value),
                    IsNew = Boolean.Parse(item.Element("savenewitemb").Value),
                    LabelType = item.Element("itemlabeltypeb").Value,
                    Type = item.Element("itemtypeb").Value
                }
            }).First();

Answer 1

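You should do yourself a big favor and sanitize the document before trying to consume it. You could do this in XSLT, but you'd probably have a hard time with it. Fortunately, doing it with good old LINQ isn't too bad.

Though not strictly necessary, it's a good idea to keep track of any elements that have already been sanitized, to ensure they aren't processed more than once.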

using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

public static class XmlSanitizer
{
    static XNamespace NS => "urn:example:sanitizer";
    internal static XName IndexName => NS + "Index";
    internal static XName SanitizedName => NS + "Sanitized";

    public static void Sanitize(XDocument doc, params string[] patterns)
    {
        if (!HasSanitzerNamespace(doc))
            doc.Root.Add(new XAttribute(XNamespace.Xmlns + "s", NS.NamespaceName));

        foreach (var pattern in patterns)
        {
            var nodes =
                (from e in doc.Root.Elements()
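                // anchor the pattern so that e.g. "(item)([ab])" cannot match inside "savenewitema"
                let m = Regex.Match(e.Name.LocalName, "^(?:" + pattern + ")$")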
                where m.Success
                let sanitized = (bool?)e.Attribute(SanitizedName)
                where !(sanitized ?? false)
                select new
                {
                    Element = e,
                    Namespace = e.Name.Namespace,
                    LocalName = m.Groups[1].Value,
                    Index = m.Groups[2].Value,
                }).ToList();
            foreach (var x in nodes)
            {
                // it might be preferable to place the new elements within a grouping element
                x.Element.ReplaceWith(
                    new XElement(x.Namespace + x.LocalName,
                        new XAttribute(IndexName, x.Index),
                        new XAttribute(SanitizedName, true),
                        x.Element.Attributes(),
                        x.Element.Nodes()
                    )
                );
            }
        }
    }

    static bool HasSanitzerNamespace(XDocument doc) =>
        (from a in doc.Root.Attributes()
        where a.Name.Namespace == XNamespace.Xmlns
        where (string)a == NS.NamespaceName
        select a).Any();
}

public static class XmlSanitizerExtensions
{
    static XName IndexName => XmlSanitizer.IndexName;
    public static XElement ElementIndex(this XElement e, XName name, string index) => e.Elements(name).Where(n => (string)n.Attribute(IndexName) == index).Single();
}

Then sanitize the document, passing in regular expressions for the names to group:

XmlSanitizer.Sanitize(doc, new string[]
{
    @"(item)([ab])",
    @"(itemtype)([ab])",
    @"(itemlabeltype)([ab])",
    @"(savenewitem)([ab])",
    @"(Insert)(\d+)",
    @"(InsertinItem)(\d+)",
    @"(Delete)(\d+)",
    @"(DeletefromItem)(\d+)",
    @"(Update)(\d+)",
    @"(UpdateinItem)(\d+)",
});

This would give you something like this:

<TRANSACTION xmlns:s="urn:example:sanitizer">
  <!-- Not ideal, but fairly straight forward. -->
  <item s:Index="a" s:Sanitized="true" />
  <item s:Index="b" s:Sanitized="true" />
  <itemtype s:Index="a" s:Sanitized="true" />
  <itemtype s:Index="b" s:Sanitized="true" />
  <itemid></itemid>
  <itemlabeltype s:Index="a" s:Sanitized="true" />
  <itemlabeltype s:Index="b" s:Sanitized="true" />
  <savenewitem s:Index="a" s:Sanitized="true" />
  <savenewitem s:Index="b" s:Sanitized="true" />
  <!-- One to Many Inserts: Insert0, Insert1, etc. -->
  <Insert s:Index="0" s:Sanitized="true" />
  <InsertinItem s:Index="0" s:Sanitized="true" />
  <!-- One to Many Deletes: Again, seriously? -->
  <Delete s:Index="0" s:Sanitized="true" />
  <DeletefromItem s:Index="0" s:Sanitized="true" />
  <!-- One to Many Updates: Why? -->
  <Update s:Index="0" s:Sanitized="true" />
  <UpdateinItem s:Index="0" s:Sanitized="true" />
</TRANSACTION>

At least with this, the document is easier to process.

var data =
    (from t in doc.Elements("TRANSACTION")
    select new
    {
        // assuming the indices are sequential
        DeleteFromItems = t.Elements("Delete").Zip(t.Elements("DeletefromItem"), (d, dfi) => new
            {
                ItemId = (int)dfi, // assuming there's a value
                UniqueId = (string)d,
            }).ToList(),
        InsertIntoItems = t.Elements("Insert").Zip(t.Elements("InsertinItem"), (i, iii) => new
            {
                ItemId = (int)iii, // assuming there's a value
                UniqueId = (string)i,
            }).ToList(),
        UpdateIntoItems = t.Elements("Update").Zip(t.Elements("UpdateinItem"), (u, uii) => new
            {
                ItemId = (int)uii, // assuming there's a value
                UniqueId = (string)u,
            }).ToList(),
        ItemId = (string)t.Element("itemid"),
        PrimaryItem = new
        {
            Id = (int)t.ElementIndex("item", "a"),
            IsNew = (bool)t.ElementIndex("savenewitem", "a"),
            LabelType = (string)t.ElementIndex("itemlabeltype", "a"),
            Type = (string)t.ElementIndex("itemtype", "a"),
        },
        SecondaryItem = new
        {
            Id = (int)t.ElementIndex("item", "b"),
            IsNew = (bool)t.ElementIndex("savenewitem", "b"),
            LabelType = (string)t.ElementIndex("itemlabeltype", "b"),
            Type = (string)t.ElementIndex("itemtype", "b"),
        },
    }).Single();

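Furthermore, I would group the corresponding elements during the sanitization phase to make processing even easier. Then you wouldn't have to make so many assumptions about the data. I'll leave that as a learning exercise for you.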
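
One way to drop the "indices are sequential" assumption without regrouping the document is to pair the elements by their s:Index attribute at query time. This is only a sketch of that alternative, not the grouping the answer has in mind: the element values and their order are made up, and it is written as a C# 9+ top-level program for brevity.

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical sanitized input in the shape shown earlier; the values are invented
// and the Delete elements are deliberately out of order.
XNamespace s = "urn:example:sanitizer";
var transaction = XElement.Parse(
    @"<TRANSACTION xmlns:s='urn:example:sanitizer'>
        <Delete s:Index='1' s:Sanitized='true'>uid-1</Delete>
        <Delete s:Index='0' s:Sanitized='true'>uid-0</Delete>
        <DeletefromItem s:Index='0' s:Sanitized='true'>10</DeletefromItem>
        <DeletefromItem s:Index='1' s:Sanitized='true'>11</DeletefromItem>
      </TRANSACTION>");

// Join each <Delete> to the <DeletefromItem> that carries the same index,
// so document order no longer matters.
var deleteFromItems =
    (from d in transaction.Elements("Delete")
     join dfi in transaction.Elements("DeletefromItem")
         on (string)d.Attribute(s + "Index") equals (string)dfi.Attribute(s + "Index")
     orderby (int)d.Attribute(s + "Index")
     select new
     {
         ItemId = (int)dfi,       // still assumes the element holds an integer
         UniqueId = (string)d,
     }).ToList();

foreach (var x in deleteFromItems)
    Console.WriteLine($"{x.UniqueId} -> {x.ItemId}");
```

Unlike Zip, the join silently drops a Delete that has no matching DeletefromItem, so a real implementation would probably want to detect and report unmatched elements.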

Answer 2

If I understand your question correctly, this extension method might make this kind of data easier to work with:

public static IEnumerable<XElement> EnumerateGroup(this XElement source, string groupName)
{
    return source.Elements()
        // the group name alone, or the group name plus one trailing letter or digit
        .Where(element => Regex.IsMatch(element.Name.LocalName, "^" + groupName + "[a-z0-9]?$"));
}

Used like this:

XElement xml = XElement.Parse(xmlString);
var results = xml.EnumerateGroup("savenewitem"); // savenewitema, savenewitemb

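The method enumerates all child elements, but the regular expression (a topic in itself if you're not familiar with it, though there are plenty of good resources out there) returns only those whose name matches the group name exactly, or the group name followed by one extra character. I based that on your examples of a, b, 0 and so on; if you have larger numbers, you may need to extend it!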
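
As a rough illustration of the "extend it" remark, the sketch below accepts either a single letter or a run of digits as the suffix and returns that suffix with each element, so that Delete12 can be paired with DeletefromItem12. The helper name EnumerateGroupWithSuffix and the sample values are hypothetical, and it is written as a C# 9+ top-level program rather than an extension method to keep it short.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;
using System.Xml.Linq;

// Hypothetical helper: yields each child whose name is groupName followed by
// one lowercase letter or one or more digits, together with that suffix.
static IEnumerable<(string Suffix, XElement Element)> EnumerateGroupWithSuffix(
    XElement source, string groupName)
{
    var pattern = new Regex("^" + Regex.Escape(groupName) + "([a-z]|[0-9]+)$");
    foreach (var element in source.Elements())
    {
        var match = pattern.Match(element.Name.LocalName);
        if (match.Success)
            yield return (match.Groups[1].Value, element);
    }
}

// Made-up sample data using the naming scheme from the question.
var xml = XElement.Parse(
    "<TRANSACTION>" +
    "<Delete0>u0</Delete0><DeletefromItem0>10</DeletefromItem0>" +
    "<Delete12>u12</Delete12><DeletefromItem12>22</DeletefromItem12>" +
    "</TRANSACTION>");

// Pair the two groups on their shared suffix.
var pairs =
    (from d in EnumerateGroupWithSuffix(xml, "Delete")
     join f in EnumerateGroupWithSuffix(xml, "DeletefromItem") on d.Suffix equals f.Suffix
     select new { UniqueId = (string)d.Element, ItemId = (int)f.Element }).ToList();

foreach (var p in pairs)
    Console.WriteLine($"{p.UniqueId} -> {p.ItemId}");
```

Because the suffix must be a lone letter or all digits, "Delete" does not accidentally pick up DeletefromItem0, which is the problem the question's StartsWith filter runs into.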

Linq to XML: retrieving lists from similarly named elements
