How to find duplicate elements in an XElement

Time: 2016-11-09 04:46:21

Tags: c# xml linq-to-xml xelement

I am trying to find the duplicate elements in an XElement and write a generic function that removes them. Something like:

public List<XElement> RemoveDuplicatesFromXml(List<XElement> xele)
{
    // Takes the list of XElements and returns it with the duplicate entries removed.
    return xele;
}

The XML looks like this:

<Execute ID="7300" Attrib1="xyz"    Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7301" Attrib1="xyz"    Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />
<Execute ID="7302" Attrib1="xyz1"    Attrib2="abc" Attrib3="mno" Attrib4="pqr" Attrib5="BCD" />

I want to detect duplicates based on every attribute except ID, and then delete the one with the smaller ID.

Thanks,

2 answers:

Answer 0: (score: 1)

You can implement a custom IEqualityComparer for this task:

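using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

class XComparer : IEqualityComparer<XElement>
{
    // Local names of the attributes that are ignored when comparing.
    private readonly IList<string> _exceptions;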
    public XComparer(params string[] exceptions)
    {
        _exceptions = new List<string>(exceptions);
    }

    public bool Equals(XElement a, XElement b)
    {
        var attA = a.Attributes().ToList();
        var attB = b.Attributes().ToList();

        var setA = AttributeNames(attA);
        var setB = AttributeNames(attB);

        if (!setA.SetEquals(setB))
        {
            return false;
        }

        foreach (var e in setA)
        {
            var xa = attA.First(x => x.Name.LocalName == e);
            var xb = attB.First(x => x.Name.LocalName == e);

            // XAttribute.Value is never null, so a direct comparison is enough.
            if (xa.Value != xb.Value)
            {
                return false;
            }
        }

        return true;
    }

    private HashSet<string> AttributeNames(IList<XAttribute> e)
    {
        return new HashSet<string>(e.Select(x =>x.Name.LocalName).Except(_exceptions));
    }

    public int GetHashCode(XElement e)
    {
        var h = 0;

        var atts = e.Attributes().ToList();
        var names = AttributeNames(atts);

        foreach (var a in names)
        {
            var xa = atts.First(x => x.Name.LocalName == a);

            if (xa.Value != null)
            {
                h = h ^ xa.Value.GetHashCode();
            }           
        }

        return h;
    }
}

Usage:

var comp = new XComparer("ID");
var distXEle = xele.Distinct(comp);

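Note that the IEqualityComparer implementation in this answer compares only the LocalName of each attribute and does not take namespaces into account. If an element has several attributes with the same local name, this implementation uses the first one.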
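To make the namespace caveat concrete, here is a minimal sketch of how a LocalName comparison differs from a full XName comparison. The class name LocalNameDemo and the namespace "urn:example" are made up for illustration and are not part of the answer's code.

```csharp
using System.Xml.Linq;

public static class LocalNameDemo
{
    // Compares an unqualified attribute with a namespace-qualified one
    // that happens to share the same local name.
    public static (bool SameLocalName, bool SameFullName) Compare()
    {
        XNamespace ns = "urn:example"; // hypothetical namespace
        var plain = new XAttribute("Attrib1", "xyz");
        var qualified = new XAttribute(ns + "Attrib1", "xyz");

        // LocalName drops the namespace; XName equality includes it.
        return (plain.Name.LocalName == qualified.Name.LocalName,
                plain.Name == qualified.Name);
    }
}
```

If your documents use namespaced attributes, comparing on x.Name rather than x.Name.LocalName would probably be the safer choice.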

You can see a demo here: https://dotnetfiddle.net/w2DteS

Edit

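If you want to "delete the one with the smaller ID", that means you want to keep the element with the largest ID. In that case you can chain a .Select call after .Distinct: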

var comp = new XComparer("ID");
var distXEle = xele
    .Distinct(comp)
    .Select(z => xele
        .Where(a => comp.Equals(z, a))
        .OrderByDescending(a => int.Parse(a.Attribute("ID").Value))
        .First()
    );

This guarantees that, for each set of duplicates, you get the element with the largest ID.

Answer 1: (score: 1)

Use LINQ GroupBy:

var doc = XDocument.Parse(yourXmlString);
var groups = doc.Root
                .Elements()
                .GroupBy(element => new
                {
                    Attrib1 = element.Attribute("Attrib1").Value,
                    Attrib2 = element.Attribute("Attrib2").Value,
                    Attrib3 = element.Attribute("Attrib3").Value,
                    Attrib4 = element.Attribute("Attrib4").Value,
                    Attrib5 = element.Attribute("Attrib5").Value
                });

var duplicates = groups.SelectMany(group =>
{
    if(group.Count() == 1) // remove this if you want only duplicates
    {
        return group;
    }

    int minId = group.Min(element => int.Parse(element.Attribute("ID").Value));
    return group.Where(element => int.Parse(element.Attribute("ID").Value) > minId);
});

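The solution above removes, from each group of elements with identical attributes, the element with the smallest ID. If a group contains more than two elements, only the one with the minimum ID is dropped. If you want to return only the elements that have duplicates, remove the if branch from the last lambda.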
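For reference, the GroupBy idea can be generalised so that the attribute names are not hard-coded and only the element with the largest ID survives in each group. The following is a rough sketch rather than either answerer's code. It assumes every element carries an integer ID attribute and that attribute values never contain the control characters used as separators. The class name XmlDedup and the helper KeyOf are invented for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class XmlDedup
{
    // Hypothetical helper: builds a grouping key from every attribute except
    // the ID attribute, sorted by name so attribute order does not matter.
    private static string KeyOf(XElement e, string idName) =>
        string.Join("\u0001", e.Attributes()
            .Where(a => a.Name.LocalName != idName)
            .OrderBy(a => a.Name.ToString(), StringComparer.Ordinal)
            .Select(a => a.Name + "\u0000" + a.Value));

    // Keeps one element per group of duplicates: the one with the largest ID.
    // Assumption: every element has an integer-valued ID attribute.
    public static List<XElement> RemoveDuplicatesFromXml(
        List<XElement> xele, string idName = "ID") =>
        xele.GroupBy(e => KeyOf(e, idName))
            .Select(g => g.OrderByDescending(e => (int)e.Attribute(idName)).First())
            .ToList();
}
```

With the three Execute elements from the question, calling XmlDedup.RemoveDuplicatesFromXml(list) should leave the elements with ID 7301 and 7302. Unlike the Distinct/Select chain, this makes a single grouping pass instead of rescanning the list for every distinct element.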