Distinct still leaves duplicate entries, even with a custom EqualityComparer

Time: 2016-02-04 12:41:16

Tags: c# floating-point mono precision iequalitycomparer

I'm at a loss with this script - I don't get it - why does it leave duplicate entries?

private static float GenerateMedian(IEnumerable<Collider> items, KDAxis axis)
{
    float[] allValues = items.SelectMany(AxisSelector(axis)).ToArray();
    Debug.LogFormat("{0} all values for {1} items: {2}.", allValues.Length, items.Count(), string.Join(", ", allValues.Select(v => v.ToString("F10")).ToArray()));
    #if BASIC_DISTINCT
    float[] values = allValues.Distinct().OrderBy(f => f).ToArray();
    #else
    float[] values = allValues.Distinct(new KDFloatComparer(0.0001f)).OrderBy(f => f).ToArray();
    #endif
    Debug.LogFormat("{0} distinct values for {1} items: {2}.", values.Length, items.Count(), string.Join(", ", values.Select(v => v.ToString("F10")).ToArray()));

    int medianIndex = Mathf.CeilToInt(values.Length / 2f) - 1;
    float medianValue = values[medianIndex];

    Debug.LogFormat("Median index: {0} (left: {1}; right: {2}) value: {3}", medianIndex, medianIndex + 1, values.Length - 1 - medianIndex, medianValue);

    return medianValue;
}

private static Func<Collider, IEnumerable<float>> AxisSelector(KDAxis axis)
{
    switch (axis)
    {
        case KDAxis.X:
            return XAxisSelector;

        case KDAxis.Y:
            return YAxisSelector;

        case KDAxis.Z:
            return ZAxisSelector;
    }

    return XAxisSelector;
}

private static IEnumerable<float> XAxisSelector(Collider collider)
{
    yield return collider.bounds.max.x;
    yield return collider.bounds.min.x;
}

private static IEnumerable<float> YAxisSelector(Collider collider)
{
    yield return collider.bounds.max.y;
    yield return collider.bounds.min.y;
}

private static IEnumerable<float> ZAxisSelector(Collider collider)
{
    yield return collider.bounds.max.z;
    yield return collider.bounds.min.z;
}

It gives this output:

  

28 all values for 14 items: 3.0000000000, 2.0000000000, 11.0000000000, -11.0000000000, -5.0000010000, -10.0000000000, 3.0000000000, 2.0000000000, 3.0000000000, 2.0000000000, 11.0000000000, -11.0000000000, -10.0000000000, -11.0000400000, 3.0000000000, 2.0000000000, 7.0000000000, 6.0000000000, -7.0000000000, -10.0000000000, 10.0000000000, -10.0000000000, 11.0000000000, 9.9999550000, -8.0000000000, -9.9999980000, 3.0000000000, 2.0000000000.
  20 distinct values for 14 items: -11.0000400000, -11.0000000000, -10.0000000000, -10.0000000000, -9.9999980000, -8.0000000000, -7.0000000000, -5.0000010000, 2.0000000000, 2.0000000000, 2.0000000000, 3.0000000000, 3.0000000000, 3.0000000000, 6.0000000000, 7.0000000000, 9.9999550000, 10.0000000000, 11.0000000000, 11.0000000000.

It clearly contains duplicates, for example 3 x 2.0 and 3 x 3.0.

The same happens when I implement a custom float comparer and pass it to Distinct() as new KDFloatComparer(0.0001f):

public class KDFloatComparer : EqualityComparer<float>
{
    public readonly float InternalEpsilon = 0.001f;

    public KDFloatComparer(float epsilon) : base()
    {
        InternalEpsilon = epsilon;
    }

    // http://stackoverflow.com/a/31587700/393406
    public override bool Equals(float a, float b)
    {
        float absoluteA = Math.Abs(a);
        float absoluteB = Math.Abs(b);
        float absoluteDifference = Math.Abs(a - b);

        if (a == b) 
        {
            return true;
        } 
        else if (a == 0 || b == 0 || absoluteDifference < float.Epsilon) 
        {
            // a or b is zero or both are extremely close to it.
            // Relative error is less meaningful here.
            return absoluteDifference < InternalEpsilon;
        } 
        else 
        { 
            // Use relative error.
            return absoluteDifference / (absoluteA + absoluteB) < InternalEpsilon;
        }

        return true;
    }

    public override int GetHashCode(float value)
    {
        return value.GetHashCode();
    }
}

The result is exactly the same.

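I did try to replicate the scenario on csharppad.com - it left no duplicates. However, I did not use the SelectMany approach there; I built the original array from the reported ToString("F10") values, which makes me think the problem lies in floating-point precision. Still, no matter how I implement the EqualityComparer (I had a few custom variants before trying the one from SO), I cannot seem to pin it down.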

How can I solve this?

2 Answers:

Answer 0: (score: 2)

Your Equals is broken because it is not transitive. It must hold that a == b && b == c ==> a == c. With an epsilon comparison that is not the case.

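Really, this does not make sense. If you have the numbers new [] { 0, epsilon, epsilon * 2 }, which of the three do you want to keep?! You need to define this better and use a different algorithm.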
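To make the transitivity point concrete, here is a minimal sketch using a plain absolute-tolerance check. The helper name NearlyEqual and the sample values are assumptions chosen for illustration; they are not the asker's comparer, which uses a relative error, but the same chaining problem applies to any tolerance-based equality.

```csharp
using System;

public static class TransitivityDemo
{
    // Hypothetical helper: absolute-tolerance comparison.
    public static bool NearlyEqual(float a, float b, float epsilon)
    {
        return Math.Abs(a - b) < epsilon;
    }

    public static void Main()
    {
        const float epsilon = 0.001f;
        float a = 0f;
        float b = 0.0006f;
        float c = 0.0012f;

        Console.WriteLine(NearlyEqual(a, b, epsilon)); // True: a and b are within epsilon
        Console.WriteLine(NearlyEqual(b, c, epsilon)); // True: b and c are within epsilon
        Console.WriteLine(NearlyEqual(a, c, epsilon)); // False: a and c are not
    }
}
```

Since a "equals" b and b "equals" c but a does not "equal" c, no hash-based set can group these three values consistently.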

When you violate the contract of Equals and GetHashCode, you get undefined behavior.

Another problem is that some values with unequal hash codes will compare equal here.

  

I did try to replicate the scenario on csharppad.com - it left no duplicates

Undefined behavior sometimes means getting the right result.
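One possible "different algorithm", sketched under assumptions: sort the values and keep a value only when it lies at least a tolerance above the last kept one. The name DistinctWithTolerance, the absolute tolerance, and the rule of keeping the smallest value in each cluster are all illustrative choices, not something prescribed in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToleranceDistinct
{
    // Hypothetical helper: sorts the input, then keeps a value only if it is
    // at least `tolerance` above the most recently kept value.
    public static float[] DistinctWithTolerance(IEnumerable<float> values, float tolerance)
    {
        var kept = new List<float>();
        foreach (float v in values.OrderBy(x => x))
        {
            if (kept.Count == 0 || v - kept[kept.Count - 1] >= tolerance)
                kept.Add(v);
        }
        return kept.ToArray();
    }

    public static void Main()
    {
        // A subset of the values reported in the question.
        float[] input = { 3f, 2f, 11f, -11f, -10f, 3f, 2f, -11.00004f, 10f, 9.999955f, -9.999998f };
        foreach (float v in DistinctWithTolerance(input, 0.0001f))
            Console.WriteLine(v);
    }
}
```

Because each value is compared only against the last kept one, the result is deterministic and does not depend on hash codes. Whether keeping the smallest member of each cluster is right for a k-d tree median is a separate decision.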

Answer 1: (score: 1)

I created a small console project to test it:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace TestEqual
{
    class Program
    {

        static float[] values = new float[] { 3.0000000000f, 2.0000000000f, 11.0000000000f, -11.0000000000f, -5.0000010000f, -10.0000000000f, 3.0000000000f, 2.0000000000f, 3.0000000000f, 2.0000000000f, 11.0000000000f, -11.0000000000f, -10.0000000000f, -11.0000400000f, 3.0000000000f, 2.0000000000f, 7.0000000000f, 6.0000000000f, -7.0000000000f, -10.0000000000f, 10.0000000000f, -10.0000000000f, 11.0000000000f, 9.9999550000f, -8.0000000000f, -9.9999980000f, 3.0000000000f, 2.0000000000f };

        static void Main(string[] args)
        {
            var distinct = values.Distinct(new KDFloatComparer(0.001f)).OrderBy(d => d).ToArray();

            Console.WriteLine("Valores distintos: ");

            foreach (var f in distinct)
                Console.WriteLine(f);

            Console.ReadKey();
        }

        public class KDFloatComparer : EqualityComparer<float>
        {
            public readonly float InternalEpsilon = 0.001f;

            public KDFloatComparer(float epsilon)
                : base()
            {
                InternalEpsilon = epsilon;
            }

            // http://stackoverflow.com/a/31587700/393406
            public override bool Equals(float a, float b)
            {
                float absoluteA = Math.Abs(a);
                float absoluteB = Math.Abs(b);
                float absoluteDifference = Math.Abs(a - b);

                if (a == b)
                {
                    return true;
                }
                else if (a == 0 || b == 0 || absoluteDifference < InternalEpsilon)
                {
                    // a or b is zero or both are extremely close to it.
                    // Relative error is less meaningful here.
                    return absoluteDifference < InternalEpsilon;
                }
                else
                {
                    // Use relative error.
                    return absoluteDifference / (absoluteA + absoluteB) < InternalEpsilon;
                }

                return true;
            }

            public override int GetHashCode(float value)
            {
                return value.GetHashCode();
            }
        }

        public class FComparer : IEqualityComparer<float>
        {

            public bool Equals(float x, float y)
            {

                var dif = Math.Abs(x - y);

                if ((x == 0 || y == 0) && dif < float.Epsilon)
                    return true;

                if (Math.Sign(x) != Math.Sign(y))
                    return false;

                return dif < float.Epsilon;
            }

            public int GetHashCode(float obj)
            {
                return obj.GetHashCode();
            }
        }

    }
}

Result under Linux / Mono V4.0.1:

  

Valores distintos:   -11,00004   -11   -10   -9,999998   -8   -7   -5,000001 2 3 6 7 9,999955 10 11

So the only thing I can think of is that your Mono version has a floating-point math bug; some older versions did have problems of that kind.

Try updating your Mono to the latest version or, even better, compile it from the latest sources on your machine.

Also, I have included a smaller comparer that gives the same result.

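EDIT: I also corrected your comparer: in one place you used InternalEpsilon and in another float.Epsilon. float.Epsilon is 1.401298E-45, which cannot be represented in your strings because they only have ten decimal places ("F10"); if two values differ by less than 0.0000000001, you will not see it because the difference is cut off.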
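Since formatted strings can hide small differences, one way to check whether two "identical" values really are the same float is to look at their raw bit patterns. The sketch below is an illustration only; the helper name ToHex is hypothetical, and what the "F10" column prints depends on the runtime's float formatting, so no output is claimed for it here.

```csharp
using System;

public static class FloatBits
{
    // Hypothetical helper: the raw IEEE 754 bit pattern of a float as 8 hex digits.
    public static string ToHex(float value)
    {
        int bits = BitConverter.ToInt32(BitConverter.GetBytes(value), 0);
        return bits.ToString("X8");
    }

    public static void Main()
    {
        float two = 2f;

        // Build the next representable float above 2 by incrementing the bit pattern.
        int nextBits = BitConverter.ToInt32(BitConverter.GetBytes(two), 0) + 1;
        float next = BitConverter.ToSingle(BitConverter.GetBytes(nextBits), 0);

        Console.WriteLine("{0} {1}", ToHex(two), two.ToString("F10"));
        Console.WriteLine("{0} {1}", ToHex(next), next.ToString("F10"));
        Console.WriteLine(two == next); // False: the two values differ by one bit
    }
}
```

Logging ToHex(v) next to v.ToString("F10") in the original Debug.LogFormat calls would show whether the reported duplicates are bit-for-bit equal.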

EDIT: It seems that Distinct only calls the comparer's Equals when the hash codes are the same, so if every float has a different hash code, Equals is never executed.

This example works 100% of the time, using randomly generated numbers.

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

namespace TestEqual
{
    class Program
    {
        static void Main(string[] args)
        {

            Random rnd = new Random();

            List<float> numbers = new List<float>();

            for(int buc = 0; buc < 1000; buc++)
                numbers.Add((float)rnd.NextDouble());

            var distinct = numbers.OrderBy(d => d).Distinct(new FComparer()).OrderBy(d => d).ToArray();

            Console.WriteLine(float.Epsilon);

            Console.WriteLine("Valores distintos: ");

            foreach (var f in distinct)
                Console.WriteLine(f);

            foreach (var f in distinct)
            {

                for (int buc = 0; buc < distinct.Length; buc++)
                    if (Math.Abs(f - distinct[buc]) < 0.001f && f != distinct[buc])
                        Console.WriteLine("Duplicate");

            }

            Console.ReadKey();
        }

        public class FComparer : IEqualityComparer<float>
        {

            public bool Equals(float x, float y)
            {

                var dif = Math.Abs(x - y);

                if ((x == 0 || y == 0) && dif < 0.001f)
                    return true;

                if (Math.Sign(x) != Math.Sign(y))
                    return false;

                return dif < 0.001f;
            }

            public int GetHashCode(float obj)
            {
                //This is the key, if GetHashCode is different then Equals is not called
                return 0;
            }
        }

    }
}
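The observation that Distinct consults hash codes before calling Equals can be checked with a comparer that counts its Equals calls. This is a sketch under assumptions: CountingComparer and HashDemo are hypothetical names, the 0.001f tolerance is arbitrary, and the exact call counts depend on the LINQ implementation in use.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical comparer that records how often Equals is invoked.
public class CountingComparer : IEqualityComparer<float>
{
    private readonly bool constantHash;
    public int EqualsCalls;

    public CountingComparer(bool constantHash)
    {
        this.constantHash = constantHash;
    }

    public bool Equals(float x, float y)
    {
        EqualsCalls++;
        return Math.Abs(x - y) < 0.001f;
    }

    public int GetHashCode(float obj)
    {
        // A constant hash puts every value in the same bucket, so Equals decides.
        return constantHash ? 0 : obj.GetHashCode();
    }
}

public static class HashDemo
{
    // Runs Distinct over two pairs of nearly equal floats and reports how many survive.
    public static int Run(bool constantHash, out int equalsCalls)
    {
        float[] values = { 2f, 2.0000002f, 3f, 3.0000002f };
        var comparer = new CountingComparer(constantHash);
        int kept = values.Distinct(comparer).Count();
        equalsCalls = comparer.EqualsCalls;
        return kept;
    }

    public static void Main()
    {
        int calls;
        int kept = Run(false, out calls);
        Console.WriteLine("per-value hash: {0} kept, {1} Equals calls", kept, calls);
        kept = Run(true, out calls);
        Console.WriteLine("constant hash:  {0} kept, {1} Equals calls", kept, calls);
    }
}
```

With per-value hash codes all four values survive, because the near-equal pairs hash differently; with the constant hash only two survive. Note that a constant hash code makes every lookup scan all kept values, so Distinct degrades to quadratic time, and it does not repair the missing transitivity.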