How to express the Hellinger distance with LINQ

Time: 2013-05-06 22:17:37

Tags: c# linq

I want to express the following formula with LINQ:

Hellinger distance formula:

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{\sum_{i=1}^{k} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2}$$

I have the following function:

private double Calc(IEnumerable<Frequency> recording, IEnumerable<Frequency> reading)
{
}

where Frequency is:

public class Frequency
{
  public double Probability { get; set; } // the p's and q's in the formula
  public int Strength { get; set; } // the i's in the formula
}

An example call of the function is:

public void Caller(){
   IEnumerable<Frequency> recording = new List<Frequency>
                                            {
                                               new Frequency {Strength = 32, Probability = 0.2}, //p32 = 0.2
                                               new Frequency {Strength = 33, Probability = 0.2}, //p33 = 0.2
                                               new Frequency {Strength = 34, Probability = 0.2}, //p34 = 0.2
                                               new Frequency {Strength = 35, Probability = 0.2}, //...
                                               new Frequency {Strength = 41, Probability = 0.2} //...
                                            };

   IEnumerable<Frequency> reading = new List<Frequency>
                                            {
                                               new Frequency {Strength = 34, Probability = 0.2}, //q34 = 0.2
                                               new Frequency {Strength = 35, Probability = 0.2},  //q35 = 0.2
                                               new Frequency {Strength = 36, Probability = 0.2},
                                               new Frequency {Strength = 37, Probability = 0.2},
                                               new Frequency {Strength = 80, Probability = 0.2},
                                            };
   Calc(recording, reading);
}

For example, new Frequency {Strength = 32, Probability = 0.2} means p32 = 0.2 in the Hellinger formula.

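k in the formula is 100, and an element that is absent from a collection has the value 0. For example, recording only has values for i = 32, 33, 34, 35 and 41, so p_i is zero for every other i from 1 to 100.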
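Working the formula by hand for the sample data in Caller (k = 100) gives a reference value to check any implementation against. Strengths 34 and 35 appear in both lists with the same probability and cancel, while each of the six strengths that appears in only one list contributes (√0.2 − 0)² = 0.2:

$$\sum_{i=1}^{100} \left(\sqrt{p_i} - \sqrt{q_i}\right)^2 = \underbrace{3 \cdot 0.2}_{i = 32,\,33,\,41} + \underbrace{0}_{i = 34,\,35} + \underbrace{3 \cdot 0.2}_{i = 36,\,37,\,80} = 1.2$$

$$H(P, Q) = \frac{1}{\sqrt{2}} \sqrt{1.2} = \sqrt{0.6} \approx 0.7746$$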

My first implementation is:

  private double Calc(IEnumerable<Frequency> recording, IEnumerable<Frequency> reading)
  {
     double result = 0;

     foreach (var i in Enumerable.Range(1,100))
     {
        var recStr = recording.FirstOrDefault(a => a.Strength == i);
        var readStr = reading.FirstOrDefault(a => a.Strength == i);
        var recVal = recStr == null ? 0 : recStr.Probability;
        var readVal = readStr == null ? 0 : readStr.Probability;

        result += Math.Pow(Math.Sqrt(recVal) - Math.Sqrt(readVal), 2);
     }

     result = Math.Sqrt(result/2);
     return result;
  }

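This is neither efficient (FirstOrDefault scans both collections for each of the 100 values of i) nor elegant. I feel the solution could be improved, but I can't think of a better way.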

2 answers:

Answer 0 (score: 1)

ReSharper turns your function into:

double result = (from i in Enumerable.Range(1, 100) 
                 let recStr = recording.FirstOrDefault(a => a.Strength == i) 
                 let readStr = reading.FirstOrDefault(a => a.Strength == i) 
                 let recVal = recStr == null ? 0 : recStr.Probability 
                 let readVal = readStr == null ? 0 : readStr.Probability 
                 select Math.Pow(Math.Sqrt(recVal) - Math.Sqrt(readVal), 2)).Sum();


return Math.Sqrt(result / 2);

As Patashu said, you can use a Dictionary<int, Frequency> to get O(1) lookup time:

private double Calc(Dictionary<int, Frequency> recording, Dictionary<int, Frequency> reading)
{
    double result = (from i in Enumerable.Range(1, 100) 
                     let recVal = recording.ContainsKey(i) ? recording[i].Probability : 0 
                     let readVal = reading.ContainsKey(i) ? reading[i].Probability : 0 
                     select Math.Pow(Math.Sqrt(recVal) - Math.Sqrt(readVal), 2)).Sum();

    return Math.Sqrt(result / 2);
}
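The snippet above assumes the caller already has the dictionaries. Below is a minimal runnable sketch of how they could be built with ToDictionary and then queried, assuming each Strength appears at most once per list. To keep it a single self-contained top-level program, a value tuple (int Strength, double Probability) stands in for the Frequency class and the dictionaries map Strength directly to Probability; TryGetValue does one lookup per key instead of ContainsKey plus the indexer. The names Hellinger and Get are illustrative, not from either answer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question; (int Strength, double Probability) stands in for Frequency.
var recording = new List<(int Strength, double Probability)>
{
    (32, 0.2), (33, 0.2), (34, 0.2), (35, 0.2), (41, 0.2)
};
var reading = new List<(int Strength, double Probability)>
{
    (34, 0.2), (35, 0.2), (36, 0.2), (37, 0.2), (80, 0.2)
};

// Build each lookup once (O(n)); ToDictionary throws ArgumentException on a duplicate Strength.
var recDict = recording.ToDictionary(f => f.Strength, f => f.Probability);
var readDict = reading.ToDictionary(f => f.Strength, f => f.Probability);

double distance = Hellinger(recDict, readDict, 100);
Console.WriteLine(distance); // about 0.7746, i.e. sqrt(0.6)

static double Hellinger(IReadOnlyDictionary<int, double> p, IReadOnlyDictionary<int, double> q, int k)
{
    // A Strength missing from a dictionary has probability 0, as in the question.
    static double Get(IReadOnlyDictionary<int, double> d, int key) =>
        d.TryGetValue(key, out var v) ? v : 0.0;

    double sum = (from i in Enumerable.Range(1, k)
                  select Math.Pow(Math.Sqrt(Get(p, i)) - Math.Sqrt(Get(q, i)), 2)).Sum();
    return Math.Sqrt(sum / 2);
}
```

Building the dictionaries costs O(n) once, after which the k lookups are O(1) on average, so the whole computation is roughly O(n + k).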

Answer 1 (score: 1)

The problem is complicated by the lists being sparse (we don't have a probability for every reading). So first, we fix that:

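using System.Collections.Generic;
using System.Linq;

// Extension methods must live in a non-generic static class (the class name here is arbitrary).
public static class FrequencyExtensions {
    public static IEnumerable<Frequency> FillHoles(this IEnumerable<Frequency> src, int start, int end) {
        IEnumerable<int> range = Enumerable.Range(start, end - start + 1);
        // Left outer join: each number in the range is paired with its entries in src,
        // or with a zero-probability Frequency when src has none.
        var result = from num in range
                     join _freq in src on num equals _freq.Strength into g
                     from freq in g.DefaultIfEmpty(new Frequency { Strength = num, Probability = 0 })
                     select freq;
        return result;
    }
}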
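To see what the join ... into g / DefaultIfEmpty pattern in FillHoles produces, here is a small self-contained sketch on hypothetical data. It is written as a plain local function over value tuples (standing in for Frequency) rather than as an extension method, so it runs as one top-level program; the query itself is the same left outer join.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical sparse data: only Strengths 2 and 5 have a probability.
var sparse = new List<(int Strength, double Probability)> { (2, 0.5), (5, 0.5) };

var dense = FillHoles(sparse, 1, 6).ToList();
foreach (var f in dense)
    Console.WriteLine($"{f.Strength}: {f.Probability}");

// Same query shape as FillHoles, written as a local function over value tuples.
static IEnumerable<(int Strength, double Probability)> FillHoles(
    IEnumerable<(int Strength, double Probability)> src, int start, int end)
{
    return from num in Enumerable.Range(start, end - start + 1)
           // Group join: g holds the entries of src whose Strength equals num.
           join item in src on num equals item.Strength into g
           // An empty group is replaced by a single zero-probability entry.
           from freq in g.DefaultIfEmpty((Strength: num, Probability: 0.0))
           select freq;
}
```

The result has one entry per Strength from 1 to 6, in order, with probability 0.5 at Strengths 2 and 5 and 0 everywhere else. Entries of src outside start..end are dropped, and a Strength that appears twice in src would appear twice in the output.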

This leaves us with dense frequency readings. Now we just need to apply the formula:

// Make the arrays dense
recording = recording.FillHoles(1, 100);
reading = reading.FillHoles(1, 100);
// This is the thing we will be summing
IEnumerable<double> series = from rec in recording
                            join read in reading on rec.Strength equals read.Strength
                            select Math.Pow(Math.Sqrt(rec.Probability)-Math.Sqrt(read.Probability), 2);

double result = 1 / Math.Sqrt(2) * Math.Sqrt(series.Sum());
result.Dump(); // Dump() is LINQPad's output method; outside LINQPad use Console.WriteLine(result)

I'm not sure whether this is more efficient than what you have.
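One possible alternative that neither answer uses, sketched under the assumption that every Strength lies in 1..k: scatter each list into a dense array of length k + 1 indexed by Strength, then pair the two arrays element-wise with Enumerable.Zip. Filling the arrays is O(n) and the sum is O(k), with no per-index search, whereas the question's loop calls FirstOrDefault for each of the k indices (roughly O(k·n)). The value tuple stands in for Frequency, and the names Hellinger and Dense are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data from the question; (int Strength, double Probability) stands in for Frequency.
var recording = new List<(int Strength, double Probability)>
{
    (32, 0.2), (33, 0.2), (34, 0.2), (35, 0.2), (41, 0.2)
};
var reading = new List<(int Strength, double Probability)>
{
    (34, 0.2), (35, 0.2), (36, 0.2), (37, 0.2), (80, 0.2)
};

double h = Hellinger(recording, reading, 100);
Console.WriteLine(h); // about 0.7746, i.e. sqrt(0.6)

static double Hellinger(IEnumerable<(int Strength, double Probability)> p,
                        IEnumerable<(int Strength, double Probability)> q, int k)
{
    // Scatter sparse entries into an array indexed by Strength. Slot 0 stays 0 in both
    // arrays and contributes nothing. Assumes every Strength is in 1..k; a Strength
    // above k or below 0 makes the indexer throw.
    double[] Dense(IEnumerable<(int Strength, double Probability)> src)
    {
        var arr = new double[k + 1];
        foreach (var f in src) arr[f.Strength] = f.Probability;
        return arr;
    }

    // Zip pairs p_i with q_i by position, so no join or per-index search is needed.
    double sum = Dense(p)
        .Zip(Dense(q), (pi, qi) => Math.Pow(Math.Sqrt(pi) - Math.Sqrt(qi), 2))
        .Sum();
    return Math.Sqrt(sum / 2);
}
```

The trade-off is memory proportional to k for the two arrays, which is negligible for k = 100 but worth keeping in mind if the range of Strength values were much larger than the number of entries.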