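Finding the longest overlapping period

Asked: 2018-01-26 20:58:59

Tags: c# algorithm datetime intervals

I have a list of records, each containing an Id, DateFrom and DateTo. For the sake of this question we can use this one: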

    List<(int, DateTime, DateTime)> data = new List<(int, DateTime, DateTime)>
        {
            (1, new DateTime(2012, 5, 16), new DateTime(2018, 1, 25)),
            (2, new DateTime(2009, 1, 1), new DateTime(2011, 4, 27)),
            (3, new DateTime(2014, 1, 1), new DateTime(2016, 4, 27)),
            (4, new DateTime(2015, 1, 1), new DateTime(2015, 1, 3)),
            (2, new DateTime(2013, 5, 10), new DateTime(2017, 4, 27)),
            (5, new DateTime(2013, 5, 16), new DateTime(2018, 1, 24)),
            (2, new DateTime(2017, 4, 28), new DateTime(2018, 1, 24)),
        };

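In my actual case the list can be much bigger. Initially I worked under the assumption that a given Id can have only one record, and I was able to come up with a pretty good solution. But now, as you can see, an Id can have several periods, and all periods of an Id should be taken into account when comparing the total time.

The task is to find the two records that have the longest time overlap and to return their ids and the number of overlapping days.

In this sample case that means these should be the records with Id 1 and 2.

My implementation is as follows: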

    public (int, int, int) GetLongestElapsedPeriodWithDuplications(List<(int, DateTime, DateTime)> periods)
    {
        Dictionary<int, List<(DateTime, DateTime)>> periodsByPeriodId = new Dictionary<int, List<(DateTime, DateTime)>>();

        foreach (var period in periods)
        {
            if (periodsByPeriodId.ContainsKey(period.Item1))
            {
                periodsByPeriodId[period.Item1].Add((period.Item2, period.Item3));
            }
            else
            {
                periodsByPeriodId[period.Item1] = new List<(DateTime, DateTime)>();
                periodsByPeriodId[period.Item1].Add((period.Item2, period.Item3));
            }
        }

        int firstId = -1;
        int secondId = -1;
        int periodInDays = 0;

        foreach (var period in periodsByPeriodId)
        {
            var Id = period.Key;

            foreach (var currPeriod in periodsByPeriodId)
            {
                int currentPeriodInDays = 0;
                if (Id != currPeriod.Key)
                {
                    for (var i = 0; i < period.Value.Count; i++)
                    {
                        for (var j = 0; j < currPeriod.Value.Count; j++)
                        {
                            var firstPeriodDateFrom = period.Value[i].Item1;
                            var firstPeriodDateTo = period.Value[i].Item2;

                            var secondPeriodDateFrom = currPeriod.Value[j].Item1;
                            var secondPeriodDateTo = currPeriod.Value[j].Item2;

                            if (secondPeriodDateFrom < firstPeriodDateTo && secondPeriodDateTo > firstPeriodDateFrom)
                            {
                                DateTime commonStartingDate = secondPeriodDateFrom > firstPeriodDateFrom ? secondPeriodDateFrom : firstPeriodDateFrom;
                                DateTime commonEndDate = secondPeriodDateTo > firstPeriodDateTo ? firstPeriodDateTo : secondPeriodDateTo;

                                currentPeriodInDays += (int)(commonEndDate - commonStartingDate).TotalDays;
                            }
                        }
                    }
                    if (currentPeriodInDays > periodInDays)
                    {
                        periodInDays = currentPeriodInDays;
                        firstId = Id;
                        secondId = currPeriod.Key;
                    }
                }
            }
        }
        return (firstId, secondId, periodInDays);
    }

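As you can see, the method is quite big and, I think, far from optimized in terms of execution speed. I know that the nested loops add a lot of complexity, but the additional requirement to handle multiple periods per Id has left me out of ideas. How can I optimize this logic so that, for larger inputs, it executes faster than it does now?

5 answers:

Answer 0 (score: 2)

As in the original solution, you need to compare every time interval with the intervals that have a different Id, so I would code it like this:

Supporting classes, just to simplify the actual algorithm: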

class Period {
    public DateTime Start { get; }
    public DateTime End { get; }

    public Period(DateTime start, DateTime end) {
        this.Start = start;
        this.End = end;
    }

    public int Overlap(Period other) {
        DateTime a = this.Start > other.Start ? this.Start : other.Start;
        DateTime b = this.End < other.End ? this.End : other.End;
        return (a < b) ? b.Subtract(a).Days : 0;
    }
}

class IdData {
    public IdData() {
        this.Periods = new List<Period>();
        this.Overlaps = new Dictionary<int, int>();
    }
    public List<Period> Periods { get; }
    public Dictionary<int, int> Overlaps { get; }
}

The method that finds the maximum overlap:

    static int GetLongestElapsedPeriod(List<(int, DateTime, DateTime)> periods) {
        int maxOverlap = 0;

        Dictionary<int, IdData> ids = new Dictionary<int, IdData>();
        foreach (var period in periods) {
            int id = period.Item1;
            Period idPeriod = new Period(period.Item2, period.Item3);

            // preserve interval for ID
            var idData = ids.GetValueOrDefault(id, new IdData());
            idData.Periods.Add(idPeriod);
            ids[id] = idData;

            foreach (var idObj in ids) {
                if (idObj.Key != id) {
                    // add the new interval's overlap with all previously seen intervals of this other ID
                    int o = idObj.Value.Overlaps.GetValueOrDefault(id, 0);
                    foreach (var otherPeriods in idObj.Value.Periods)
                        o += idPeriod.Overlap(otherPeriods);
                    idObj.Value.Overlaps[id] = o;

                    // check whether the newly calculated overlap is the new maximum; record the IDs here too if needed
                    if (o > maxOverlap)
                        maxOverlap = o;
                }
            }
        }

        return maxOverlap;
    }
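
The method above returns only the number of days, while the question also asks for the two ids. A minimal sketch of one way to keep them, assuming the same incremental idea; the names `OverlapFinder`, `GetLongestOverlapWithIds` and `OverlapDays` are made up for this illustration and are not part of the answer's code:

```csharp
using System;
using System.Collections.Generic;

static class OverlapFinder
{
    // Whole days shared by two ranges; 0 when they do not intersect.
    static int OverlapDays(DateTime s1, DateTime e1, DateTime s2, DateTime e2)
    {
        DateTime start = s1 > s2 ? s1 : s2;
        DateTime end = e1 < e2 ? e1 : e2;
        return start < end ? (end - start).Days : 0;
    }

    public static (int FirstId, int SecondId, int Days) GetLongestOverlapWithIds(
        List<(int Id, DateTime Start, DateTime End)> periods)
    {
        // Periods seen so far, grouped by ID.
        var seen = new Dictionary<int, List<(DateTime Start, DateTime End)>>();
        // Running overlap total per unordered ID pair, keyed as (smaller, larger).
        var totals = new Dictionary<(int, int), int>();
        var best = (FirstId: -1, SecondId: -1, Days: 0);

        foreach (var period in periods)
        {
            foreach (var other in seen)
            {
                if (other.Key == period.Id) continue;

                // Overlap of the new period with every earlier period of the other ID.
                int added = 0;
                foreach (var p in other.Value)
                    added += OverlapDays(period.Start, period.End, p.Start, p.End);
                if (added == 0) continue;

                int lo = Math.Min(period.Id, other.Key);
                int hi = Math.Max(period.Id, other.Key);
                totals.TryGetValue((lo, hi), out int total);
                total += added;
                totals[(lo, hi)] = total;

                // Totals only grow, so the largest running total is the final maximum.
                if (total > best.Days)
                    best = (lo, hi, total);
            }

            // Register the new period only after comparing it with earlier ones.
            if (!seen.TryGetValue(period.Id, out var list))
                seen[period.Id] = list = new List<(DateTime Start, DateTime End)>();
            list.Add((period.Start, period.End));
        }
        return best;
    }
}
```

Keying the totals by the ordered pair (smaller id, larger id) means each pair is accumulated in one place, regardless of which of the two ids appears first in the input.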

Answer 1 (score: 1)

Using an extension method:

public static T MaxBy<T, TKey>(this IEnumerable<T> src, Func<T, TKey> key, Comparer<TKey> keyComparer = null) {
    keyComparer = keyComparer ?? Comparer<TKey>.Default;
    return src.Aggregate((a, b) => keyComparer.Compare(key(a), key(b)) > 0 ? a : b);
}

And a few helper functions:

DateTime Max(DateTime a, DateTime b) => (a > b) ? a : b;
DateTime Min(DateTime a, DateTime b) => (a < b) ? a : b;

int OverlappingDays((DateTime DateFrom, DateTime DateTo) span1, (DateTime DateFrom, DateTime DateTo) span2) {
    var maxFrom = Max(span1.DateFrom, span2.DateFrom);
    var minTo = Min(span1.DateTo, span2.DateTo);
    return Math.Max((minTo - maxFrom).Days, 0);
}

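You can group the records with matching Ids together (this assumes the tuple elements are named Id, DateFrom and DateTo):

var dg = data.GroupBy(d => d.Id);

Generate all pairs of Ids: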

var pdgs = from d1 in dg
           from d2 in dg.Where(d => d.Key > d1.Key)
           select new[] { d1, d2 };

Then compute the overlapping days between each pair of Ids and find the maximum:

var MaxOverlappingPair = pdgs.Select(pdg => new {
    Id1 = pdg[0].Key,
    Id2 = pdg[1].Key,
    OverlapInDays = pdg[0].SelectMany(d1 => pdg[1].Select(d2 => OverlappingDays((d1.DateFrom, d1.DateTo), (d2.DateFrom, d2.DateTo)))).Sum()
}).MaxBy(TwoOverlap => TwoOverlap.OverlapInDays);

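Since efficiency was mentioned, I should say that implementing some of these operations directly instead of using LINQ would be more efficient, but since you are working with tuples and in-memory structures, I don't think it will make much difference.

I ran some performance tests using a list of 24,000 spans with 1,249 unique IDs. The LINQ code took about 16 seconds. By inlining some of the LINQ and replacing the anonymous objects with tuples, it came down to about 3.1 seconds. By adding a shortcut that skips any ID whose accumulated days are fewer than the current maximum overlap, plus a few more optimizations, I got it down to under 1 second.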

var baseDate = new DateTime(1970, 1, 1);

int OverlappingDays(int DaysFrom1, int DaysTo1, int DaysFrom2, int DaysTo2) {
    var maxFrom = DaysFrom1 > DaysFrom2 ? DaysFrom1 : DaysFrom2;
    var minTo = DaysTo1 < DaysTo2 ? DaysTo1 : DaysTo2;
    return (minTo > maxFrom) ? minTo - maxFrom : 0;
}

var dgs = data.Select(d => {
    var DaysFrom = (d.DateFrom - baseDate).Days;
    var DaysTo = (d.DateTo - baseDate).Days;
    return (d.Id, DaysFrom, DaysTo, Dist: DaysTo - DaysFrom);
})
              .GroupBy(d => d.Id)
              .Select(dg => (Id: dg.Key, Group: dg, Dist: dg.Sum(d => d.Dist)))
              .ToList();

var MaxOverlappingPair = (Id1: 0, Id2: 0, OverlapInDays: 0);

for (int j1 = 0; j1 < dgs.Count; ++j1) {
    var dg1 = dgs[j1];
    if (dg1.Dist > MaxOverlappingPair.OverlapInDays)
        for (int j2 = j1 + 1; j2 < dgs.Count; ++j2) {
            var dg2 = dgs[j2];
            if (dg2.Dist > MaxOverlappingPair.OverlapInDays) {
                var testOverlapInDays = 0;
                foreach (var d1 in dg1.Group)
                    foreach (var d2 in dg2.Group)
                        testOverlapInDays += OverlappingDays(d1.DaysFrom, d1.DaysTo, d2.DaysFrom, d2.DaysTo);

                if (testOverlapInDays > MaxOverlappingPair.OverlapInDays)
                    MaxOverlappingPair = (dg1.Id, dg2.Id, testOverlapInDays);
            }
        }
}

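Optimizations applied:

  1. Convert each span's DateTime values to a number of days from an arbitrary baseDate, so the date conversion is done once and the overlap calculation works on integers.
  2. Compute the total days of the spans for each ID, and skip any pair of IDs that cannot exceed the current maximum overlap.
  3. Replace SelectMany/Select with nested foreach loops to compute the overlapping days.
  4. Use ValueTuple instead of anonymous objects, which is (slightly) faster for this problem.
  5. Replace the pair-generating LINQ with nested for loops that generate each possible pair directly.
  6. Pass individual from/to parameters rather than objects to the OverlappingDays function.
  7. Note: I tried a smarter overlapping-days calculation, but when the number of spans per ID is small, the overhead took longer than just doing the calculation directly.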
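
The answer does not show the "smarter" calculation from item 7. One plausible form, sketched here purely as an assumption, is a two-pointer walk over two span lists that are each sorted by start day. It only matches the nested-loop sum when the spans belonging to one ID do not overlap each other; `SortedOverlap` and `OverlapDaysSorted` are hypothetical names:

```csharp
using System;
using System.Collections.Generic;

static class SortedOverlap
{
    // Assumes both lists are sorted by From and that the spans inside
    // each list do not overlap one another.
    public static int OverlapDaysSorted(
        IReadOnlyList<(int From, int To)> a,
        IReadOnlyList<(int From, int To)> b)
    {
        int i = 0, j = 0, total = 0;
        while (i < a.Count && j < b.Count)
        {
            int lo = Math.Max(a[i].From, b[j].From);
            int hi = Math.Min(a[i].To, b[j].To);
            if (hi > lo) total += hi - lo;

            // Advance whichever span ends first: it cannot intersect any later span.
            if (a[i].To < b[j].To) i++; else j++;
        }
        return total;
    }
}
```

This visits each span once, so a pair of IDs costs roughly the sum of their span counts rather than the product. With only a handful of spans per ID, the sorting and bookkeeping can easily outweigh that saving, which is consistent with the observation in item 7.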

Answer 2 (score: 1)

You can use TimePeriodLibrary.NET:

PM> Install-Package TimePeriodLibrary.NET

TimePeriodCollection timePeriods = new TimePeriodCollection(
    data.Select(q => new TimeRange(q.Item2, q.Item3)));

var longestOverlap = timePeriods
    .OverlapPeriods(new TimeRange(timePeriods.Start, timePeriods.End))
    .OrderByDescending(q => q.Duration)
    .FirstOrDefault();

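Answer 3 (score: 0)

There are already a few solutions here.

However, if you want efficiency, you don't have to compare every object or value with every other one. You can solve this with an Interval Search Tree, which takes R log N time, where R is the number of intersections between intervals.

I suggest watching Robert Sedgewick's video on the topic; the book is also available online.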
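
The following is not an interval search tree. It is a simpler sort-and-sweep sketch that follows the same principle of only comparing intervals that can actually intersect; the names `SweepOverlap` and `Find` are invented for this example. Intervals are processed in order of start date, and an "active" list holds only those that have not yet ended:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class SweepOverlap
{
    public static (int FirstId, int SecondId, int Days) Find(
        IEnumerable<(int Id, DateTime Start, DateTime End)> periods)
    {
        var totals = new Dictionary<(int, int), int>();
        var active = new List<(int Id, DateTime Start, DateTime End)>();
        var best = (FirstId: -1, SecondId: -1, Days: 0);

        foreach (var cur in periods.OrderBy(p => p.Start))
        {
            // Intervals that ended at or before the current start cannot overlap
            // this interval or any later one.
            active.RemoveAll(p => p.End <= cur.Start);

            foreach (var prev in active)
            {
                if (prev.Id == cur.Id) continue;

                // prev.Start <= cur.Start < prev.End, so the overlap begins at cur.Start.
                DateTime end = prev.End < cur.End ? prev.End : cur.End;
                int days = (end - cur.Start).Days;
                if (days <= 0) continue;

                int lo = Math.Min(prev.Id, cur.Id);
                int hi = Math.Max(prev.Id, cur.Id);
                totals.TryGetValue((lo, hi), out int total);
                total += days;
                totals[(lo, hi)] = total;
                if (total > best.Days) best = (lo, hi, total);
            }
            active.Add(cur);
        }
        return best;
    }
}
```

When most intervals overlap each other, the active list stays large and this degrades to the same quadratic behaviour as the nested loops. It pays off when the data is spread out over time, so that each interval meets only a few others.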

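Answer 4 (score: -2)

The basic problem here is how to identify a set of unique time periods. Give each of them a unique ID of your own.

When you compose your final answer, include additional details in the output so that the user can tell which (original) IDs and original time periods lead to the final answer.

Remember: the problem is still the same as the one in the original post (https://codereview.stackexchange.com/questions/186014/finding-the-longest-overlapping-period/186031?noredirect=1#comment354707_186031), and you still have the same information available. Don't get too hung up on the "ID" provided in the original list; you are still iterating over a list of time periods.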