Counting to find subset frequencies

Asked: 2016-09-22 01:15:05

Tags: c# algorithm sorting

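I am trying to find combinations of numbers that occur frequently across N sets (all sets have the same size, and no set contains duplicates).

For example: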

{3, 5, 2, 4, 6, 11}
{3, 7, 2, 11, 5, 14}
{8, 2, 1, 11, 14, 6}
{9, 1, 12, 8, 17, 4}
{4, 10, 16, 5, 14, 3}

I use a counting algorithm to find how many times each individual number occurs across the sets.

// Counts how often each value in A occurs; every value must lie in 0..m.
public static int[] Counting(int[] A, int m)
{
    int n = A.Length;
    int[] count = new int[m + 1];  // new arrays are already zero-initialized
    for (int k = 0; k < n; k++)
        count[A[k]] += 1;
    return count;
}

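Is there an algorithm that does the same thing for subsets? In the example above, {2,11}, {3,2,11}, and {11,14} occur together more often than other combinations. The output should give the count for each subset; for the example above, {2,11} has frequency 3.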
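To make the desired output concrete, here is a minimal sketch that counts how many of the example sets contain one given candidate subset. The class `SubsetFrequency` and its method `Count` are hypothetical names introduced only for illustration; they are not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sets = new[]
{
    new[] {3, 5, 2, 4, 6, 11},
    new[] {3, 7, 2, 11, 5, 14},
    new[] {8, 2, 1, 11, 14, 6},
    new[] {9, 1, 12, 8, 17, 4},
    new[] {4, 10, 16, 5, 14, 3},
};

// {2, 11} occurs in the first three sets.
Console.WriteLine(SubsetFrequency.Count(sets, new[] {2, 11}));  // prints 3

// Hypothetical helper, for illustration only.
static class SubsetFrequency
{
    // Returns the number of sets that contain every element of candidate.
    public static int Count(IEnumerable<int[]> sets, int[] candidate) =>
        sets.Count(set => candidate.All(x => set.Contains(x)));
}
```

This only answers the question for a subset that is already known; finding all frequent subsets still requires enumerating the candidates.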

1 Answer:

Answer 0 (score: 2)

Does this work for you?

Func<IEnumerable<int>, IEnumerable<IEnumerable<int>>> getAllSubsets = null;
getAllSubsets = xs =>
    (xs == null || !xs.Any())
        ? Enumerable.Empty<IEnumerable<int>>()
        :  xs.Skip(1).Any()
            ? getAllSubsets(xs.Skip(1))
                .SelectMany(ys => new [] { ys, xs.Take(1).Concat(ys) })
            : new [] { Enumerable.Empty<int>(), xs.Take(1) };

var source = new int[][]
{
    new [] {3, 5, 2, 4, 6, 11},
    new [] {3, 7, 2, 11, 5, 14},
    new [] {8, 2, 1, 11, 14, 6},
    new [] {9, 1, 12, 8, 17, 4},
    new [] {4, 10, 16, 5, 14, 3},
};

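// Sort each set first so that equal subsets always produce the same key,
// regardless of the order in which the numbers appear in a set.
var subsets = source
    .Select(x => getAllSubsets(x.OrderBy(n => n))
        .Select(y => new { key = String.Join(",", y), values = y.ToArray() })
        .ToArray())
    .ToArray();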

var keys = subsets.SelectMany(x => x.Select(y => y.key)).Distinct().ToArray();

var query =
    from key in keys
    let count = subsets.Where(x => x.Select(y => y.key).Contains(key)).Count()
    where count > 1
    orderby count descending
    select new { key, count, };

I got this result:

[image: result]

The first result, with a count of 5, is the empty set, which every set contains.
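As an alternative sketch, not part of the answer itself, the counts can also be accumulated in a single pass with a dictionary instead of testing every key against every set. The sketch assumes that each set is sorted before its subsets are enumerated, so a subset always maps to the same key, and it skips the empty set. `SubsetCounter` and `CountSubsets` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var sets = new[]
{
    new[] {3, 5, 2, 4, 6, 11},
    new[] {3, 7, 2, 11, 5, 14},
    new[] {8, 2, 1, 11, 14, 6},
    new[] {9, 1, 12, 8, 17, 4},
    new[] {4, 10, 16, 5, 14, 3},
};

var counts = SubsetCounter.CountSubsets(sets);

// List every subset that occurs in more than one set, most frequent first.
foreach (var pair in counts.Where(p => p.Value > 1)
                           .OrderByDescending(p => p.Value)
                           .ThenBy(p => p.Key))
{
    Console.WriteLine("{" + pair.Key + "} frequency " + pair.Value);
}

// Hypothetical helper, for illustration only.
static class SubsetCounter
{
    // Maps each non-empty subset (as a sorted, comma-separated key) to the
    // number of sets containing it. Assumes no duplicates within a set and
    // fewer than 31 elements per set, because subsets are chosen by bitmask.
    public static Dictionary<string, int> CountSubsets(IEnumerable<int[]> sets)
    {
        var counts = new Dictionary<string, int>();
        foreach (var set in sets)
        {
            var sorted = set.OrderBy(n => n).ToArray();
            // Each mask from 1 to 2^n - 1 selects one non-empty subset.
            for (int mask = 1; mask < (1 << sorted.Length); mask++)
            {
                var members = sorted.Where((value, i) => (mask & (1 << i)) != 0);
                var key = string.Join(",", members);
                counts[key] = counts.TryGetValue(key, out var c) ? c + 1 : 1;
            }
        }
        return counts;
    }
}
```

The number of subsets doubles with every element added to a set, so this is practical for small sets such as the six-element ones here (63 non-empty subsets each) but not for large ones.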