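Suppose a method written in C# is passed either null or a list of between 0 and 6,000,000 randomly generated, unsorted integers. What is the most efficient way to determine all the modes and how many times they occur? In particular, can anyone help me with the LINQ-based solution I have been struggling with?
Here is what I have so far:
The closest LINQ solution I have so far only gets the first mode it finds and does not report the number of occurrences. On my machine it is also about 7 times slower than my ugly, clunky implementation.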
int mode = numbers.GroupBy(number => number).OrderByDescending(group => group.Count()).Select(k => k.Key).FirstOrDefault();
My hand-coded method:
public class NumberCount
{
public int Value;
public int Occurrences;
public NumberCount(int value, int occurrences)
{
Value = value;
Occurrences = occurrences;
}
}
private static List<NumberCount> findMostCommon(List<int> integers)
{
if (integers == null)
return null;
else if (integers.Count < 1)
return new List<NumberCount>();
List<NumberCount> mostCommon = new List<NumberCount>();
integers.Sort();
mostCommon.Add(new NumberCount(integers[0], 1));
for (int i=1; i<integers.Count; i++)
{
if (mostCommon[mostCommon.Count - 1].Value != integers[i])
mostCommon.Add(new NumberCount(integers[i], 1));
else
mostCommon[mostCommon.Count - 1].Occurrences++;
}
List<NumberCount> answer = new List<NumberCount>();
answer.Add(mostCommon[0]);
for (int i=1; i<mostCommon.Count; i++)
{
if (mostCommon[i].Occurrences > answer[0].Occurrences)
{
if (answer.Count == 1)
{
answer[0] = mostCommon[i];
}
else
{
answer = new List<NumberCount>();
answer.Add(mostCommon[i]);
}
}
else if (mostCommon[i].Occurrences == answer[0].Occurrences)
{
answer.Add(mostCommon[i]);
}
}
return answer;
}
Basically, I am looking for an elegant, compact LINQ solution that is at least as fast as my ugly method. Thanks in advance for any suggestions.
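Answer 0 (score: 0)
Personally, I would use a ConcurrentDictionary to update the counters; access through the dictionary is also faster. I use this approach a lot, and it is more readable.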
// create a dictionary
var dictionary = new ConcurrentDictionary<int, int>();
// list of your integers
var numbers = new List<int>();
// run the iteration in parallel (we can, because the concurrent dictionary is thread safe-ish)
numbers.AsParallel().ForAll((number) =>
{
// add the key with a value of 1 if it is not there; if it is there, use the lambda to increment it by 1
dictionary.AddOrUpdate(number, 1, (key, old) => old + 1);
});
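After that, there are many ways to get the most frequent value. I don't fully understand your version, but getting the top one is just a matter of a single Aggregate, like this: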
var topMostOccurence = dictionary.Aggregate((x, y) => { return x.Value > y.Value ? x : y; });
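The Aggregate call above keeps a single key/value pair, so ties are lost, and it throws on an empty dictionary. As a minimal sketch (not part of the original answer), the counts can be scanned for the highest value and then filtered so that every tied mode is returned. The `ModeSketch` class and `AllModes` method are hypothetical names, and returning an empty list for null input is an assumption made here for brevity.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModeSketch
{
    // Hypothetical helper: returns every value tied for the highest count,
    // paired with that count, ordered by value.
    public static List<KeyValuePair<int, int>> AllModes(IEnumerable<int> numbers)
    {
        var result = new List<KeyValuePair<int, int>>();
        if (numbers == null)
            return result; // assumption: null input yields an empty result

        // Count occurrences with a plain, single-threaded dictionary.
        var counts = new Dictionary<int, int>();
        foreach (int n in numbers)
        {
            counts.TryGetValue(n, out int c); // c stays 0 when the key is absent
            counts[n] = c + 1;
        }
        if (counts.Count == 0)
            return result;

        // Find the highest count, then keep every key that reaches it.
        int max = counts.Values.Max();
        result.AddRange(counts.Where(kv => kv.Value == max).OrderBy(kv => kv.Key));
        return result;
    }
}
```

Calling `ModeSketch.AllModes(new[] { 1, 1, 2, 2, 3 })` yields two entries, one for 1 and one for 2, each with a count of 2. The same max-then-filter step works on a ConcurrentDictionary once the parallel counting has finished.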
Answer 1 (score: 0)
I tested with the following code on an Intel i7-8700K and got these results:
Lambda: found 78 in 134 ms.
Manual: found 78 in 368 ms.
Dictionary: found 78 in 195 ms.
static IEnumerable<int> GenerateNumbers(int amount)
{
Random r = new Random();
for (int i = 0; i < amount; i++)
yield return r.Next(100);
}
static void Main(string[] args)
{
var numbers = GenerateNumbers(6_000_000).ToList();
Stopwatch sw = Stopwatch.StartNew();
int mode = numbers.GroupBy(number => number).OrderByDescending(group => group.Count()).Select(k =>
{
int count = k.Count();
return new { Mode = k.Key, Count = count };
}).FirstOrDefault().Mode;
sw.Stop();
Console.WriteLine($"Lambda: found {mode} in {sw.ElapsedMilliseconds} ms.");
sw = Stopwatch.StartNew();
mode = findMostCommon(numbers)[0].Value;
sw.Stop();
Console.WriteLine($"Manual: found {mode} in {sw.ElapsedMilliseconds} ms.");
// create a dictionary
var dictionary = new ConcurrentDictionary<int, int>();
sw = Stopwatch.StartNew();
// run the iteration in parallel (we can, because the concurrent dictionary is thread safe-ish)
numbers.AsParallel().ForAll((number) =>
{
// add the key with a value of 1 if it is not there; if it is there, use the lambda to increment it by 1
dictionary.AddOrUpdate(number, 1, (key, old) => old + 1);
});
mode = dictionary.Aggregate((x, y) => { return x.Value > y.Value ? x : y; }).Key;
sw.Stop();
Console.WriteLine($"Dictionary: found {mode} in {sw.ElapsedMilliseconds} ms.");
Console.ReadLine();
}
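Answer 2 (score: 0)
What you want is this: two or more numbers may occur equally often in the array, for example {1,1,1,2,2,2,3,3,3}.
Your current code comes from here: Find the most occurring number in a List<int>. However, it returns only one number, which is simply the wrong result in this case.
The problem with LINQ is that you cannot end the loop early when you do not want it to continue.
However, here is a LINQ query that produces the list you need: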
List<NumberCount> MaxOccurrences(List<int> integers)
{
return integers?.AsParallel()
.GroupBy(x => x)//group numbers, key is number, count is count
.Select(k => new NumberCount(k.Key, k.Count()))
.GroupBy(x => x.Occurrences)//group by Occurrences, key is Occurrences, value is result
.OrderByDescending(x => x.Key) //sort
.FirstOrDefault()? //the first one is result
.ToList();
}
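For comparison, here is a sequential sketch of the same idea that swaps the OrderByDescending step for a single Max pass followed by a filter, so the groups are never sorted. It is not the answerer's method; `LinqModeSketch` and `Modes` are hypothetical names, and value tuples stand in for the NumberCount class so the block is self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LinqModeSketch
{
    // Hypothetical helper: returns every mode with its occurrence count.
    // Mirrors the question's contract: null in, null out; empty in, empty out.
    public static List<(int Value, int Occurrences)> Modes(IEnumerable<int> integers)
    {
        if (integers == null)
            return null;

        // Group equal numbers and count each group once.
        var counted = integers
            .GroupBy(x => x)
            .Select(g => (Value: g.Key, Occurrences: g.Count()))
            .ToList();
        if (counted.Count == 0)
            return new List<(int Value, int Occurrences)>();

        // One pass to find the highest count, one pass to keep the ties.
        int max = counted.Max(c => c.Occurrences);
        return counted.Where(c => c.Occurrences == max).ToList();
    }
}
```

For the input {1,1,1,2,2,2,3,3,3} this returns three entries (1, 2 and 3), each with 3 occurrences. Whether it is faster than the parallel version depends on the data and the machine, so it should be measured rather than assumed.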
Test details:
ArraySize: 30000
30000
MaxOccurrences only
MaxOccurrences1: 207
MaxOccurrences2: 38
=============
Full List
Original1: 28
Original2: 23
ConcurrentDictionary1: 32
ConcurrentDictionary2: 34
AsParallel1: 27
AsParallel2: 19
AsParallel3: 36
ArraySize: 3000000
3000000
MaxOccurrences only
MaxOccurrences1: 3009
MaxOccurrences2: 1962 //<==this is the best one in big loop.
=============
Full List
Original1: 3200
Original2: 3234
ConcurrentDictionary1: 3391
ConcurrentDictionary2: 2681
AsParallel1: 3776
AsParallel2: 2389
AsParallel3: 2155
Here is the code:
class Program
{
static void Main(string[] args)
{
const int listSize = 3000000;
var rnd = new Random();
var randomList = Enumerable.Range(1, listSize).OrderBy(e => rnd.Next()).ToList();
// the code that you want to measure comes here
Console.WriteLine(randomList.Count);
Console.WriteLine("MaxOccurrences only");
Test(randomList, MaxOccurrences1);
Test(randomList, MaxOccurrences2);
Console.WriteLine("=============");
Console.WriteLine("Full List");
Test(randomList, Original1);
Test(randomList, Original2);
Test(randomList, ConcurrentDictionary1);
Test(randomList, ConcurrentDictionary2);
Test(randomList, AsParallel1);
Test(randomList, AsParallel2);
Test(randomList, AsParallel3);
Console.ReadLine();
}
private static void Test(List<int> data, Action<List<int>> method)
{
var watch = System.Diagnostics.Stopwatch.StartNew();
method(data);
watch.Stop();
Console.WriteLine($"{method.Method.Name}: {watch.ElapsedMilliseconds}");
}
private static void Original1(List<int> integers)
{
integers?.GroupBy(number => number)
.OrderByDescending(group => group.Count())
.Select(k => new NumberCount(k.Key, k.Count()))
.ToList();
}
private static void Original2(List<int> integers)
{
integers?.GroupBy(number => number)
.Select(k => new NumberCount(k.Key, k.Count()))
.OrderByDescending(x => x.Occurrences)
.ToList();
}
private static void AsParallel1(List<int> integers)
{
integers?.GroupBy(number => number)
.AsParallel() //each group will be counted by a CPU unit
.Select(k => new NumberCount(k.Key, k.Count())) //grab the result before sorting
.OrderByDescending(x => x.Occurrences) //sort after the result is built
.ToList();
}
private static void AsParallel2(List<int> integers)
{
integers?.AsParallel()
.GroupBy(number => number)
.Select(k => new
{
Key = k.Key,
Occurrences = k.Count()
}) //grab the result before sorting
.OrderByDescending(x => x.Occurrences) //sort after the result is built
.ToList();
}
private static void AsParallel3(List<int> integers)
{
integers?.AsParallel()
.GroupBy(number => number)
.Select(k => new NumberCount(k.Key, k.Count())) //grab the result before sorting
.OrderByDescending(x => x.Occurrences) //sort after the result is built
.ToList();
}
private static void MaxOccurrences1(List<int> integers)
{
integers?.AsParallel()
.GroupBy(number => number)
.GroupBy(x => x.Count())
.OrderByDescending(x => x.Key)
.FirstOrDefault()?
.ToList()
.Select(k => new NumberCount(k.Key, k.Count()))
.ToList();
}
private static void MaxOccurrences2(List<int> integers)
{
integers?.AsParallel()
.GroupBy(x => x)//group numbers, key is number, count is count
.Select(k => new NumberCount(k.Key, k.Count()))
.GroupBy(x => x.Occurrences)//group by Occurrences, key is Occurrences, value is result
.OrderByDescending(x => x.Key) //sort
.FirstOrDefault()? //the first one is result
.ToList();
}
private static void ConcurrentDictionary1(List<int> integers)
{
ConcurrentDictionary<int, int> result = new ConcurrentDictionary<int, int>();
integers?.ForEach(x => { result.AddOrUpdate(x, 1, (key, old) => old + 1); });
result.OrderByDescending(x => x.Value).ToList();
}
private static void ConcurrentDictionary2(List<int> integers)
{
ConcurrentDictionary<int, int> result = new ConcurrentDictionary<int, int>();
integers?.AsParallel().ForAll(x => { result.AddOrUpdate(x, 1, (key, old) => old + 1); });
result.OrderByDescending(x => x.Value).ToList();
}
}
public class NumberCount
{
public int Value;
public int Occurrences;
public NumberCount(int value, int occurrences)
{
Value = value;
Occurrences = occurrences;
}
}
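Answer 3 (score: 0)
Different code is more efficient at different lengths, but as the length approaches 6 million, this approach seems to be the fastest. In general, LINQ is not meant to make code faster but to make it easier to understand and maintain, depending on how you feel about functional programming style.
Your code is quite fast and beats the simple LINQ approach that uses GroupBy. It benefits a lot from the fact that List.Sort is highly optimized, and my code relies on that as well, but sorts a local copy of the list to avoid changing the source. My code is similar to yours, but it is designed around a single pass that does all the required calculations. It uses an extension method that I re-optimized for this problem, called GroupByRuns, which returns an IEnumerable<IGrouping<T,T>>. It is also hand-expanded rather than falling back on the generic GroupByRuns, which takes extra parameters for key and result selection. Since .NET does not have an end-user-accessible IGrouping<,> implementation (!), I rolled my own, which implements ICollection to optimize Count().
This code runs about 1.3 times as fast as yours (after I made a slight 5% optimization to yours).
First, the RunGrouping class, used to return a group of runs: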
public class RunGrouping<T> : IGrouping<T, T>, ICollection<T> {
public T Key { get; }
int Count;
int ICollection<T>.Count => Count;
public bool IsReadOnly => true;
public RunGrouping(T key, int count) {
Key = key;
Count = count;
}
public IEnumerator<T> GetEnumerator() {
for (int j1 = 0; j1 < Count; ++j1)
yield return Key;
}
IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();
public void Add(T item) => throw new NotImplementedException();
public void Clear() => throw new NotImplementedException();
public bool Contains(T item) => Count > 0 && EqualityComparer<T>.Default.Equals(Key, item);
public void CopyTo(T[] array, int arrayIndex) => throw new NotImplementedException();
public bool Remove(T item) => throw new NotImplementedException();
}
Second, the extension method on IEnumerable that groups the runs:
public static class IEnumerableExt {
public static IEnumerable<IGrouping<T, T>> GroupByRuns<T>(this IEnumerable<T> src) {
var cmp = EqualityComparer<T>.Default;
bool notAtEnd = true;
using (var e = src.GetEnumerator()) {
bool moveNext() {
notAtEnd = e.MoveNext();
return notAtEnd;
}
IGrouping<T, T> NextRun() {
var prev = e.Current;
var ct = 0;
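// stop at the end of the sequence: Current is not meaningful once MoveNext has returned false
while (notAtEnd && cmp.Equals(e.Current, prev)) {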
++ct;
moveNext();
}
return new RunGrouping<T>(prev, ct);
}
moveNext();
while (notAtEnd)
yield return NextRun();
}
}
}
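Finally, the extension method that finds the modes with the maximum count. Basically, it goes through all the runs and keeps a record of the int values with the longest run so far.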
public static class IEnumerableIntExt {
public static IEnumerable<KeyValuePair<int, int>> MostCommon(this IEnumerable<int> src) {
var mysrc = new List<int>(src);
mysrc.Sort();
var maxc = 0;
var maxmodes = new List<int>();
foreach (var g in mysrc.GroupByRuns()) {
var gc = g.Count();
if (gc > maxc) {
maxmodes.Clear();
maxmodes.Add(g.Key);
maxc = gc;
}
else if (gc == maxc)
maxmodes.Add(g.Key);
}
return maxmodes.Select(m => new KeyValuePair<int, int>(m, maxc));
}
}
Given an existing random list of integers rl, you can get the answer with:
var ans = rl.MostCommon();
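The single-pass idea described above can also be illustrated without the IGrouping machinery. The following is a compact sketch under that assumption, not the answerer's exact code: it sorts a private copy, walks each run of equal values by index, and keeps the values whose run length ties the longest seen so far. `RunModeSketch` and `MostCommonByRuns` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class RunModeSketch
{
    // Hypothetical helper: returns (value, count) pairs for every mode.
    public static List<KeyValuePair<int, int>> MostCommonByRuns(IEnumerable<int> src)
    {
        // Sort a local copy so the caller's list is left untouched.
        var sorted = new List<int>(src);
        sorted.Sort();

        var modes = new List<int>();
        int best = 0;
        int i = 0;
        while (i < sorted.Count)
        {
            // Measure the run of equal values starting at index i.
            int value = sorted[i];
            int start = i;
            while (i < sorted.Count && sorted[i] == value)
                i++;
            int run = i - start;

            if (run > best)
            {
                // A longer run replaces all earlier candidates.
                modes.Clear();
                modes.Add(value);
                best = run;
            }
            else if (run == best)
            {
                // An equally long run is another mode.
                modes.Add(value);
            }
        }

        var result = new List<KeyValuePair<int, int>>(modes.Count);
        foreach (int m in modes)
            result.Add(new KeyValuePair<int, int>(m, best));
        return result;
    }
}
```

Because the inner loop checks the index bound before comparing values, a list whose last run consists of zeros (or of any other value) terminates normally. The result comes back in ascending order of value, since the runs are visited in sorted order.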
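Answer 4 (score: -1)
So far, Netmage's is the fastest approach I have found. The only thing I have been able to beat it with (at least over a valid range of 1 to 500,000,000) is an array-based method that, on my machine, only works for values ranging from 1 to 500,000,000 or less, because I only have 8 GB of RAM. That prevents me from testing it over the full 1 to int.MaxValue range, and I suspect it would fall behind in speed at that size, since it seems to struggle more and more as the range grows. It uses the values as indexes and the values stored at those indexes as the occurrence counts. With 6 million randomly generated positive 16-bit integers, it is about 20 times as fast as my original method in Release mode. With 32-bit integers (range 1 to 500,000,000), it is only about 1.6 times as fast.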
private static List<NumberCount> findMostCommon(List<int> integers)
{
List<NumberCount> answers = new List<NumberCount>();
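// _Max is assumed to be a field defined elsewhere: one more than the largest value that can occur.
int[] mostCommon = new int[_Max];
int max = 0;
for (int i = 0; i < integers.Count; i++) // start at 0 so the first element is counted too
{
int iValue = integers[i];
mostCommon[iValue]++;
int intVal = mostCommon[iValue];
if (intVal > 1)
{
if (intVal > max)
{
max = intVal; // record the actual count of the new leader
answers.Clear();
answers.Add(new NumberCount(iValue, max));
}
else if (intVal == max)
{
answers.Add(new NumberCount(iValue, max));
}
}
}
if (answers.Count < 1)
answers.Add(new NumberCount(0, -100)); // This -100 Occurrences value signifies that no value occurs more than once (all values are equally common).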
return answers;
}
Perhaps a branch like this would be optimal:
if (list.Count < sizeLimit)
answers = getFromSmallRangeMethod(list);
else
answers = getFromStandardMethod(list);
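One way to flesh out that branch is sketched below. Because the cost of the counting array depends on the range of the values rather than on the number of items, this sketch dispatches on the largest value instead of list.Count. That is a deliberate change from the snippet above, and the threshold `SmallRangeLimit`, the class name and the helper names are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ModeDispatchSketch
{
    // Hypothetical threshold: largest value for which a counting array is allocated.
    public const int SmallRangeLimit = 10_000_000;

    public static List<KeyValuePair<int, int>> Modes(List<int> list)
    {
        if (list == null || list.Count == 0)
            return new List<KeyValuePair<int, int>>();

        int min = list.Min();
        int max = list.Max();

        // Small, non-negative range: index an array directly by value.
        // Anything else: fall back to a dictionary of counts.
        return (min >= 0 && max < SmallRangeLimit)
            ? FromCountingArray(list, max)
            : FromDictionary(list);
    }

    private static List<KeyValuePair<int, int>> FromCountingArray(List<int> list, int maxValue)
    {
        var counts = new int[maxValue + 1];
        foreach (int v in list)
            counts[v]++;

        int best = counts.Max();
        var answers = new List<KeyValuePair<int, int>>();
        for (int v = 0; v < counts.Length; v++)
            if (counts[v] == best)
                answers.Add(new KeyValuePair<int, int>(v, best));
        return answers;
    }

    private static List<KeyValuePair<int, int>> FromDictionary(List<int> list)
    {
        var counts = new Dictionary<int, int>();
        foreach (int v in list)
        {
            counts.TryGetValue(v, out int c); // c stays 0 when the key is absent
            counts[v] = c + 1;
        }

        int best = counts.Values.Max();
        return counts.Where(kv => kv.Value == best).OrderBy(kv => kv.Key).ToList();
    }
}
```

Both paths return the modes in ascending order with their shared count, so callers do not need to know which one ran. The extra Min and Max passes cost two scans of the list; a suitable threshold would have to be found by measurement on the target machine.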