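I wrote some code that counts the frequency of each byte in a binary file, using LINQ. The code seems to slow down while the LINQ expression executes, and parallelism looks hard to apply to this kind of logic. Building the frequency table for a file of more than 475 MB takes about 1 minute.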
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        Stopwatch sw = new Stopwatch();
        sw.Start();
        //File Size 478.668 KB
        byte[] ltext = File.ReadAllBytes(@"D:\Setup.exe");
        sw.Stop();
        Console.WriteLine("Reading File {0}", GetTime(sw));
        // Restart (not Start) so this measurement excludes the file read above
        sw.Restart();
        Dictionary<byte, int> result = (from i in ltext
                                        group i by i into g
                                        orderby g.Count() descending
                                        select new { Key = g.Key, Freq = g.Count() })
                                       .ToDictionary(x => x.Key, x => x.Freq);
        sw.Stop();
        Console.WriteLine("Generating Freq Table {0}", GetTime(sw));
        foreach (var i in result)
        {
            Console.WriteLine(i);
        }
        Console.WriteLine(result.Count);
        Console.ReadLine();
    }

    static string GetTime(Stopwatch sw)
    {
        TimeSpan ts = sw.Elapsed;
        string elapsedTime = String.Format("{0} min {1} sec {2} ms", ts.Minutes, ts.Seconds, ts.Milliseconds);
        return elapsedTime;
    }
}
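I also tried a non-LINQ solution using a few loops, and its performance was about the same. Any suggestions for optimizing this would be appreciated. Sorry for my poor English.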
Answer 0 (score: 2)
This took a little over a second on a 442 MB file on my Dell laptop:
byte[] ltext = File.ReadAllBytes(@"c:\temp\bigfile.bin");
var freq = new long[256];
var sw = Stopwatch.StartNew();
foreach (byte b in ltext) {
    freq[b]++;
}
sw.Stop();
Console.WriteLine(sw.ElapsedMilliseconds);
It is very hard to beat the raw performance of an array.
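The question mentions that parallelism seems hard to apply here. With an array-based table it becomes fairly mechanical: each worker counts into its own private table, and the tables are summed at the end. The following is only a sketch under that assumption; the class name `ParallelByteHistogram` and method `Count` are made up for illustration, and whether it beats the single-threaded loop depends on the machine, since the loop may already be limited by memory bandwidth.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

// Hypothetical helper, not part of the original answer.
static class ParallelByteHistogram
{
    // Counts byte occurrences using one private 256-slot table per worker,
    // merged into a shared table under a lock when each worker finishes.
    public static long[] Count(byte[] data)
    {
        var total = new long[256];
        if (data.Length == 0)
            return total; // Partitioner.Create rejects an empty range

        Parallel.ForEach(
            Partitioner.Create(0, data.Length),   // contiguous index ranges
            () => new long[256],                  // per-worker local table
            (range, state, local) =>
            {
                for (int i = range.Item1; i < range.Item2; i++)
                    local[data[i]]++;
                return local;
            },
            local =>
            {
                // Runs once per worker, so contention on the lock is negligible
                lock (total)
                {
                    for (int b = 0; b < 256; b++)
                        total[b] += local[b];
                }
            });

        return total;
    }
}
```

Calling `ParallelByteHistogram.Count(ltext)` would return the same 256-entry table as the loop above. Private tables avoid both locking and interlocked increments inside the hot loop.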
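Answer 1 (score: 2)
The following displays the byte frequencies in descending order for a 465 MB file on my machine in under 9 seconds when built in release mode.
Note that I made it faster by reading the file in blocks of 100000 bytes (you can experiment with this; 16K blocks made no appreciable difference on my machine). The point is that the inner loop is the one that supplies the bytes. Calling Stream.ReadByte() is fast, but not nearly as fast as indexing a byte in an array.
Also, reading the whole file into memory causes extreme memory pressure, which hinders performance and will fail outright if the file is large enough.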
using System;
using System.Diagnostics;
using System.IO;
using System.Linq;

class Program
{
    static void Main( string[] args )
    {
        Console.WriteLine( "Reading file..." );
        var sw = Stopwatch.StartNew();
        var frequency = new long[ 256 ];
        using ( var input = File.OpenRead( @"c:\Temp\TestFile.dat" ) )
        {
            var buffer = new byte[ 100000 ];
            int bytesRead;
            // Read returns 0 only at end of stream; a short read does not mean EOF
            while ( ( bytesRead = input.Read( buffer, 0, buffer.Length ) ) > 0 )
            {
                for ( var i = 0; i < bytesRead; i++ )
                    frequency[ buffer[ i ] ]++;
            }
        }
        Console.WriteLine( "Read file in " + sw.ElapsedMilliseconds + "ms" );

        // Pair each count with its byte value, most frequent first
        var result = frequency.Select( ( f, i ) => new ByteFrequency { Byte = i, Frequency = f } )
                              .OrderByDescending( x => x.Frequency );
        foreach ( var byteCount in result )
            Console.WriteLine( byteCount.Byte + " " + byteCount.Frequency );
    }

    public class ByteFrequency
    {
        public int Byte { get; set; }
        public long Frequency { get; set; }
    }
}
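To make the point about Stream.ReadByte() versus array indexing concrete, here is a minimal sketch of the two loop shapes side by side. The class and method names (`StreamCounting`, `CountWithReadByte`, `CountWithBuffer`) are invented for this illustration. Both produce the same table; only the work done per byte differs, and the actual gap is something to measure on your own machine.

```csharp
using System;
using System.IO;

// Hypothetical helper for comparison purposes only.
static class StreamCounting
{
    // One stream call per byte: simple, but the call overhead is paid 
    // for every single byte in the file.
    public static long[] CountWithReadByte(Stream input)
    {
        var frequency = new long[256];
        int value;
        while ((value = input.ReadByte()) != -1) // -1 signals end of stream
            frequency[value]++;
        return frequency;
    }

    // One stream call per block: the hot inner loop only indexes an array.
    public static long[] CountWithBuffer(Stream input, int bufferSize = 100000)
    {
        var frequency = new long[256];
        var buffer = new byte[bufferSize];
        int bytesRead;
        while ((bytesRead = input.Read(buffer, 0, buffer.Length)) > 0)
        {
            for (int i = 0; i < bytesRead; i++)
                frequency[buffer[i]]++;
        }
        return frequency;
    }
}
```

Either method can be passed the stream returned by `File.OpenRead`, and wrapping each call in a `Stopwatch` gives a rough comparison.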
Answer 2 (score: 1)
Why not just
int[] freq = new int[256];
foreach (byte b in ltext)
    freq[b]++;
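The array always has 256 entries, whereas the LINQ query in the question yields only the bytes that actually occur, sorted by count. If that shape is needed, one possible conversion is sketched below. The names `FrequencyTable` and `ToDescendingList` are hypothetical, and the sketch assumes `freq` has exactly 256 entries. It returns a list rather than a `Dictionary<byte, int>` because the enumeration order of a dictionary is not guaranteed.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, assuming freq.Length == 256.
static class FrequencyTable
{
    // Returns (byte, count) pairs for bytes that occur at least once,
    // most frequent first.
    public static List<KeyValuePair<byte, int>> ToDescendingList(int[] freq)
    {
        return freq
            .Select((count, value) => new KeyValuePair<byte, int>((byte)value, count))
            .Where(pair => pair.Value > 0)          // drop bytes that never occur
            .OrderByDescending(pair => pair.Value)  // stable sort by count
            .ToList();
    }
}
```

Sorting 256 entries is negligible next to scanning the file, so this step does not affect the overall timing in any meaningful way.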