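I apologize in advance for the length of this question... it's a bit involved. I'm writing a very simple "weighted sum" operation: take n images, multiply each image by a specific multiplier, and add them into an output image (by iterating over each pixel). When the number of images is fixed, I can hard-code the logic into a single loop. However, I want the method to be flexible enough to handle a variable number of images, and I can't come up with an equally "efficient" way to do that, that is, one without an extra inner loop when the number of inputs is unknown. Here's my situation: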
using System;

var rnd = new Random();
//Pixels in input and output images
const int x = 1000000;
//An output composite image
var pixelmap = new int[x];
var tStart = DateTime.Now;
//Known number of inputs
int knownNumberOfInputs = 3;
//Weights to apply to each pixel of the input images
//multipliers[0] applies to all pixels of inputA,
//multipliers[1] applies to all pixels of inputB etc.
var multipliers = new byte[3];
rnd.NextBytes(multipliers);
/* situation 1
* - I know how many input images
* - Arrays are independent */
//3 (knownNumberOfInputs) input images (we'll use random numbers for filler)
var inputA = new byte[x];
rnd.NextBytes(inputA);
var inputB = new byte[x];
rnd.NextBytes(inputB);
var inputC = new byte[x];
rnd.NextBytes(inputC);
//I can iterate through each pixel of each input image, multiply and sum for pixelmap value.
//Without a nested loop
for (var i = 0; i < x; i++)
{
pixelmap[i] = (
(inputA[i]*multipliers[0]) +
(inputB[i]*multipliers[1]) +
(inputC[i]*multipliers[2])
);
}
Console.WriteLine("Operation took " + DateTime.Now.Subtract(tStart).TotalMilliseconds + " ms");
// Operation took 39 ms
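//Reset the output so situation 2 doesn't accumulate on top of situation 1's results
Array.Clear(pixelmap, 0, pixelmap.Length);
tStart = DateTime.Now;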
/* situation 2
* unknown number of inputs
* inputs are contained within jagged array */
/* Caveat - multipliers.Length == inputs.Length */
//var unknownNumberOfInputs = rnd.Next(1, 10);
var unknownNumberOfInputs = 3; //Just happens to be the same number (for performance comparisons)
multipliers = new byte[unknownNumberOfInputs];
rnd.NextBytes(multipliers);
//Jagged array to contain input images
var inputs = new byte[unknownNumberOfInputs][];
//Load unknownNumberOfInputs of input images into jagged array
for (var i = 0; i < unknownNumberOfInputs; i++)
{
inputs[i] = new byte[x];
rnd.NextBytes(inputs[i]);
}
// I can't write the sum as a single expression for an unknown number of inputs,
// so each pixel needs an inner nested loop over the inputs
for (var i = 0; i < x; i++)
{
for (var t=0;t<multipliers.Length;t++)
{
pixelmap[i] += (inputs[t][i] * multipliers[t]);
}
}
Console.WriteLine("Operation took " + DateTime.Now.Subtract(tStart).TotalMilliseconds + " ms");
//Operation took 54 ms
//How can I get rid of the inner nested loop and get the performance of the situation 1 loop?
//Or is that the cost of not knowing?
Big ups
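Answer 0 (score: 1)
I suggest you build an expression that represents the computation, and then compile that expression.
Your expression will be a lambda. Here is an example for three inputs, where multipliers0, multipliers1 and multipliers2 stand for the multiplier values inlined as constants: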
(byte[] inputA, byte[] inputB, byte[] inputC) => {
    for (var i = 0; i < x; i++)
    {
        pixelmap[i] = (
            (inputA[i]*multipliers0) +
            (inputB[i]*multipliers1) +
            (inputC[i]*multipliers2)
        );
    }
}
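With .NET 4 you can express a for loop inside an expression tree (not possible in earlier versions).
It sounds hard, but it's actually quite easy.
Just to clarify: at run time you compile a function specialized for the inputs you currently have, so their count and their multipliers are constants in the compiled code.
You can even play tricks such as unrolling the loop 2 to 4 times. You can also inline the multipliers as constants, as in the example. This should be much faster than the nested loop.
Note that the loop is inside the expression tree, not around it. That means you only pay the overhead of a single delegate call (and the compiled result is reusable).
Here is some sample code to get you started: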
int inputCount = ...;
//one byte[] parameter per input image: input0, input1, ...
var paramExpressions = Enumerable.Range(0, inputCount)
    .Select(k => Expression.Parameter(typeof(byte[]), "input" + k))
    .ToArray();
var summands = Enumerable.Range(0, inputCount)
    .Select(k => Expression.Multiply(/* inputK[i] HERE, converted to int */, Expression.Constant((int)multipliers[k])));
var sum = summands.Aggregate((a, b) => Expression.Add(a, b));
var assignment = /* assign sum to pixelmap[i] here */;
var loop = /* build a loop. ask a new question to find out how to do this, or use google */;
var lambda = Expression.Lambda(loop, paramExpressions);
var compiled = lambda.Compile();
//you are done compiling. now invoke:
compiled.DynamicInvoke(arrayOfInputs); //arrayOfInputs is a byte[][]; each byte[] binds to one inputK parameter
That's it. You'll need to fill in the blanks.
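A filled-in sketch of the approach above, using the real System.Linq.Expressions APIs (Expression.Loop, Expression.Block, Expression.ArrayAccess, Expression.Label, Expression.Break). Unlike the snippet, it assumes the inputs arrive as a single byte[][] parameter and the output as an int[] parameter, instead of one parameter per image and a captured pixelmap; CompileWeightedSum is a hypothetical name. The multipliers are baked in as constants and the loop lives inside the tree, so each call is one delegate invocation.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

// Hypothetical helper: compiles output[i] = inputs[0][i]*w0 + inputs[1][i]*w1 + ...
// with the weights as constants and the pixel loop inside the expression tree.
static Action<byte[][], int[]> CompileWeightedSum(byte[] multipliers)
{
    var inputs = Expression.Parameter(typeof(byte[][]), "inputs");
    var output = Expression.Parameter(typeof(int[]), "output");
    var i = Expression.Variable(typeof(int), "i");

    // One local per input image: input0 = inputs[0], input1 = inputs[1], ...
    var locals = multipliers
        .Select((_, k) => Expression.Variable(typeof(byte[]), "input" + k))
        .ToArray();
    var hoists = locals.Select((v, k) =>
        (Expression)Expression.Assign(v, Expression.ArrayAccess(inputs, Expression.Constant(k))));

    // (int)inputK[i] * wK for every input, added together (0 when there are no inputs)
    var terms = locals.Select((v, k) =>
        (Expression)Expression.Multiply(
            Expression.Convert(Expression.ArrayAccess(v, i), typeof(int)),
            Expression.Constant((int)multipliers[k])));
    Expression sum = multipliers.Length == 0
        ? Expression.Constant(0)
        : terms.Aggregate((a, b) => Expression.Add(a, b));

    // while (i < output.Length) { output[i] = sum; i++; }
    var done = Expression.Label("done");
    var loop = Expression.Loop(
        Expression.IfThenElse(
            Expression.LessThan(i, Expression.ArrayLength(output)),
            Expression.Block(
                Expression.Assign(Expression.ArrayAccess(output, i), sum),
                Expression.PostIncrementAssign(i)),
            Expression.Break(done)),
        done);

    var body = Expression.Block(
        locals.Append(i),
        hoists.Append(Expression.Assign(i, Expression.Constant(0))).Append(loop));
    return Expression.Lambda<Action<byte[][], int[]>>(body, inputs, output).Compile();
}

var weights = new byte[] { 2, 3, 5 };
var images = new[]
{
    new byte[] { 1, 2, 3 },
    new byte[] { 4, 5, 6 },
    new byte[] { 7, 8, 9 },
};
var pixelmap = new int[3];
var weightedSum = CompileWeightedSum(weights); // compile once per set of weights...
weightedSum(images, pixelmap);                 // ...then invoke as often as needed
Console.WriteLine(string.Join(", ", pixelmap)); // prints 49, 59, 69
```

Compiling an expression tree is relatively expensive, so this only pays off when the same delegate is reused for many images with the same weights.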
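Answer 1 (score: 1)
There are three ways to improve the performance of this code. The first is to swap the t and i loops. That way each pass works with the same two large arrays, which in turn lets you apply the second improvement. Here is how it all looks together: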
int t = 0;
for (; t < multipliers.Length - 2; t += 3) {
var input1 = inputs[t];
var input2 = inputs[t+1];
var input3 = inputs[t+2];
var multiplier1 = multipliers[t];
var multiplier2 = multipliers[t+1];
var multiplier3 = multipliers[t+2];
if (t == 0) {
for (var i = 0; i < x; i++)
pixelmap[i] = input1[i] * multiplier1
+ input2[i] * multiplier2
+ input3[i] * multiplier3;
} else {
for (var i = 0; i < x; i++)
pixelmap[i] += input1[i] * multiplier1
+ input2[i] * multiplier2
+ input3[i] * multiplier3;
}
}
if (multipliers.Length < 3)
Array.Clear(pixelmap, 0, pixelmap.Length);
for (; t < multipliers.Length; t++) {
var input = inputs[t];
var multiplier = multipliers[t];
for (var i = 0; i < x; i++)
pixelmap[i] += input[i] * multiplier;
}
I also changed how the results are timed.
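One common way to make timings like these more reliable than DateTime.Now (offered as an assumption, not as the answerer's exact changes) is System.Diagnostics.Stopwatch, which uses a high-resolution timer where one is available, combined with a warm-up call so JIT compilation is not measured, and a best-of-N result. BestOfMs is a hypothetical helper name; the code being timed is the question's nested loop.

```csharp
using System;
using System.Diagnostics;

// Hypothetical helper: one warm-up run (so JIT compilation isn't measured),
// then returns the fastest of `runs` timed runs in milliseconds.
static double BestOfMs(Action action, int runs)
{
    action(); // warm-up
    double best = double.MaxValue;
    for (int r = 0; r < runs; r++)
    {
        var sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        best = Math.Min(best, sw.Elapsed.TotalMilliseconds);
    }
    return best;
}

const int x = 1_000_000;
var rnd = new Random(42);
var multipliers = new byte[3];
rnd.NextBytes(multipliers);
var inputs = new byte[3][];
for (var k = 0; k < inputs.Length; k++)
{
    inputs[k] = new byte[x];
    rnd.NextBytes(inputs[k]);
}
var pixelmap = new int[x];

// Time the question's nested loop; the output is cleared each run so results don't accumulate.
double ms = BestOfMs(() =>
{
    Array.Clear(pixelmap, 0, pixelmap.Length);
    for (var i = 0; i < x; i++)
        for (var t = 0; t < multipliers.Length; t++)
            pixelmap[i] += inputs[t][i] * multipliers[t];
}, runs: 5);
Console.WriteLine($"Best of 5: {ms:F2} ms");
```

As Answer 2 also points out, numbers like these are only meaningful for a release build run outside the debugger.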
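Answer 2 (score: 0)
You should try swapping the inner and outer loops.
Your pixelmap may fit in the CPU cache, in which case writing to it multiple times won't hurt much.
Then you can unroll the inner loop that iterates over the pixels for the best performance. Make sure you time a release build run outside the debugger to get accurate timings.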
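A minimal sketch of those two steps, swapping the loops and then unrolling the pixel loop, assuming the question's jagged byte[][] inputs and int[] output. The unroll factor of 4 and the method name WeightedSumSwapped are our own choices, not the answerer's.

```csharp
using System;

// Loop interchange: the outer loop walks the inputs and the inner loop walks the pixels,
// so each pass streams through one input array and the output array.
// The pixel loop is unrolled by 4, with a scalar tail for the leftover pixels.
static void WeightedSumSwapped(byte[][] inputs, byte[] multipliers, int[] pixelmap)
{
    Array.Clear(pixelmap, 0, pixelmap.Length);
    int n = pixelmap.Length;
    for (int t = 0; t < inputs.Length; t++)
    {
        byte[] input = inputs[t]; // hoist the jagged-array lookup out of the pixel loop
        int m = multipliers[t];
        int i = 0;
        for (; i <= n - 4; i += 4)
        {
            pixelmap[i]     += input[i]     * m;
            pixelmap[i + 1] += input[i + 1] * m;
            pixelmap[i + 2] += input[i + 2] * m;
            pixelmap[i + 3] += input[i + 3] * m;
        }
        for (; i < n; i++) // remaining 0 to 3 pixels
            pixelmap[i] += input[i] * m;
    }
}

var images = new[] { new byte[] { 1, 2, 3, 4, 5 }, new byte[] { 10, 20, 30, 40, 50 } };
var weights = new byte[] { 3, 2 };
var result = new int[5];
WeightedSumSwapped(images, weights, result);
Console.WriteLine(string.Join(", ", result)); // prints 23, 46, 69, 92, 115
```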
If you're still not satisfied, you can compute one image scanline at a time.
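A sketch of the scanline idea, under the assumption (not stated in the question, which only has a flat array of x pixels) that the image is stored row by row with a known width. WeightedSumByScanline is a hypothetical name. Each output row is finished for all inputs before moving on, so only width ints of pixelmap are being updated at any one time.

```csharp
using System;

// Scanline-at-a-time weighted sum over a row-major image of the given width.
// All inputs are accumulated into one row of `pixelmap` before moving to the next row.
static void WeightedSumByScanline(byte[][] inputs, byte[] multipliers, int[] pixelmap, int width)
{
    if (width <= 0) throw new ArgumentOutOfRangeException(nameof(width));
    int n = pixelmap.Length;
    for (int rowStart = 0; rowStart < n; rowStart += width)
    {
        int rowEnd = Math.Min(rowStart + width, n); // the last row may be partial
        Array.Clear(pixelmap, rowStart, rowEnd - rowStart);
        for (int t = 0; t < inputs.Length; t++)
        {
            byte[] input = inputs[t];
            int m = multipliers[t];
            for (int i = rowStart; i < rowEnd; i++)
                pixelmap[i] += input[i] * m;
        }
    }
}

var images = new[] { new byte[] { 1, 2, 3, 4, 5 }, new byte[] { 5, 6, 7, 8, 9 } };
var weights = new byte[] { 1, 10 };
var result = new int[5];
WeightedSumByScanline(images, weights, result, width: 2); // rows of 2 pixels, the last has 1
Console.WriteLine(string.Join(", ", result)); // prints 51, 62, 73, 84, 95
```

In a real image the row width would come from the image itself; when rows are very short, processing a block of several rows at once keeps the same cache benefit with less loop overhead.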
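Answer 3 (score: 0)
Here's another answer: use a T4 template to generate, at compile time, all the possible functions for 1 to 20 inputs. It's not as cool as my previous answer, but it works well too.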
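Rather than the T4 template itself, here is a hedged illustration of the kind of C# such a template could emit, written by hand for 1 to 3 inputs only. The names Sum1 to Sum3, SumGeneric and WeightedSum are hypothetical, and the generic fallback for other input counts is our addition.

```csharp
using System;

// Stand-in for generated code: one unrolled method per input count,
// plus a dispatcher that falls back to a generic loop for other counts.
static void Sum1(byte[][] s, byte[] w, int[] o)
{
    byte[] a = s[0];
    int wa = w[0];
    for (int i = 0; i < o.Length; i++)
        o[i] = a[i] * wa;
}

static void Sum2(byte[][] s, byte[] w, int[] o)
{
    byte[] a = s[0], b = s[1];
    int wa = w[0], wb = w[1];
    for (int i = 0; i < o.Length; i++)
        o[i] = a[i] * wa + b[i] * wb;
}

static void Sum3(byte[][] s, byte[] w, int[] o)
{
    byte[] a = s[0], b = s[1], c = s[2];
    int wa = w[0], wb = w[1], wc = w[2];
    for (int i = 0; i < o.Length; i++)
        o[i] = a[i] * wa + b[i] * wb + c[i] * wc;
}

// Generic fallback for input counts with no generated method.
static void SumGeneric(byte[][] s, byte[] w, int[] o)
{
    Array.Clear(o, 0, o.Length);
    for (int t = 0; t < s.Length; t++)
    {
        byte[] input = s[t];
        int m = w[t];
        for (int i = 0; i < o.Length; i++)
            o[i] += input[i] * m;
    }
}

static void WeightedSum(byte[][] inputs, byte[] w, int[] o)
{
    switch (inputs.Length)
    {
        case 1: Sum1(inputs, w, o); break;
        case 2: Sum2(inputs, w, o); break;
        case 3: Sum3(inputs, w, o); break;
        default: SumGeneric(inputs, w, o); break;
    }
}

var images = new[] { new byte[] { 1, 2 }, new byte[] { 3, 4 }, new byte[] { 5, 6 } };
var weights = new byte[] { 1, 2, 3 };
var pixels = new int[2];
WeightedSum(images, weights, pixels);         // three inputs: dispatches to Sum3
Console.WriteLine(string.Join(", ", pixels)); // prints 22, 28
```

The trade-off compared with compiling an expression tree is that everything is generated ahead of time, so there is no run-time compilation cost, but only the input counts the template covers get an unrolled method.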