我正在尝试计算高斯光束的傅里叶变换。稍后我想对以下示例代码进行一些修改。使用1e-6所需的步长,使用8内核的计算在我的工作站上需要1244秒。消耗最大的部分显然是u孔的产生。有没有人提出改善表现的想法?为什么mathematica不会从我的表达式中创建一个打包列表,当我在其中同时包含实数和复数值时?
uin[gx_, gy_, z_] := Module[{w0 = L1[[1]], z0 = L1[[3]], w, R, \[Zeta], k},
w = w0 Sqrt[1 + (z/z0)^2];
R = z (1 + (z0/z)^2);
\[Zeta] = ArcTan[z/z0];
k = 2*Pi/193*^-9;
Developer`ToPackedArray[
ParallelTable[
w0/w Exp[-(x^2 + y^2)/w^2] Exp[-I k/2/R (x^2 + y^2)/2] Exp[-I k z*0 +
I \[Zeta]*0], {x, gx}, {y, gy}]
]
]
AbsoluteTiming[
dx = 1*^-6;
gx = Range[-8*^-3, 8*^-3, dx];
gy = gx;
d = 15*^-3;
uaperture = uin[gx, gy, d];
ufft = dx*dx* Fourier[uaperture];
uout = RotateRight[
Abs[ufft]*dx^2, {Floor[Length[gx]/2], Floor[Length[gx]/2]}];
]
提前致谢,
约翰
答案 0 :(得分:2)
您可以通过先对其进行矢量化(uin2
),然后对其进行编译(uin3
)来加快速度:
In[1]:= L1 = {0.1, 0.2, 0.3};
In[2]:= uin[gx_, gy_, z_] :=
Module[{w0 = L1[[1]], z0 = L1[[3]], w, R, \[Zeta], k},
w = w0 Sqrt[1 + (z/z0)^2];
R = z (1 + (z0/z)^2);
\[Zeta] = ArcTan[z/z0];
k = 2*Pi/193*^-9;
ParallelTable[
w0/w Exp[-(x^2 + y^2)/
w^2] Exp[-I k/2/R (x^2 + y^2)/2] Exp[-I k z*0 +
I \[Zeta]*0], {x, gx}, {y, gy}]
]
In[3]:= uin2[gx_, gy_, z_] :=
Module[{w0 = L1[[1]], z0 = L1[[3]], w, R, \[Zeta], k, x, y},
w = w0 Sqrt[1 + (z/z0)^2];
R = z (1 + (z0/z)^2);
\[Zeta] = ArcTan[z/z0];
k = 2*Pi/193*^-9;
{x, y} = Transpose[Outer[List, gx, gy], {3, 2, 1}];
w0/w Exp[-(x^2 + y^2)/
w^2] Exp[-I k/2/R (x^2 + y^2)/2] Exp[-I k z*0 + I \[Zeta]*0]
]
In[4]:= uin3 =
Compile[{{gx, _Real, 1}, {gy, _Real, 1}, z},
Module[{w0 = L1[[1]], z0 = L1[[3]], w, R, \[Zeta], k, x, y},
w = w0 Sqrt[1 + (z/z0)^2];
R = z (1 + (z0/z)^2);
\[Zeta] = ArcTan[z/z0];
k = 2*Pi/193*^-9;
{x, y} = Transpose[Outer[List, gx, gy], {3, 2, 1}];
w0/w Exp[-(x^2 + y^2)/
w^2] Exp[-I k/2/R (x^2 + y^2)/2] Exp[-I k z*0 + I \[Zeta]*0]
],
CompilationOptions -> {"InlineExternalDefinitions" -> True}
];
In[5]:= dx = 1*^-5;
gx = Range[-8*^-3, 8*^-3, dx];
gy = gx;
d = 15*^-3;
In[9]:= r1 = uin[gx, gy, d]; // AbsoluteTiming
Out[9]= {67.9448862, Null}
In[10]:= r2 = uin2[gx, gy, d]; // AbsoluteTiming
Out[10]= {28.3326206, Null}
In[11]:= r3 = uin3[gx, gy, d]; // AbsoluteTiming
Out[11]= {0.4190239, Null}
即使没有并行运行,我们的加速也提高了约160倍。
结果是一样的:
In[12]:= r1 == r2
Out[12]= True
由于数字错误,这里存在细微的差别:
In[13]:= r2 == r3
Out[13]= False
In[14]:= Max@Abs[r2 - r3]
Out[14]= 5.63627*10^-14