Question

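Part of my simulation code needs to find the opacity for a given density and temperature. There is no analytic relation for this; the standard approach is to use a 2D array, where opacity(i,j) is the opacity corresponding to density(i) and temperature(j), and to run a bilinear interpolation to find the exact opacity.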

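This is the bottleneck in my group's code: at every time step the interpolation routine is called about 100 million times for different densities and temperatures, and it accounts for roughly 20% of the run time. The current code is shown below. Are there any tricks I can use to improve it? I am using Intel Fortran 16 with the options -O3 -xavx -mcmodel=medium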

function smoothopc(den, temp, ig, opc, rhoT)

    implicit none
    real(kind=8), intent(in) :: den,temp 
    integer,intent(in) :: ig
    real(kind=8), intent(in), dimension(1:50, 1:50, 1:52) :: opc
    real(kind=8), intent(in), dimension(1:50, 1:2) :: rhoT

    real(kind=8) :: rho, te, smoothopc, r1, r2, t1, t2, &
            interpolation, denominator, a, b, c, d, opc11, &
            opc12, opc21, opc22, t, tp, r, rp

    integer :: rid,tid,i

    rho = den * 1d-3            !g/cc
    te = temp / (1.6d-19)       !eV
    tid = -1
    rid = -1
    do i = 1, 49
        r = rhoT(i, 1)
        rp = rhoT(i + 1, 1)
        t = rhoT(i, 2)
        tp = rhoT(i + 1, 2)
        if (rho .ge. r) then
            rid = i
        endif

        if (te .ge. t) then
            tid = i
        endif
    enddo

    r1 = rhoT(rid, 1)
    r2 = rhoT(rid + 1, 1)
    t1 = rhoT(tid, 2)
    t2 = rhoT(tid + 1, 2)
    opc11 = opc(rid, tid, ig + 4)
    opc12 = opc(rid, tid + 1, ig + 4)
    opc21 = opc(rid + 1, tid, ig + 4)
    opc22 = opc(rid + 1, tid + 1, ig + 4)

    denominator = (r2 - r1) * (t2 - t1)
    a = r2 - rho
    b = rho - r1
    c = t2 - te
    d = te - t1

    interpolation = a * (c * opc11 + d * opc12) + b * &
            (c * opc21 + d * opc22)

    smoothopc = interpolation / denominator

    return

end function smoothopc

Answer 1

If I understand correctly, you are walking through the whole of rhoT to find the indices that you then use to look up the value in opc.

If the columns of rhoT are sorted, writing a binary search is probably faster (there is some overhead, so you will have to test it).
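A minimal sketch of such a binary search, assuming each column of rhoT is sorted in ascending order. The module and function names (bracket_search, find_bracket) are hypothetical, and clamping values below the first grid point to index 1 is an assumption; the code in the question would leave rid at -1 in that case.

```fortran
module bracket_search
    implicit none
contains
    ! Return the largest i in 1..n-1 with x(i) <= v, where n = size(x).
    ! Assumes x is sorted ascending. Values below x(1) give 1 and values
    ! at or above x(n) give n-1, so x(i) and x(i+1) are always valid.
    pure function find_bracket(x, v) result(lo)
        real(kind=8), intent(in) :: x(:)
        real(kind=8), intent(in) :: v
        integer :: lo, hi, mid

        lo = 1
        hi = size(x)
        ! Halve the interval [lo, hi] until the two ends are adjacent.
        do while (hi - lo > 1)
            mid = (lo + hi) / 2
            if (v >= x(mid)) then
                lo = mid
            else
                hi = mid
            end if
        end do
    end function find_bracket
end module bracket_search
```

Inside smoothopc this could replace the loop with rid = find_bracket(rhoT(:, 1), rho) and tid = find_bracket(rhoT(:, 2), te). For 50 grid points that is about 6 comparisons per lookup instead of 49, but whether it is faster in practice still has to be measured.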

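Also, I don't really understand your condition for assigning rid and tid (it would seem logical to assign them only when r <= rho < rp). I am probably missing something about how rhoT is built.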

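A trick you could try: convert rho*M to an integer, where M could be, for example, a power of 2 or 10 (multiply by 1000 to get 3 digits of precision). The value rounded to an integer is then an index into an array whose elements hold the correct (or closest) value of rid. Even if you don't get exactly the right rid, the range you need to check will probably be much smaller. If the scale is not linear, you can transform rho first.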
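One way this idea could look, as a sketch only: a small auxiliary table maps equal-width bins of the value range to a starting index, and a short walk corrects the guess. The names index_table, build_table and table_lookup, and the bin count, are hypothetical. The grid is assumed to be sorted ascending; for a logarithmically spaced grid, the same routines could be given log10 of the grid and of the value.

```fortran
module index_table
    implicit none
contains
    ! For nbin equal-width bins spanning x(1)..x(n), store the bracket
    ! index of each bin's left edge. x must be sorted ascending.
    subroutine build_table(x, nbin, table)
        real(kind=8), intent(in) :: x(:)
        integer, intent(in) :: nbin
        integer, intent(out) :: table(nbin)
        real(kind=8) :: width, edge
        integer :: i, k, n

        n = size(x)
        width = (x(n) - x(1)) / nbin
        i = 1
        do k = 1, nbin
            edge = x(1) + (k - 1) * width
            do while (i < n - 1)
                if (x(i + 1) > edge) exit
                i = i + 1
            end do
            table(k) = i
        end do
    end subroutine build_table

    ! Return the largest i in 1..n-1 with x(i) <= v (clamped at both ends).
    pure function table_lookup(x, table, v) result(i)
        real(kind=8), intent(in) :: x(:)
        integer, intent(in) :: table(:)
        real(kind=8), intent(in) :: v
        integer :: i, k, n, nbin
        real(kind=8) :: pos

        n = size(x)
        nbin = size(table)
        ! Scale v to a bin number, clamping before the integer conversion.
        pos = (v - x(1)) * nbin / (x(n) - x(1))
        pos = min(max(pos, 0.0d0), real(nbin - 1, kind=8))
        k = int(pos) + 1
        i = table(k)
        ! The table entry is only a starting guess; walk to the exact bracket.
        do while (i < n - 1)
            if (v < x(i + 1)) exit
            i = i + 1
        end do
        do while (i > 1)
            if (v >= x(i)) exit
            i = i - 1
        end do
    end function table_lookup
end module index_table
```

The table would be built once, outside the time-step loop. With more bins the corrective walk gets shorter, at the cost of a larger table competing for cache, so the bin count is something to tune by measurement.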

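Another possible trick: store the previous rid and tid indices. If the calls follow a fairly continuous evolution, the new indices are likely to be close to the previous ones. But if your code has to be parallelized at some point, this is not a good idea, because it introduces a sequential dependency between calls.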
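A sketch of the cached-index idea, with the previous index passed in and out by the caller rather than kept as hidden state inside the function. The names cached_search and walk_bracket are hypothetical, and the grid is assumed to be sorted ascending.

```fortran
module cached_search
    implicit none
contains
    ! On entry i holds the index found by the previous call (any value is
    ! accepted and clamped). On exit i is the largest index in 1..n-1 with
    ! x(i) <= v, clamped at both ends of the grid.
    pure subroutine walk_bracket(x, v, i)
        real(kind=8), intent(in) :: x(:)
        real(kind=8), intent(in) :: v
        integer, intent(inout) :: i
        integer :: n

        n = size(x)
        i = min(max(i, 1), n - 1)
        ! Step down while v lies below the current bracket.
        do while (i > 1)
            if (v >= x(i)) exit
            i = i - 1
        end do
        ! Step up while v lies at or above the next grid point.
        do while (i < n - 1)
            if (v < x(i + 1)) exit
            i = i + 1
        end do
    end subroutine walk_bracket
end module cached_search
```

Because the cached index is an argument, each thread or loop chunk can keep its own copy, which softens the sequential-dependency concern. The worst case is still a full linear walk when consecutive calls jump across the table.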

High-performance lookup tables in Fortran
