Fast way to compute a conditional function

Time: 2017-05-31 06:09:43

Tags: python performance numpy

What is the fastest way to compute a function like the following?


One possible approach is:

# here x is just a number
def f(x):
    if x >= 0:
        return np.log(x+1)
    else:
        return -np.log(-x+1)

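But it seems that numpy goes through the array element by element. Is there a way to use something conceptually similar to np.exp(x) to get better performance?

4 answers:

Answer 0 (score: 6)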

    def f(x):
        # Note: x/abs(x) is 0/0 (nan) where x == 0.
        return (x/abs(x)) * np.log(1+abs(x))
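
The expression above evaluates 0/0 at x == 0 and so returns nan there. A minimal sketch of a variant that stays defined at zero, assuming zeros can occur in the input. The name `signed_log1p` is hypothetical, and using `np.sign` with `np.log1p` is a substitution rather than part of the original answer:

```python
import numpy as np

def signed_log1p(x):
    """Compute sign(x) * log(1 + |x|); also defined at x == 0."""
    x = np.asarray(x, dtype=float)
    # np.sign(0.0) is 0.0 and np.log1p(0.0) is 0.0, so no division is needed.
    return np.sign(x) * np.log1p(np.abs(x))

x = np.array([-3.0, 0.0, 3.0])
result = signed_log1p(x)
```

`np.log1p` is also slightly more accurate than `np.log(1 + x)` for values of |x| close to zero.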

Answer 1 (score: 3)

In cases like this, masking helps -

def mask_vectorized_app(x):
    out = np.empty_like(x)
    mask = x>=0
    mask_rev = ~mask
    out[mask] = np.log(x[mask]+1)
    out[mask_rev] = -np.log(-x[mask_rev]+1)
    return out
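
One detail to watch with `np.empty_like`: the output inherits the input dtype, so an integer input array would silently truncate the logarithms. A sketch of the same masking idea with an explicit float output, assuming integer inputs may occur (the name `mask_vectorized_float` is hypothetical):

```python
import numpy as np

def mask_vectorized_float(x):
    x = np.asarray(x)
    # Allocate a float output regardless of the input dtype.
    out = np.empty(x.shape, dtype=float)
    mask = x >= 0
    out[mask] = np.log(x[mask] + 1)
    out[~mask] = -np.log(-x[~mask] + 1)
    return out

result = mask_vectorized_float(np.array([-3, 0, 3]))
```

For float inputs this behaves the same as `mask_vectorized_app`.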

Bringing in the numexpr module helps us further.

import numexpr as ne

def mask_vectorized_numexpr_app(x):
    out = np.empty_like(x)
    mask = x>=0
    mask_rev = ~mask

    x_masked = x[mask]
    x_rev_masked = x[mask_rev]
    out[mask] = ne.evaluate('log(x_masked+1)')
    out[mask_rev] = ne.evaluate('-log(-x_rev_masked+1)')
    return out

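Inspired by @user2685079's post, and using the logarithm property log(A**B) = B*log(A), we can push the sign into the log computation. This lets us do more of the work inside numexpr's evaluate expression, like so -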

s = (-2*(x<0))+1 # np.sign(x)
out = ne.evaluate('log( (abs(x)+1)**s)')

Computing the sign with a comparison gives us s in another way -

s = (-2*(x<0))+1

Finally, we can push this into the numexpr evaluate expression -

def mask_vectorized_numexpr_app2(x):
    return ne.evaluate('log( (abs(x)+1)**((-2*(x<0))+1))')
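
The one-liner relies on log((|x|+1)**s) = s*log(|x|+1) with s equal to +1 or -1. A small NumPy-only check of that identity, offered as a sketch (numexpr is left out so that it runs without extra dependencies, and the variable names are illustrative):

```python
import numpy as np

x = np.array([-2.5, -1.0, 0.0, 0.5, 4.0])

# Sign built from a comparison: -1 where x < 0, +1 elsewhere (including x == 0).
s = (-2 * (x < 0)) + 1

pushed_in = np.log((np.abs(x) + 1) ** s)  # sign moved inside the log as an exponent
factored = s * np.log(np.abs(x) + 1)      # sign kept outside as a factor

identity_holds = np.allclose(pushed_in, factored)
```

Note that s is +1 rather than 0 at x == 0, which is harmless here because log(1) is 0 either way.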

Runtime test

Loopy approach for comparison -

def loopy_app(x):
    out = np.empty_like(x)
    for i in range(len(out)):
        out[i] = f(x[i])
    return out

Timings and verification -

In [141]: x = np.random.randn(100000)
     ...: print np.allclose(loopy_app(x), mask_vectorized_app(x))
     ...: print np.allclose(loopy_app(x), mask_vectorized_numexpr_app(x))
     ...: print np.allclose(loopy_app(x), mask_vectorized_numexpr_app2(x))
     ...: 
True
True
True

In [142]: %timeit loopy_app(x)
     ...: %timeit mask_vectorized_numexpr_app(x)
     ...: %timeit mask_vectorized_numexpr_app2(x)
     ...: 
10 loops, best of 3: 108 ms per loop
100 loops, best of 3: 3.6 ms per loop
1000 loops, best of 3: 942 µs per loop

Using @user2685079's solution with np.sign replacing the first part, timed with and without numexpr evaluation -

In [143]: %timeit np.sign(x) * np.log(1+abs(x))
100 loops, best of 3: 3.26 ms per loop

In [144]: %timeit np.sign(x) * ne.evaluate('log(1+abs(x))')
1000 loops, best of 3: 1.66 ms per loop
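
The timings above come from IPython's `%timeit`. Outside IPython, a comparable measurement can be sketched with the standard-library `timeit` module. The names `f_scalar`, `loopy` and `vectorized`, the array size, and the repeat counts below are assumptions for illustration, and absolute numbers will differ from machine to machine:

```python
import timeit
import numpy as np

def f_scalar(v):
    # Scalar definition from the question.
    return np.log(v + 1) if v >= 0 else -np.log(-v + 1)

def loopy(x):
    out = np.empty_like(x)
    for i in range(len(x)):
        out[i] = f_scalar(x[i])
    return out

def vectorized(x):
    return np.sign(x) * np.log(1 + np.abs(x))

x = np.random.randn(10000)
same = np.allclose(loopy(x), vectorized(x))  # verify before timing

# Best of 3 repeats, 3 calls per repeat, reported in seconds per call.
t_loop = min(timeit.repeat(lambda: loopy(x), number=3, repeat=3)) / 3
t_vec = min(timeit.repeat(lambda: vectorized(x), number=3, repeat=3)) / 3
print(f"loopy: {t_loop:.6f} s, vectorized: {t_vec:.6f} s")
```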

Answer 2 (score: 2)

Using numba

  

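Numba gives you the power to speed up your applications with high performance functions written directly in Python. With a few annotations, array-oriented and math-heavy Python code can be just-in-time compiled to native machine instructions, similar in performance to C, C++ and Fortran, without having to switch languages or Python interpreters.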

     

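Numba works by generating optimized machine code using the LLVM compiler infrastructure at import time, runtime, or statically (using the included pycc tool). Numba supports compilation of Python to run on either CPU or GPU hardware, and is designed to integrate with the Python scientific software stack.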

     

The Numba project is supported by Continuum Analytics and The Gordon and Betty Moore Foundation (Grant GBMF5423).

from numba import njit
import numpy as np

@njit
def pir(x):
    a = np.empty_like(x)
    for i in range(a.size):
        x_ = x[i]
        _x = abs(x_)
        a[i] = np.sign(x_) * np.log(1 + _x)
    return a

Accuracy

np.isclose(pir(x), f(x)).all()

True

Timing

x = np.random.randn(100000)

# My proposal
%timeit pir(x)
1000 loops, best of 3: 881 µs per loop

# OP test
%timeit f(x)
1000 loops, best of 3: 1.26 ms per loop

# Divakar-1
%timeit mask_vectorized_numexpr_app(x)
100 loops, best of 3: 2.97 ms per loop

# Divakar-2
%timeit mask_vectorized_numexpr_app2(x)
1000 loops, best of 3: 621 µs per loop

Function definitions

from numba import njit
import numpy as np

@njit
def pir(x):
    a = np.empty_like(x)
    for i in range(a.size):
        x_ = x[i]
        _x = abs(x_)
        a[i] = np.sign(x_) * np.log(1 + _x)
    return a

import numexpr as ne

def mask_vectorized_numexpr_app(x):
    out = np.empty_like(x)
    mask = x>=0
    mask_rev = ~mask

    x_masked = x[mask]
    x_rev_masked = x[mask_rev]
    out[mask] = ne.evaluate('log(x_masked+1)')
    out[mask_rev] = ne.evaluate('-log(-x_rev_masked+1)')
    return out

def mask_vectorized_numexpr_app2(x):
    return ne.evaluate('log( (abs(x)+1)**((-2*(x<0))+1))')


def f(x):
    return (x/abs(x)) * np.log(1+abs(x))

Answer 3 (score: 1)

Using np.where instead of np.select, you can slightly improve the speed of your second solution.
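
A minimal sketch of how such an np.where version might look, assuming the "second solution" is an array-based form of the function from the question; the name `where_app` is hypothetical. Because np.where evaluates both branches over the whole array before selecting, the sketch silences the warnings raised by taking the log of non-positive values in the branch that is then discarded:

```python
import numpy as np

def where_app(x):
    x = np.asarray(x, dtype=float)
    # Both branches are computed for every element; the unselected branch
    # may hit log(0) or log(negative), so those warnings are suppressed.
    with np.errstate(invalid="ignore", divide="ignore"):
        return np.where(x >= 0, np.log(x + 1), -np.log(-x + 1))

x = np.array([-3.0, -1.0, 0.0, 1.0, 3.0])
result = where_app(x)
```

The nan and inf values produced in the unselected branch never reach the output, since np.where only picks the entry from the branch whose condition holds.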
