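I have some Python code that I am trying to port to Julia in order to learn this lovely language. I used generators in Python. After porting it, it seems to me (at this moment) that Julia is really slow in this area!
I simplified part of my code down to this exercise:
Think of a 4x4 chessboard. Find every path of length N (N squares visited) that a chess king can make. In this exercise, the king is not allowed to land on the same square twice within one path. Don't waste memory -> make a generator of every path.
The algorithm is pretty simple:
If we label every square with a number: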
0 1 2 3
4 5 6 7
8 9 10 11
12 13 14 15
Square 0 has 3 neighbors (1, 4, 5). We can write a table listing the neighbors of every square:
NEIG = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]]
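The table does not have to be typed in by hand. A minimal sketch, assuming the row-major 0-based labels of the board above; `king_neighbors` is a hypothetical helper, not part of the original post:

```julia
# Hypothetical helper (not from the original post): build the king-move
# neighbor table for an n×n board. Squares are labelled 0..n*n-1 in row-major
# order, as in the board above, so for n = 4 this reproduces NEIG.
function king_neighbors(n::Int = 4)
    neig = Vector{Vector{Int}}(undef, n * n)
    for sq in 0:n*n-1
        r, c = divrem(sq, n)                 # row and column of this square
        ns = Int[]
        for dr in -1:1, dc in -1:1           # all 3×3 offsets around the square
            dr == 0 && dc == 0 && continue   # skip "staying put"
            rr, cc = r + dr, c + dc
            if 0 <= rr < n && 0 <= cc < n    # keep only moves that stay on the board
                push!(ns, rr * n + cc)
            end
        end
        neig[sq + 1] = ns                    # Julia arrays are 1-based
    end
    return neig
end

king_neighbors(4)[1]   # neighbors of square 0: [1, 4, 5]
```

Because the offsets are visited row by row, each list comes out sorted, in the same order as the hand-written NEIG.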
PYTHON
A recursive function (generator) that enlarges a path given either as a list of points or as a generator of (generators of ...) such lists:
def enlarge(path):
    if isinstance(path, list):
        for i in NEIG[path[-1]]:
            if i not in path:
                yield path[:] + [i]
    else:
        for i in path:
            yield from enlarge(i)
A function (generator) that yields every path of a given length:
def paths(length):
    steps = ([i] for i in range(16))  # first steps on every point on board
    for _ in range(length-1):
        nsteps = enlarge(steps)
        steps = nsteps
    yield from steps
We can see that there are 905776 paths of length 10:
sum(1 for i in paths(10))
Out[89]: 905776
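For short paths the counts can be worked out by hand, which gives a cheap sanity check for any port. A 2-square path is an ordered pair of neighboring squares, so there are Σ d(b) of them, where d(b) is the number of neighbors of square b. A 3-square path a-b-c picks its middle square b and then a and c as two different neighbors of b, giving Σ d(b)(d(b) - 1). With four corner squares (d = 3), eight edge squares (d = 5) and four central squares (d = 8) this gives 84 and 408. These small values are my own arithmetic, not figures from the original post; the snippet just recomputes them from the table:

```julia
# 0-based neighbor table from the question; only the list lengths matter here.
const NEIG = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9],
              [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11],
              [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15],
              [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]]

degs = length.(NEIG)                      # d(b) for every square b
paths1 = length(NEIG)                     # one-square paths
paths2 = sum(degs)                        # ordered neighbor pairs a-b
paths3 = sum(d * (d - 1) for d in degs)   # a-b-c with a != c
println((paths1, paths2, paths3))         # prints (16, 84, 408)
```

Any of the implementations in this thread should reproduce these numbers for lengths 1, 2 and 3.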
JULIA (this code was written by @gggg during our discussion)
const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py]
function enlarge(path::Vector{Int})
    (push!(copy(path),loc) for loc in NEIG[path[end]] if !(loc in path))
end
collect(enlarge([1]))
function enlargepaths(paths)
    Iterators.Flatten(enlarge(path) for path in paths)
end
collect(enlargepaths([[1],[2]]))
function paths(targetlen)
    paths = ([i] for i=1:16)
    for newlen in 2:targetlen
        paths = enlargepaths(paths)
    end
    paths
end
p = sum(1 for path in paths(10))
Benchmark
In IPython we can time it:
python 3.6.3:
%timeit sum(1 for i in paths(10))
1.25 s ± 15.1 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
julia 0.6.0
julia> @time sum(1 for path in paths(10))
2.690630 seconds (41.91 M allocations: 1.635 GiB, 11.39% gc time)
905776
Julia 0.7.0-DEV.0
julia> @time sum(1 for path in paths(10))
4.951745 seconds (35.69 M allocations: 1.504 GiB, 4.31% gc time)
905776
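Question(s):
We Julians are told here: "It is important to note that the benchmark codes are not written for absolute maximal performance (the fastest code to compute recursion_fibonacci(20) is the constant literal 6765). Instead, the benchmarks are written to test the performance of identical algorithms and code patterns implemented in each language."
In this benchmark we use the same idea: just simple for loops over arrays wrapped in generators (no numpy, numba, pandas, or other C-written and compiled Python packages).
Is it right to assume that Julia's generators are terribly slow?
What can we do to make this really fast?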
Answer 0 (score: 6)
const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py];
function expandto(n, path, targetlen)
    length(path) >= targetlen && return n+1
    for loc in NEIG[path[end]]
        loc in path && continue
        n = expandto(n, (path..., loc), targetlen)
    end
    n
end
function npaths(targetlen)
    n = 0
    for i = 1:16
        path = (i,)
        n = expandto(n, path, targetlen)
    end
    n
end
Benchmark (timed after a first call has already done the JIT compilation):
julia> @time npaths(10)
0.069531 seconds (5 allocations: 176 bytes)
905776
This is much faster.
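Part of the reason this version is fast is that each path is a tuple rather than a Vector. A tuple of Ints is an immutable isbits value whose length is part of its type, so `(path..., loc)` builds a new value, typically without a heap allocation, and the compiler can specialize `expandto` separately for each path length. A small illustration of those properties (my own, not part of the answer):

```julia
path = (1, 2, 6)             # a partial king path stored as a tuple
longer = (path..., 7)        # "append" by building a new, longer tuple
# `path` itself is unchanged: tuples are immutable.
# The length is part of the type (NTuple{3,Int} vs NTuple{4,Int}), so each
# recursion depth can get its own specialized method instance.
typeof(path), typeof(longer)
isbitstype(typeof(longer))   # true: plain immutable data, no heap pointers
7 in longer                  # membership test is a short linear scan
```

The flip side is that the compiler generates one specialization per path length, which is cheap here because the length never exceeds 16.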
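Answer 1 (score: 5)
So that Julia's generators can know ahead of time what type of values they may produce, they encode information about the operation they perform and the object they iterate over in their type: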
julia> (1 for i in 1:16)
Base.Generator{UnitRange{Int64},getfield(Main, Symbol("##27#28"))}(getfield(Main, Symbol("##27#28"))(), 1:16)
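The odd ##27#28 thing is the type of an anonymous function that simply returns 1. By the time the generator gets to LLVM, it knows enough to perform quite a lot of optimizations: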
julia> function naive_sum(c)
           s = 0
           for elt in c
               s += elt
           end
           s
       end
julia> @code_llvm naive_sum(1 for i in 1:16)
; Function naive_sum
; Location: REPL[1]:2
define i64 @julia_naive_sum_62385({ { i64, i64 } } addrspace(11)* nocapture nonnull readonly dereferenceable(16)) {
top:
; Location: REPL[1]:3
%1 = getelementptr inbounds { { i64, i64 } }, { { i64, i64 } } addrspace(11)* %0, i64 0, i32 0, i32 0
%2 = load i64, i64 addrspace(11)* %1, align 8
%3 = getelementptr inbounds { { i64, i64 } }, { { i64, i64 } } addrspace(11)* %0, i64 0, i32 0, i32 1
%4 = load i64, i64 addrspace(11)* %3, align 8
%5 = add i64 %4, 1
%6 = sub i64 %5, %2
; Location: REPL[1]:6
ret i64 %6
}
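It may take a minute to parse the LLVM IR there, but you should be able to see that it just extracts the endpoints of the UnitRange (getelementptr and load), subtracts them from each other (sub) and adds one to compute the sum without a single loop.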
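The information the compiler relies on can also be inspected directly. A short sketch for Julia 1.x; `f` and `iter` are the internal field names of `Base.Generator` in current Julia, and printed type names differ from the 0.7-dev output above:

```julia
g = (1 for i in 1:16)

# The generator stores the mapped function and the wrapped iterator as fields,
# and both of their types are parameters of the generator's own type.
typeof(g.iter)                 # the UnitRange being iterated
g.f(42)                        # the anonymous function ignores its argument: 1
Base.IteratorSize(typeof(g))   # HasShape{1}(): the length is known from the range
sum(g)                         # 16
```

Since all of this is visible from the type alone, the compiler can reason about `naive_sum` at compile time, which is what makes the loop-free IR above possible.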
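Here, though, this works against Julia: paths(10) has an incredibly complicated type! You are iteratively wrapping that generator in filters, flattens and yet more generators. In fact, it becomes so complicated that Julia just gives up trying to figure it out and decides to live with the dynamic behavior. At that point it no longer has an inherent advantage over Python; in fact, specializing on so many different types as it recursively traverses the object would be a distinct handicap. You can see this in action by looking at @code_warntype start(1 for i in paths(10)).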
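To see why the loop in `paths` defeats inference, note that every wrapping step produces a value of a new type, so the local variable `paths` has a different type on every iteration of the `for newlen` loop. A minimal sketch with a hypothetical `wrap` function standing in for `enlargepaths`:

```julia
# Hypothetical stand-in for enlargepaths: wrap an iterator in one more
# generator + flatten layer (each element x becomes the 1-tuple (x,)).
wrap(it) = Iterators.flatten((x,) for x in it)

function nested_types(k)
    it = (i for i in 1:3)
    types = Any[typeof(it)]
    for _ in 1:k
        it = wrap(it)              # `it` changes type on every iteration
        push!(types, typeof(it))
    end
    return types, sum(1 for _ in it)
end

types, n = nested_types(3)
length(unique(types))   # 4 distinct types for 4 nesting levels
n                       # the elements are unchanged: still 3 of them
```

The values are trivial, but the types grow with every level, which is exactly what forces the compiler into the dynamic path described above.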
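My rule of thumb for Julia performance is that type-stable, devectorized code that avoids allocations is usually within a factor of 2 of C, while dynamic, type-unstable, or vectorized code is within an order of magnitude of Python/MATLAB/other higher-level languages. Often it is a bit slower, because those other higher-level languages have worked very hard to optimize their case, whereas most of Julia's optimizations are focused on the type-stable side of things. This deeply nested construct puts you squarely in the dynamic camp.
So are Julia's generators terribly slow? Not inherently; it's just that when they become this deeply nested, you hit this bad case.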
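Answer 2 (score: 2)
This does not follow the same algorithm (and I don't know how fast Python would be doing it this way), but with the following code Julia is basically on par with Python for solutions of length=10, and much faster for length=16: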
In [48]: %timeit sum(1 for path in paths(10))
1.52 s ± 11.4 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
julia> @time sum(1 for path in pathsr(10))
1.566964 seconds (5.54 M allocations: 693.729 MiB, 16.24% gc time)
905776
In [49]: %timeit sum(1 for path in paths(16))
19.3 s ± 15.7 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
julia> @time sum(1 for path in pathsr(16))
6.491803 seconds (57.36 M allocations: 9.734 GiB, 33.79% gc time)
343184
Here is the code. I only learned about tasks/channels yesterday, so this can probably be done better:
const NEIG = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14],
              [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
function enlarger(num::Int,len::Int,pos::Int,sol::Array{Int64,1},c::Channel)
    if pos == len
        put!(c,copy(sol))
    elseif pos == 0
        for j=0:num
            sol[1]=j
            enlarger(num,len,pos+1,sol,c)
        end
        close(c)
    else
        for i in NEIG[sol[pos]+1]
            if !in(i,sol[1:pos])
                sol[pos+1]=i
                enlarger(num,len,pos+1,sol,c)
            end
        end
    end
end
function pathsr(len)
    c=Channel(0)
    sol = [0 for i=1:len]
    @schedule enlarger(15,len,0,sol,c)
    (i for i in c)
end
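`@schedule` was deprecated in favor of `@async` in Julia 0.7 and is gone in 1.x. A sketch of the same producer idea for 1.x, using the `Channel{T}(f, size)` constructor with a do-block, which runs `f` in a task and closes the channel when `f` returns. The names `produce_paths!` and `paths_channel` are hypothetical, and the neighbor table here is 1-based:

```julia
# 1-based neighbor table (NEIG from the question, shifted by one).
const NEIG1 = [[2, 5, 6], [1, 3, 5, 6, 7], [2, 4, 6, 7, 8], [3, 7, 8],
               [1, 2, 6, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 4, 6, 8, 10, 11, 12],
               [3, 4, 7, 11, 12], [5, 6, 10, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15],
               [6, 7, 8, 10, 12, 14, 15, 16], [7, 8, 11, 15, 16], [9, 10, 14],
               [9, 10, 11, 13, 15], [10, 11, 12, 14, 16], [11, 12, 15]]

# Depth-first extension of one path, reusing a single buffer; every complete
# path is put into the channel as a fresh copy.
function produce_paths!(ch::Channel, path::Vector{Int}, targetlen::Int)
    if length(path) == targetlen
        put!(ch, copy(path))
        return nothing
    end
    for nb in NEIG1[path[end]]
        nb in path && continue
        push!(path, nb)
        produce_paths!(ch, path, targetlen)
        pop!(path)                  # backtrack
    end
    return nothing
end

# The channel is unbuffered (size 0) and is closed automatically
# once the do-block returns.
paths_channel(targetlen) = Channel{Vector{Int}}(0) do ch
    for start in eachindex(NEIG1)
        produce_paths!(ch, [start], targetlen)
    end
end

sum(1 for _ in paths_channel(10))   # 905776, as in the question
```

Every path costs a task switch between producer and consumer, so this is a convenience rather than a speed-up; the plain recursive versions in the other answers avoid that overhead.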
Answer 3 (score: 2)
Following tholy's answer, since tuples seem to be very fast. This is like my previous code, but using tuples, and it gets much better results:
julia> @time sum(1 for i in pathst(10))
1.155639 seconds (1.83 M allocations: 97.632 MiB, 0.75% gc time)
905776
julia> @time sum(1 for i in pathst(16))
1.963470 seconds (1.39 M allocations: 147.555 MiB, 0.35% gc time)
343184
Code:
const NEIG = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
function enlarget(path,len,c::Channel)
    if length(path) >= len
        put!(c,path)
    else
        for loc in NEIG[path[end]+1]
            loc in path && continue
            enlarget((path..., loc), len,c)
        end
        if length(path) == 1
            path[1] == 15 ? close(c) : enlarget((path[1]+1,),len,c)
        end
    end
end
function pathst(len)
    c=Channel(0)
    path=(0,)
    @schedule enlarget(path,len,c)
    (i for i in c)
end
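Answer 4 (score: 2)
Since everyone is writing answers... here is another version, this time using an iterator, which in current Julia (0.6.1) is more idiomatic than generators. Iterators offer many of the benefits of generators. The iterator is defined as follows: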
import Base.Iterators: start, next, done, eltype, iteratoreltype, iteratorsize
struct SAWsIterator
    neigh::Vector{Vector{Int}}
    pathlen::Int
    pos::Int
end
SAWs(neigh, pathlen, pos) = SAWsIterator(neigh, pathlen, pos)
start(itr::SAWsIterator) =
    ([itr.pos ; zeros(Int, itr.pathlen-1)], Vector{Int}(itr.pathlen-1),
     2, Ref{Bool}(false), Ref{Bool}(false))
@inline next(itr::SAWsIterator, s) =
    ( s[4][] ? s[4][] = false : calc_next!(itr, s) ;
      (s[1], (s[1], s[2], itr.pathlen, s[4], s[5])) )
@inline done(itr::SAWsIterator, s) = ( s[4][] || calc_next!(itr, s) ; s[5][] )
function calc_next!(itr::SAWsIterator, s)
    s[4][] = true ; s[5][] = false
    curindex = s[3]
    pathlength = itr.pathlen
    path, options = s[1], s[2]
    neigh = itr.neigh   # use the iterator's own table, not the global `neigh`
    @inbounds while curindex<=pathlength
        curindex == 1 && ( s[5][] = true ; break )
        startindex = path[curindex] == 0 ? 1 : options[curindex-1]+1
        path[curindex] = 0
        i = findnext(x->!(x in path), neigh[path[curindex-1]], startindex)
        if i==0
            path[curindex] = 0 ; options[curindex-1] = 0 ; curindex -= 1
        else
            path[curindex] = neigh[path[curindex-1]][i]
            options[curindex-1] = i ; curindex += 1
        end
    end
    return nothing
end
eltype(::Type{SAWsIterator}) = Vector{Int}
iteratoreltype(::Type{SAWsIterator}) = Base.HasEltype()
iteratorsize(::Type{SAWsIterator}) = Base.SizeUnknown()
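Cutting and pasting the definitions above works. The term SAW is used as an acronym for Self-Avoiding Walk, which is sometimes used in mathematics for such paths.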
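The `start`/`next`/`done` protocol used above was replaced in Julia 0.7/1.0 by a single `Base.iterate(itr[, state])` method that returns `(element, state)` or `nothing`, and `iteratorsize` became `Base.IteratorSize`. A sketch of a comparable iterator for 1.x, written from scratch rather than as a line-by-line port; the type `KingSAWs` and the helper `advance!` are hypothetical names, and `pathlen >= 1` is assumed:

```julia
# 1-based neighbor table, as `neigh` in the answer above.
const NEIG1 = [[2, 5, 6], [1, 3, 5, 6, 7], [2, 4, 6, 7, 8], [3, 7, 8],
               [1, 2, 6, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 4, 6, 8, 10, 11, 12],
               [3, 4, 7, 11, 12], [5, 6, 10, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15],
               [6, 7, 8, 10, 12, 14, 15, 16], [7, 8, 11, 15, 16], [9, 10, 14],
               [9, 10, 11, 13, 15], [10, 11, 12, 14, 16], [11, 12, 15]]

struct KingSAWs
    neigh::Vector{Vector{Int}}
    pathlen::Int              # number of squares per path, assumed >= 1
    starts::Vector{Int}       # candidate first squares
end
KingSAWs(neigh, pathlen) = KingSAWs(neigh, pathlen, collect(eachindex(neigh)))

Base.IteratorSize(::Type{KingSAWs}) = Base.SizeUnknown()
Base.eltype(::Type{KingSAWs}) = Vector{Int}

# Move (path, next_idx) forward to the next complete self-avoiding path.
# next_idx[k] is the index of the next candidate to try for position k.
# Returns false once every starting square has been exhausted.
function advance!(itr::KingSAWs, path::Vector{Int}, next_idx::Vector{Int})
    length(path) == itr.pathlen && pop!(path)   # resume after the last yield
    while true
        k = length(path) + 1
        cands = k == 1 ? itr.starts : itr.neigh[path[k-1]]
        found = false
        while next_idx[k] <= length(cands)
            c = cands[next_idx[k]]
            next_idx[k] += 1
            if !(c in path)
                push!(path, c)
                found = true
                break
            end
        end
        if !found
            k == 1 && return false              # no more starting squares
            pop!(path)                          # backtrack one square
        elseif length(path) == itr.pathlen
            return true                         # complete path found
        else
            next_idx[k+1] = 1                   # new position: start at its first candidate
        end
    end
end

function Base.iterate(itr::KingSAWs, state = (Int[], ones(Int, itr.pathlen)))
    path, next_idx = state
    advance!(itr, path, next_idx) || return nothing
    return copy(path), state                    # copy, because `path` is reused
end

sum(1 for _ in KingSAWs(NEIG1, 10))             # 905776
```

The state is mutated in place and returned unchanged, which keeps the per-step cost low; only the yielded copy of each path is allocated.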
Now, to use/test the SAWsIterator defined above, the following code can be run:
allSAWs(neigh, pathlen) =
    Base.Flatten(SAWs(neigh,pathlen,k) for k in eachindex(neigh))
iterlength(itr) = mapfoldl(x->1, +, 0, itr)
using Base.Test
const neigh = [[2, 5, 6], [1, 3, 5, 6, 7], [2, 4, 6, 7, 8], [3, 7, 8],
               [1, 2, 6, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 4, 6, 8, 10, 11, 12],
               [3, 4, 7, 11, 12], [5, 6, 10, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15],
               [6, 7, 8, 10, 12, 14, 15, 16], [7, 8, 11, 15, 16], [9, 10, 14],
               [9, 10, 11, 13, 15], [10, 11, 12, 14, 16], [11, 12, 15]]
@test iterlength(allSAWs(neigh, 10)) == 905776
for (i,path) in enumerate(allSAWs(neigh, 10))
    if i % 100_000 == 0
        @show i,path
    end
end
@time iterlength(allSAWs(neigh, 10))
It is relatively readable, and the output is as follows:
(i, path) = (100000, [2, 5, 10, 14, 9, 6, 7, 12, 15, 11])
(i, path) = (200000, [4, 3, 8, 7, 6, 10, 14, 11, 16, 15])
(i, path) = (300000, [5, 10, 11, 16, 15, 14, 9, 6, 7, 3])
(i, path) = (400000, [8, 3, 6, 5, 2, 7, 11, 14, 15, 10])
(i, path) = (500000, [9, 14, 10, 5, 2, 3, 8, 11, 6, 7])
(i, path) = (600000, [11, 16, 15, 14, 10, 6, 3, 8, 7, 12])
(i, path) = (700000, [13, 10, 15, 16, 11, 6, 2, 1, 5, 9])
(i, path) = (800000, [15, 11, 12, 7, 2, 3, 6, 1, 5, 9])
(i, path) = (900000, [16, 15, 14, 9, 5, 10, 7, 8, 12, 11])
0.130755 seconds (4.16 M allocations: 104.947 MiB, 11.37% gc time)
905776
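0.13s isn't too bad, considering this isn't as optimized as @tholy's answer or some of the others. Some of the tricks used in other answers are deliberately not used here, specifically: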
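An optimization not seen in the answers that may be important is using an efficient Bool array or Dict to speed up checking whether a vertex is already used in the path. In this answer, findnext triggers an allocation, which could be avoided; this answer would then be closer to the minimal number of memory allocations.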
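A minimal sketch of that suggestion: a `BitVector` of visited flags is set before recursing and cleared on the way back, so the membership test is O(1) and, apart from the flag vector itself, nothing is allocated while searching. Like the first answer, it only counts paths; `count_paths` and `extend!` are hypothetical names:

```julia
# 1-based neighbor table.
const NEIG1 = [[2, 5, 6], [1, 3, 5, 6, 7], [2, 4, 6, 7, 8], [3, 7, 8],
               [1, 2, 6, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 4, 6, 8, 10, 11, 12],
               [3, 4, 7, 11, 12], [5, 6, 10, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15],
               [6, 7, 8, 10, 12, 14, 15, 16], [7, 8, 11, 15, 16], [9, 10, 14],
               [9, 10, 11, 13, 15], [10, 11, 12, 14, 16], [11, 12, 15]]

# Count the ways to finish a path that currently ends at `sq` and has `len`
# squares; visited[s] is true exactly for the squares already on the path.
function extend!(visited::BitVector, neig, sq::Int, len::Int, targetlen::Int)
    len == targetlen && return 1
    n = 0
    for nb in neig[sq]
        visited[nb] && continue       # O(1) check instead of `nb in path`
        visited[nb] = true
        n += extend!(visited, neig, nb, len + 1, targetlen)
        visited[nb] = false           # undo before trying the next neighbor
    end
    return n
end

function count_paths(neig, targetlen)
    visited = falses(length(neig))    # one flag per square, allocated once
    total = 0
    for sq in eachindex(neig)
        visited[sq] = true
        total += extend!(visited, neig, sq, 1, targetlen)
        visited[sq] = false
    end
    return total
end

count_paths(NEIG1, 10)   # 905776
```

For boards with at most 64 squares, the same idea also works with a single integer bitmask passed down the recursion instead of a shared vector.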
Answer 5 (score: 1)
Here is my quick-and-dirty cheating experiment (which I promised to add in a comment), in which I tried to speed up Angel's code:
const NEIG_py = [[1, 4, 5], [0, 2, 4, 5, 6], [1, 3, 5, 6, 7], [2, 6, 7], [0, 1, 5, 8, 9], [0, 1, 2, 4, 6, 8, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 6, 10, 11], [4, 5, 9, 12, 13], [4, 5, 6, 8, 10, 12, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15], [6, 7, 10, 14, 15], [8, 9, 13], [8, 9, 10, 12, 14], [9, 10, 11, 13, 15], [10, 11, 14]];
const NEIG = [n.+1 for n in NEIG_py]
function enlargetc(path,len,c::Function)
    if length(path) >= len
        c(path)
    else
        for loc in NEIG[path[end]]
            loc in path && continue
            enlargetc((path..., loc), len,c)
        end
        if length(path) == 1
            if path[1] == 16 return
            else enlargetc((path[1]+1,),len,c)
            end
        end
    end
end
function get_counter()
    let helper = 0
        function f(a)
            helper += 1
            return helper
        end
        return f
    end
end
counter = get_counter()
@time enlargetc((1,), 10, counter) # 0.481986 seconds (2.62 M allocations: 154.576 MiB, 5.12% gc time)
counter.helper.contents # 905776
Edit: the time in the code comment above included compilation! Timed again after compiling, it is 0.201669 seconds (2.53 M allocations: 150.036 MiB, 10.77% gc time).
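The counter above works, but because the captured variable `helper` is reassigned inside the closure, Julia stores it in a `Core.Box`. That is why it has to be read back through `counter.helper.contents`, and it also makes every update type-unstable. A sketch of the same callback idea for Julia 1.x that uses a `Ref{Int}` instead, whose binding is never reassigned. `foreach_path` and `count_with_callback` are hypothetical names, and the neighbor table is 1-based:

```julia
# 1-based neighbor table.
const NEIG1 = [[2, 5, 6], [1, 3, 5, 6, 7], [2, 4, 6, 7, 8], [3, 7, 8],
               [1, 2, 6, 9, 10], [1, 2, 3, 5, 7, 9, 10, 11], [2, 3, 4, 6, 8, 10, 11, 12],
               [3, 4, 7, 11, 12], [5, 6, 10, 13, 14], [5, 6, 7, 9, 11, 13, 14, 15],
               [6, 7, 8, 10, 12, 14, 15, 16], [7, 8, 11, 15, 16], [9, 10, 14],
               [9, 10, 11, 13, 15], [10, 11, 12, 14, 16], [11, 12, 15]]

# Call f on every self-avoiding path of `targetlen` squares that extends `path`
# (a tuple, as in the first answer).
function foreach_path(f, path::Tuple, targetlen::Int)
    if length(path) >= targetlen
        f(path)
        return nothing
    end
    for loc in NEIG1[path[end]]
        loc in path && continue
        foreach_path(f, (path..., loc), targetlen)
    end
    return nothing
end

function count_with_callback(targetlen)
    counter = Ref(0)               # concretely typed mutable cell
    for start in eachindex(NEIG1)
        foreach_path((start,), targetlen) do p
            counter[] += 1         # mutates the Ref; `counter` itself is not reassigned
        end
    end
    return counter[]
end

count_with_callback(10)   # 905776
```

Wrapping the counting in a function also avoids the untyped global `counter` of the original snippet.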