How do you convert this formula?
Formula = sum(U' * (W . (U * S)))
Legend
' transpose of a matrix
* matrix-matrix multiplication
. scalar multiplication
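From Python:
import time
import numpy as np
u = np.random.rand(1000, 10000)
s = np.random.rand(10000, 1000)
w = np.random.rand(1000, 1000)
start = time.time()
# w * ... is element-wise; .dot is the matrix-matrix product
res = np.sum(u.T.dot(w * u.dot(s)))
print(time.time() - start)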
to DML in SystemML, with the following data:
u = np.random.rand(10000,100000)
s = np.random.rand(100000,10000)
w = np.random.rand(10000,10000)
Syntax
t(M) transpose of a matrix, where M is the matrix
%*% matrix-matrix multiplication
* scalar multiplication
Task
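Answer 0 (score: 0)
Well, you can write down your original expression almost directly, as shown below. Note, however, that without a consumer the expression would be removed by dead code elimination, hence the print: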
U = rand(rows=1000, cols=10000);
S = rand(rows=10000, cols=1000);
W = rand(rows=1000, cols=1000);
print(sum(t(U) %*% (W * (U %*% S))))
This compiles to the following runtime plan:
PROGRAM
--MAIN PROGRAM
----GENERIC (lines 1-9) [recompile=false]
------(8) dg(rand) [1000,10000,1000,1000,10000000] [0,0,76 -> 76MB], CP
------(33) ua(+R) (8) [1000,1,1000,1000,-1] [76,0,0 -> 76MB], CP
------(40) r(t) (33) [1,1000,1000,1000,-1] [0,0,0 -> 0MB], CP
------(26) dg(rand) [1000,1000,1000,1000,1000000] [0,0,8 -> 8MB], CP
------(17) dg(rand) [10000,1000,1000,1000,10000000] [0,0,76 -> 76MB], CP
------(28) ba(+*) (8,17) [1000,1000,1000,1000,-1] [153,5,8 -> 165MB], CP
------(29) b(*) (26,28) [1000,1000,1000,1000,-1] [15,0,8 -> 23MB], CP
------(35) ua(+R) (29) [1000,1,1000,1000,-1] [8,0,0 -> 8MB], CP
------(38) ba(+*) (40,35) [1,1,1000,1000,-1] [0,0,0 -> 0MB], CP
------(39) u(cast_as_scalar) (38) [0,0,0,0,-1] [0,0,0 -> 0MB]
------(32) u(print) (39) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
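This corresponds to the following script-level expression (the final sum is pushed down through the last matrix multiplication and the transpose):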
print(as.scalar(t(rowSums(U)) %*% rowSums(W * (U %*% S))))
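The rewrite rests on the identity sum(t(U) %*% X) = t(rowSums(U)) %*% rowSums(X): every cell of t(U) %*% X is a sum over the shared dimension k, so summing all cells factors into (sum of row k of U) times (sum of row k of X), added up over k. Below is a small NumPy check of that identity. The matrix sizes and the function names are chosen for illustration only and are not part of SystemML.

```python
import numpy as np

def naive_sum(U, S, W):
    """sum(t(U) %*% (W * (U %*% S))), evaluated exactly as written."""
    return np.sum(U.T @ (W * (U @ S)))

def rewritten_sum(U, S, W):
    """t(rowSums(U)) %*% rowSums(W * (U %*% S)): no large t(U) product needed."""
    X = W * (U @ S)                      # element-wise weighting of the product
    return float(U.sum(axis=1) @ X.sum(axis=1))

# Illustrative sizes, much smaller than in the question.
rng = np.random.default_rng(0)
U = rng.random((20, 50))
S = rng.random((50, 20))
W = rng.random((20, 20))

print(np.isclose(naive_sum(U, S, W), rewritten_sum(U, S, W)))
```

The rewritten form replaces a matrix product with a 10000 x 1000 result (for the sizes in the question) by two row sums and a single vector dot product.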
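In addition, different data characteristics (dimensions and sparsity) can lead to very different execution plans. For example, if we have a sparse W and an outer-product-like matrix multiplication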
U = rand(rows=10000, cols=100);
S = rand(rows=100, cols=10000);
W = rand(rows=10000, cols=10000, sparsity=0.001);
we get the following execution plan:
PROGRAM
--MAIN PROGRAM
----GENERIC (lines 5-9) [recompile=false]
------(8) dg(rand) [10000,100,1000,1000,1000000] [0,0,8 -> 8MB], CP
------(33) ua(+R) (8) [10000,1,1000,1000,-1] [8,0,0 -> 8MB], CP
------(43) r(t) (33) [1,10000,1000,1000,-1] [0,0,0 -> 0MB], CP
------(26) dg(rand) [10000,10000,1000,1000,100000] [0,0,2 -> 2MB], CP
------(17) dg(rand) [100,10000,1000,1000,1000000] [0,0,8 -> 8MB], CP
------(37) r(t) (17) [10000,100,1000,1000,1000000] [8,0,8 -> 15MB], CP
------(39) q(wdivmm) (26,8,37) [10000,10000,1000,1000,100000] [17,0,2 -> 19MB], CP
------(35) ua(+R) (39) [10000,1,1000,1000,-1] [2,0,0 -> 2MB], CP
------(41) ba(+*) (43,35) [1,1,1000,1000,-1] [0,0,0 -> 0MB], CP
------(42) u(cast_as_scalar) (41) [0,0,0,0,-1] [0,0,0 -> 0MB]
------(32) u(print) (42) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
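where wdivmm (a built-in, sparsity-exploiting, fused operator) replaces W * (U %*% S). This fused operator computes the necessary dot products Ui %*% Vj only for the non-zero values Wij. With code generation for operator fusion (which must be enabled explicitly via sysml.codegen.enabled=true), the subsequent rowSums would be fused into this chain as well.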
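To make the sparsity-exploiting idea concrete, here is a minimal NumPy sketch that computes rowSums(W * (U %*% S)) while evaluating one dot product per non-zero cell of W. It illustrates the principle only and is not SystemML's wdivmm implementation; the function name sparse_weighted_rowsums and the small sizes are assumptions made for this example. As in the plan above, the second factor is passed in transposed form (V = t(S)).

```python
import numpy as np

def sparse_weighted_rowsums(W, U, V):
    """rowSums(W * (U %*% t(V))), touching only the non-zero cells of W."""
    rows, cols = np.nonzero(W)           # coordinates (i, j) with Wij != 0
    # One dot product Ui . Vj per non-zero cell instead of the full product.
    dots = np.einsum("ij,ij->i", U[rows], V[cols])
    out = np.zeros(W.shape[0])
    np.add.at(out, rows, W[rows, cols] * dots)   # accumulate per output row
    return out

# Illustrative sizes; W keeps roughly 1% of its cells.
rng = np.random.default_rng(1)
U = rng.random((200, 10))
S = rng.random((10, 200))
W = rng.random((200, 200)) * (rng.random((200, 200)) < 0.01)

dense = (W * (U @ S)).sum(axis=1)        # reference: full product, then mask
print(np.allclose(dense, sparse_weighted_rowsums(W, U, S.T)))
```

The dense reference materializes all 200 x 200 dot products, whereas the sketch computes only as many as W has non-zeros, which is where the savings come from when W is very sparse.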
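By extension, this automatic compilation also applies to distributed operations. For example, assume you have a small driver with a 10GB max heap and 16 virtual cores, and the following scenario: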
U = rand(rows=1000000, cols=100);
S = rand(rows=100, cols=1000000);
W = rand(rows=1000000, cols=1000000, sparsity=0.001);
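You end up with the following plan, where the operators marked SPARK run as distributed operations, while the operators marked CP (i.e., control program) run locally in the driver: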
# Memory Budget local/remote = 6372MB/183420MB/220104MB/12839MB
# Degree of Parallelism (vcores) local/remote = 16/144
PROGRAM
--MAIN PROGRAM
----GENERIC (lines 5-9) [recompile=true]
------(8) dg(rand) [1000000,100,1000,1000,100000000] [0,0,763 -> 763MB], CP
------(33) ua(+R) (8) [1000000,1,1000,1000,-1] [763,15,8 -> 786MB], CP
------(43) r(t) (33) [1,1000000,1000,1000,-1] [8,0,8 -> 15MB], CP
------(26) dg(rand) [1000000,1000000,1000,1000,1000000000] [0,8,11524 -> 11532MB], SPARK
------(17) dg(rand) [100,1000000,1000,1000,100000000] [0,0,763 -> 763MB], CP
------(37) r(t) (17) [1000000,100,1000,1000,100000000] [763,0,763 -> 1526MB], CP
------(39) q(wdivmm) (26,8,37) [1000000,1000000,1000,1000,1000000000] [13050,0,11524 -> 24574MB], SPARK
------(35) ua(+R) (39) [1000000,1,1000,1000,-1] [11524,15,8 -> 11547MB], SPARK
------(41) ba(+*) (43,35) [1,1,1000,1000,-1] [15,0,0 -> 15MB], SPARK
------(42) u(cast_as_scalar) (41) [0,0,0,0,-1] [0,0,0 -> 0MB], CP
------(32) u(print) (42) [-1,-1,-1,-1,-1] [0,0,0 -> 0MB]
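The memory figures in these plans can be roughly reproduced by hand. The sketch below estimates the size of a dense double-precision matrix and compares operator estimates against the local budget of 6372MB from the plan header. It is a simplified illustration under stated assumptions (8 bytes per cell, a single threshold test), not the compiler's actual decision logic; the helper names are hypothetical.

```python
LOCAL_BUDGET_MB = 6372  # local memory budget, taken from the plan header above

def dense_size_mb(rows, cols, bytes_per_cell=8):
    """Approximate size of a dense double matrix in MB (1 MB = 2**20 bytes)."""
    return rows * cols * bytes_per_cell / 2**20

def fits_locally(estimate_mb, budget_mb=LOCAL_BUDGET_MB):
    """Simplified rule: an operator may run in CP if its estimate fits the budget."""
    return estimate_mb <= budget_mb

# U in the distributed scenario: 1,000,000 x 100 dense doubles.
print(round(dense_size_mb(1000000, 100)))

# Operation memory estimates copied from the plan above.
print(fits_locally(1526))    # r(t) on S, operator (37), marked CP
print(fits_locally(24574))   # wdivmm, operator (39), marked SPARK
```

Memory alone does not explain every choice in the plan: operator (41) has an estimate of only 15MB but is still marked SPARK, so the compiler evidently takes more than this threshold into account.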