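Consider the following matrix multiplier, where the output C (4x3) is the product of two input matrices, A (4x5) and B (5x3).
The matrix multiplication is fine-grain pipelined so that in each clock cycle one product aij * bjk is produced and added to the partial product cik held at position P(i,k). A complete cik is therefore produced every 5 clock cycles. Assume the aij, bjk, and cik terms are all 32-bit-wide integers.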
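As a rough sketch of how the P(i,k) block described above might look: one multiply-accumulate per clock, restarted at the first term of each dot product. The entity name `p_block`, the port names, and the `first` control input are assumptions made for illustration; the assignment does not specify them. Signed arithmetic and keeping only the low 32 bits of the 64-bit product are also assumptions.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical P(i,k) block: accumulates a_ij * b_jk over the 5 values of j.
entity p_block is
  port (
    clk   : in  std_logic;
    first : in  std_logic;            -- assumed control: '1' together with the first term (j = 0)
    a_in  : in  signed(31 downto 0);  -- a_ij
    b_in  : in  signed(31 downto 0);  -- b_jk
    c_out : out signed(31 downto 0)   -- running sum; complete one cycle after the 5th term is applied
  );
end entity p_block;

architecture rtl of p_block is
  signal acc : signed(31 downto 0) := (others => '0');
begin
  process (clk)
    variable prod : signed(63 downto 0);  -- 32 x 32 bits gives a 64-bit product
  begin
    if rising_edge(clk) then
      prod := a_in * b_in;
      if first = '1' then
        acc <= prod(31 downto 0);         -- start a new dot product
      else
        acc <= acc + prod(31 downto 0);   -- keep the low 32 bits (wraps on overflow)
      end if;
    end if;
  end process;

  c_out <= acc;
end architecture rtl;
```

With this arrangement `first` would be asserted once every 5 cycles, which matches the rate of one complete cik per 5 clock cycles.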
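1) Write VHDL code for (a) the P(i,k) block, (b) the RAM from which A and B are read and into which C is written; this RAM has 2 read ports and 1 write port, and (c) the FIFOs that delay the application of the aij and bjk terms and delay the writing of the cik terms to the RAM.
2) Write top-level VHDL code for the interconnection of P blocks shown in the figure above, inserting appropriate FIFOs so that the aij, bjk, and cik terms are applied with the correct timing. The RAM holding matrices A, B, and C should also be part of this top-level design.
3) Write a testbench for this design.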
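For parts (b) and (c) of task 1, a minimal sketch of the two supporting blocks could look like the following. The entity names `ram_2r1w` and `delay_fifo`, all port and generic names, the 64-word depth, and the synchronous read are assumptions; the FIFO is modelled as a fixed-length shift register, which is one simple way to delay a term by `DEPTH` clock cycles.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;

-- Hypothetical RAM with 2 read ports and 1 write port.
entity ram_2r1w is
  generic (
    ADDR_WIDTH : natural := 6;   -- assumed: 64 words, enough for A (20), B (15) and C (12)
    DATA_WIDTH : natural := 32
  );
  port (
    clk     : in  std_logic;
    we      : in  std_logic;
    waddr   : in  unsigned(ADDR_WIDTH-1 downto 0);
    wdata   : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    raddr_a : in  unsigned(ADDR_WIDTH-1 downto 0);
    raddr_b : in  unsigned(ADDR_WIDTH-1 downto 0);
    rdata_a : out std_logic_vector(DATA_WIDTH-1 downto 0);
    rdata_b : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity ram_2r1w;

architecture rtl of ram_2r1w is
  type mem_t is array (0 to 2**ADDR_WIDTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal mem : mem_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      if we = '1' then
        mem(to_integer(waddr)) <= wdata;      -- single write port
      end if;
      rdata_a <= mem(to_integer(raddr_a));    -- read port 1 (data valid one cycle later)
      rdata_b <= mem(to_integer(raddr_b));    -- read port 2
    end if;
  end process;
end architecture rtl;

library ieee;
use ieee.std_logic_1164.all;

-- Hypothetical delay FIFO: dout is din delayed by DEPTH clock cycles.
entity delay_fifo is
  generic (
    DEPTH      : positive := 1;
    DATA_WIDTH : natural  := 32
  );
  port (
    clk  : in  std_logic;
    din  : in  std_logic_vector(DATA_WIDTH-1 downto 0);
    dout : out std_logic_vector(DATA_WIDTH-1 downto 0)
  );
end entity delay_fifo;

architecture rtl of delay_fifo is
  type stage_t is array (0 to DEPTH-1) of std_logic_vector(DATA_WIDTH-1 downto 0);
  signal stages : stage_t := (others => (others => '0'));
begin
  process (clk)
  begin
    if rising_edge(clk) then
      stages(0) <= din;
      for s in 1 to DEPTH-1 loop
        stages(s) <= stages(s-1);             -- shift one stage per clock
      end loop;
    end if;
  end process;

  dout <= stages(DEPTH-1);
end architecture rtl;
```

In a top-level design, each aij, bjk, or cik path would get its own `delay_fifo` instance with a `DEPTH` chosen to line the terms up with the P block that consumes them.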
Here is what I have so far: mat_ply.vhd
library IEEE;
use IEEE.STD_LOGIC_1164.all;
use ieee.numeric_std.all;

package mat_ply is
  -- Elements are 16 bits wide here (the assignment specifies 32-bit terms).
  type t11 is array (0 to 4) of unsigned(15 downto 0);
  type t1  is array (0 to 3) of t11;  -- 4x5 matrix (A)
  type t22 is array (0 to 2) of unsigned(15 downto 0);
  type t2  is array (0 to 4) of t22;  -- 5x3 matrix (B)
  type t33 is array (0 to 2) of unsigned(31 downto 0);
  type t3  is array (0 to 3) of t33;  -- 4x3 matrix (C, the output)
  function matmul (a : t1; b : t2) return t3;
end mat_ply;

package body mat_ply is
  function matmul (a : t1; b : t2) return t3 is
    -- i, j and k are declared implicitly by the for loops below.
    variable prod : t3 := (others => (others => (others => '0')));
  begin
    for i in 0 to 3 loop      -- rows of the first matrix
      for j in 0 to 2 loop    -- columns of the second matrix
        for k in 0 to 4 loop  -- columns of the first matrix = rows of the second
          -- 16-bit * 16-bit gives a 32-bit product, matching the elements of t3
          prod(i)(j) := prod(i)(j) + (a(i)(k) * b(k)(j));
        end loop;
      end loop;
    end loop;
    return prod;
  end matmul;
end mat_ply;
And the testbench:
LIBRARY ieee;
USE ieee.std_logic_1164.ALL;
USE ieee.numeric_std.ALL;
library work;
use work.mat_ply.all;

ENTITY mat_tb IS
END mat_tb;

ARCHITECTURE behavior OF mat_tb IS
  -- Signals declared and initialized to zero.
  signal clk  : std_logic := '0';
  signal a    : t1 := (others => (others => (others => '0')));
  signal b    : t2 := (others => (others => (others => '0')));
  signal x    : unsigned(15 downto 0) := (others => '0');  -- base value used to build the test matrices
  signal prod : t3 := (others => (others => (others => '0')));
  -- Clock period definition
  constant clk_period : time := 1 ns;
BEGIN
  -- Instantiate the Unit Under Test (UUT)
  uut: entity work.test_mat PORT MAP (clk, a, b, prod);

  -- Clock process
  clk_process : process
  begin
    clk <= '0';
    wait for clk_period/2;
    clk <= '1';
    wait for clk_period/2;
  end process;

  -- Stimulus process
  stim_proc : process
  begin
    -- First set of inputs.
    a <= ((x,x+1,x+2,x+3,x+4),(x+2,x,x+1,x,x),(x+1,x+5,x,x,x),(x+1,x+1,x,x,x));
    b <= ((x,x+1,x+4),(x,x+1,x+3),(x,x+2,x+3),(x,x+1,x+3),(x,x+1,x+3));
    wait for 2 ns;
    -- A second set of inputs can be given here, and so on.
    wait;  -- suspend so the same stimulus is not re-applied forever
  end process;
END;
I don't get any errors, but I don't know whether my code is correct.
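One way to gain some confidence in the `matmul` function, at least, would be a self-checking testbench that compares its result against values worked out by hand. The sketch below assumes the `mat_ply` package above has been compiled into `work`; the entity name `matmul_check_tb` and the `EXPECTED` constant are made up for illustration, and the expected values were computed by hand for the same inputs as in the testbench above with x = 0. It checks only the behavioural function, not a pipelined P-block design.

```vhdl
library ieee;
use ieee.std_logic_1164.all;
use ieee.numeric_std.all;
use work.mat_ply.all;

entity matmul_check_tb is
end entity matmul_check_tb;

architecture sim of matmul_check_tb is
  -- Hand-computed product of the two test matrices below (assumes x = 0).
  type int_row_t is array (0 to 2) of natural;
  type int_mat_t is array (0 to 3) of int_row_t;
  constant EXPECTED : int_mat_t :=
    ((0, 12, 30), (0, 4, 11), (0, 6, 19), (0, 2, 7));
begin
  check : process
    constant x : unsigned(15 downto 0) := (others => '0');
    variable a : t1;
    variable b : t2;
    variable c : t3;
  begin
    a := ((x, x+1, x+2, x+3, x+4), (x+2, x, x+1, x, x),
          (x+1, x+5, x, x, x), (x+1, x+1, x, x, x));
    b := ((x, x+1, x+4), (x, x+1, x+3), (x, x+2, x+3),
          (x, x+1, x+3), (x, x+1, x+3));
    c := matmul(a, b);

    -- Compare every element with the hand-computed value.
    for i in 0 to 3 loop
      for k in 0 to 2 loop
        assert to_integer(c(i)(k)) = EXPECTED(i)(k)
          report "mismatch at c(" & integer'image(i) & ")(" & integer'image(k) & ")"
          severity error;
      end loop;
    end loop;

    report "matmul check finished";
    wait;  -- end of test
  end process;
end architecture sim;
```

If no assertion fires, the simulator prints only the final report line; any wrong element is reported with its indices.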