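This is something I feel should be possible, but I have failed to work out how to do it in the HDL world. I have inherited a design that sums a multidimensional array, but we have to write the addition block by hand ahead of time, because one of the dimensions is a synthesis-time option and we tailor the addition to it.
If I have something like reg tap_out[src][dst][tap], where src and dst are set to 4 and tap can be anywhere from 0 to 15 (16 possibilities), I want to be able to assign output[dst] to be the sum of all the tap_out values for that particular dst.
Right now our summation block takes every combination of tap_out for each src and tap and sums them in pairs for each dst: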
tap_out[0][dst][0]
tap_out[1][dst][0]
tap_out[2][dst][0]
tap_out[3][dst][0]
tap_out[0][dst][1]
....
tap_out[3][dst][15]
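Is there a better way to do this in Verilog? In C I would use a few for loops, but that does not seem possible here.
Answer 0 (score: 2)
A for loop works perfectly fine: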
integer src_idx, tap_idx;
always @* begin
    sum = 0;
    // Loop bounds are constants, so synthesis can unroll both loops
    for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
        for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
            sum = sum + tap_out[src_idx][dst][tap_idx];
        end
    end
end
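During synthesis the loops are unrolled into one large block of combinational logic, and the result should be the same as writing out each addition line by line.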
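Because the loops are unrolled at synthesis, their bounds only need to be constant at elaboration time, so they can be parameters rather than literals. The following is a minimal sketch of that idea, not the original design: the module name tap_sum, the parameters NUM_SRC, NUM_TAP, WIDTH and SUM_W, and the flattened input taps_flat are all assumptions made for illustration. The input is flattened because plain Verilog does not allow array ports; the element for source s and tap t is assumed to sit at bit offset (s*NUM_TAP + t)*WIDTH.

```verilog
// Hypothetical module: sums every tap of every source for one destination.
// All names, widths and the flattened layout are illustrative assumptions.
module tap_sum #(
    parameter NUM_SRC = 4,   // number of sources
    parameter NUM_TAP = 16,  // number of taps, chosen at synthesis time
    parameter WIDTH   = 8,   // width of one tap_out element (assumed)
    parameter SUM_W   = 14   // WIDTH + log2(NUM_SRC*NUM_TAP), so the sum cannot overflow
) (
    input  wire [NUM_SRC*NUM_TAP*WIDTH-1:0] taps_flat, // tap_out[src][dst][tap] for one dst
    output reg  [SUM_W-1:0]                 sum
);
    integer s, t;

    always @* begin
        sum = {SUM_W{1'b0}};
        // Bounds are parameters, so the loops still unroll at synthesis
        for (s = 0; s < NUM_SRC; s = s + 1)
            for (t = 0; t < NUM_TAP; t = t + 1)
                sum = sum + taps_flat[(s*NUM_TAP + t)*WIDTH +: WIDTH];
    end
endmodule
```

One instance per dst would then cover the whole array, and changing NUM_TAP would require no edits to the adder itself, provided SUM_W is widened to match.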
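Large summing logic can run into timing problems because of its propagation delay. A good synthesizer should find the best timing/area trade-off when it is given the clock constraints. If the logic is too complex for the synthesizer, add partial-sum logic that can run in parallel: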
reg [`WIDTH-1:0] /*keep*/ partial_sum [3:0]; // tell synthesis to preserve these nets
integer src_idx, tap_idx;
always @* begin
    sum = 0;
    for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
        partial_sum[src_idx] = 0;
        // partial sums are independent of each other so they can run in parallel
        for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
            partial_sum[src_idx] = partial_sum[src_idx] + tap_out[src_idx][dst][tap_idx];
        end
        sum = sum + partial_sum[src_idx]; // sum the partial sums
    end
end
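If timing problems persist, you will have to treat the logic as multi-cycle and sample the value some number of clock cycles after the inputs change.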
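A related option, which is a different technique from a multi-cycle constraint, is to pipeline the adder by placing a register between the partial sums and the final sum, so each stage only has to meet timing on its own. The sketch below is only an illustration under assumptions: the module name tap_sum_pipe, the parameters, the flattened input taps_flat, and its layout (source s, tap t at bit offset (s*NUM_TAP + t)*WIDTH) are all hypothetical.

```verilog
// Hypothetical pipelined variant: all names and widths are illustrative assumptions.
module tap_sum_pipe #(
    parameter NUM_SRC = 4,
    parameter NUM_TAP = 16,
    parameter WIDTH   = 8,   // width of one tap_out element (assumed)
    parameter PART_W  = 12,  // WIDTH + log2(NUM_TAP)
    parameter SUM_W   = 14   // PART_W + log2(NUM_SRC)
) (
    input  wire                             clk,
    input  wire [NUM_SRC*NUM_TAP*WIDTH-1:0] taps_flat,
    output reg  [SUM_W-1:0]                 sum
);
    reg [NUM_SRC*PART_W-1:0] partial_d;  // combinational partial sums
    reg [NUM_SRC*PART_W-1:0] partial_q;  // registered partial sums
    reg [SUM_W-1:0]          sum_d;      // combinational final sum
    integer s, t, k;

    // Stage 1 logic: one independent partial sum per source
    always @* begin
        partial_d = {NUM_SRC*PART_W{1'b0}};
        for (s = 0; s < NUM_SRC; s = s + 1)
            for (t = 0; t < NUM_TAP; t = t + 1)
                partial_d[s*PART_W +: PART_W] = partial_d[s*PART_W +: PART_W]
                    + taps_flat[(s*NUM_TAP + t)*WIDTH +: WIDTH];
    end

    // Stage 2 logic: add the registered partial sums
    always @* begin
        sum_d = {SUM_W{1'b0}};
        for (k = 0; k < NUM_SRC; k = k + 1)
            sum_d = sum_d + partial_q[k*PART_W +: PART_W];
    end

    // Pipeline registers between and after the two stages
    always @(posedge clk) begin
        partial_q <= partial_d;
        sum       <= sum_d;
    end
endmodule
```

With this arrangement sum is updated on the second rising clock edge after the inputs are presented, so any logic consuming it has to account for that latency.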
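Answer 1 (score: 1)
In RTL (the level of abstraction you are most likely modeling with an HDL) you have to balance parallelism against time. Doing things in parallel saves time (usually), but the logic takes up a lot of area. Conversely, you can make the additions fully serial (one addition at a time) and store the result in a register. It sounds like you want to accumulate the sum, so that is the approach I will explain.
It sounds like a fully parallel design is not practical for your use (if it is, and you want to rewrite it, look up generate statements). So you will need to create a small FSM and accumulate the sum into a register. Here is a basic example that sums an array of ten 16-bit numbers (assume they are set somewhere else):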
reg [15:0] arr[0:9]; // numbers
reg [31:0] result; // accumulated sum
wire [31:0] next_result; // result plus the current array element
reg load_result; // load signal for register containing result
reg clk, rst_L; // These are the clock and reset signals (reset asserted low)

/* This is a register for storing the result */
always @(posedge clk, negedge rst_L) begin
    if (~rst_L) begin
        result <= 32'd0;
    end
    else begin
        if (load_result) begin
            result <= next_result;
        end
    end
end

/* A counter for knowing which element of the array we are adding */
reg [3:0] counter;
reg load_counter;
always @(posedge clk, negedge rst_L) begin
    if (~rst_L) begin
        counter <= 4'd0;
    end
    else begin
        if (load_counter) begin
            counter <= counter + 4'd1;
        end
    end
end

/* Perform the addition */
assign next_result = result + arr[counter];

/* Define the state machine states and state variable */
localparam IDLE = 2'd0;
localparam ADDING = 2'd1;
localparam DONE = 2'd2;
reg [1:0] state, next_state;

/* A register for holding the current state */
always @(posedge clk, negedge rst_L) begin
    if (~rst_L) begin
        state <= IDLE;
    end
    else begin
        state <= next_state;
    end
end

/* The next state and output logic, this will control the addition */
always @(*) begin
    /* Defaults */
    next_state = IDLE;
    load_result = 1'b0;
    load_counter = 1'b0;
    case (state)
        IDLE: begin
            next_state = ADDING; // Start adding now (right away)
        end
        ADDING: begin
            load_result = 1'b1; // Load in the result
            // 4'd9, not 3'd9: a 3-bit literal cannot hold 9 and would truncate to 1
            if (counter == 4'd9) begin // If we're on the last element, stop incrementing counter, we are done
                load_counter = 1'b0;
                next_state = DONE;
            end
            else begin // Otherwise, keep adding
                load_counter = 1'b1;
                next_state = ADDING;
            end
        end
        DONE: begin // finished adding, result is in result!
            next_state = DONE;
        end
    endcase
end
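If you have trouble with the concept, there are plenty of resources online that explain FSMs; they are what you use to implement a basic C-style loop in hardware.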