Question

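This is something I think should be doable, but I can't work out how to do it in the HDL world. I have inherited a design that sums a multidimensional array, but we have to pre-write the addition block because one of the dimensions is a synthesis-time option, and we tailor the addition to it.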

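If I have something like reg tap_out[src][dst][tap], where src and dst are set to 4 and tap can be anywhere from 0 to 15 (16 possibilities), I want to be able to assign output[dst] to be the sum of all the tap_out values for that particular dst.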

Right now our summation block takes every combination of src and tap in tap_out and sums them in pairs for each dst:
tap_out[0][dst][0]
tap_out[1][dst][0]
tap_out[2][dst][0]
tap_out[3][dst][0]
tap_out[0][dst][1]
....
tap_out[3][dst][15]

Is there a better way to do this in Verilog? In C I would use a few for-loops, but that doesn't seem possible here.

Answer 1

For-loops work perfectly fine in this situation:

integer src_idx, tap_idx;
always @* begin
  sum = 0;
  for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
    for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
      sum = sum + tap_out[src_idx][dst][tap_idx];
    end
  end
end

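During synthesis the loop unrolls into one large block of combinational logic, and the result should be the same as adding up the bits line by line.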
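Since the question asks for one sum per dst with the tap count fixed at synthesis time, the same loop can be wrapped in a parameterized module. The following is only a sketch under stated assumptions: the module name tap_sum, the flattened port tap_out_flat, and the WIDTH and SUMW parameters are hypothetical and do not come from the original design. The array is flattened because Verilog-2001 does not allow multidimensional arrays as ports.

```verilog
// Sketch only: names, widths, and the flattened layout are assumptions.
module tap_sum #(
  parameter SRC   = 4,   // number of sources
  parameter DST   = 4,   // number of destinations
  parameter TAPS  = 16,  // synthesis-time option from the question
  parameter WIDTH = 8,   // assumed width of one tap_out element
  parameter SUMW  = 16   // assumed sum width; needs WIDTH + log2(SRC*TAPS) bits
) (
  // Element [s][d][t] is stored at index (s*DST + d)*TAPS + t
  input  wire [SRC*DST*TAPS*WIDTH-1:0] tap_out_flat,
  // Sum for destination d is stored in bits [d*SUMW +: SUMW]
  output reg  [DST*SUMW-1:0]           sum_flat
);
  integer s, d, t;
  reg [SUMW-1:0] acc; // temporary, always written before it is read

  always @* begin
    for (d = 0; d < DST; d = d + 1) begin
      acc = 0;
      for (s = 0; s < SRC; s = s + 1)
        for (t = 0; t < TAPS; t = t + 1)
          acc = acc + tap_out_flat[((s*DST + d)*TAPS + t)*WIDTH +: WIDTH];
      sum_flat[d*SUMW +: SUMW] = acc;
    end
  end
endmodule
```

Changing TAPS at instantiation, for example tap_sum #(.TAPS(8)), resizes both the port and the unrolled adder logic, so no hand-written addition block is needed per configuration.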

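The propagation delay through a large summing network can cause timing problems. A good synthesizer should find the best timing/area trade-off when it is given the clock constraint. If the logic is too complex for the synthesizer, add your own partial-sum logic that can run in parallel: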

reg [`WIDTH-1:0] /*keep*/ partial_sum [3:0]; // tell synthesis to preserve these nets
integer src_idx, tap_idx;
always @* begin
  sum = 0;
  for (src_idx=0; src_idx<4; src_idx=src_idx+1) begin
    partial_sum[src_idx] = 0;
    // partial sums are independent of each other so they can run in parallel
    for (tap_idx=0; tap_idx<16; tap_idx=tap_idx+1) begin
      partial_sum[src_idx] = partial_sum[src_idx] + tap_out[src_idx][dst][tap_idx];
    end
    sum = sum + partial_sum[src_idx]; // sum the partial sums
  end
end

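If timing is still a problem, you have to treat the logic as multi-cycle and sample the value some number of clock cycles after the inputs change.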
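A related option, which is pipelining rather than a true multi-cycle path, is to register the partial sums so that each clock period only has to cover part of the adder chain. The sketch below assumes a hypothetical module tap_sum_pipe that handles the taps of a single dst through a flattened port taps_flat; the names and widths are illustrative and not from the original design.

```verilog
// Sketch only: names, widths, and the flattened layout are assumptions.
module tap_sum_pipe #(
  parameter SRC   = 4,
  parameter TAPS  = 16,
  parameter WIDTH = 8,   // assumed width of one tap value
  parameter SUMW  = 16   // assumed sum width
) (
  input  wire                      clk,
  // Taps for one dst; element [s][t] is stored at index s*TAPS + t
  input  wire [SRC*TAPS*WIDTH-1:0] taps_flat,
  // Reflects taps_flat two rising clock edges after it changes
  output reg  [SUMW-1:0]           sum
);
  reg [SUMW-1:0] partial_sum [0:SRC-1];
  reg [SUMW-1:0] acc;    // temporary used only in stage 1
  reg [SUMW-1:0] total;  // temporary used only in stage 2
  integer s, t, k;

  // Stage 1: one registered partial sum per source
  always @(posedge clk) begin
    for (s = 0; s < SRC; s = s + 1) begin
      acc = 0;
      for (t = 0; t < TAPS; t = t + 1)
        acc = acc + taps_flat[(s*TAPS + t)*WIDTH +: WIDTH];
      partial_sum[s] <= acc;
    end
  end

  // Stage 2: add the registered partial sums
  always @(posedge clk) begin
    total = 0;
    for (k = 0; k < SRC; k = k + 1)
      total = total + partial_sum[k];
    sum <= total;
  end
endmodule
```

The cost is two cycles of latency, and downstream logic has to account for that delay when it samples sum.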

Answer 2

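In RTL (the level of abstraction you are most likely modelling with your HDL), you have to balance parallelism against time. Doing things in parallel (typically) saves time, but the logic takes up a lot of area. Conversely, you can make the additions fully serial (one add at a time) and store the result in a register (it sounds like you want to accumulate a total sum, so that is what I will explain).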

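It sounds like the fully parallel approach is not practical for your use (if it is, and you want to rewrite it, look up generate statements). So you will need to create a small FSM and accumulate the sum into a register. Here is a basic example that sums an array of 16-bit numbers (assume they are set somewhere else):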

reg [15:0] arr[0:9]; // numbers
reg [31:0] result; // accumulated sum
wire [31:0] next_result; // result plus the current array element
reg load_result; // load signal for register containing result
reg clk, rst_L; // These are the clock and reset signals (reset asserted low)

/* This is a register for storing the result */
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    result <= 32'd0;
  end
  else begin
    if (load_result) begin
      result <= next_result;
    end
  end
end

/* A counter for knowing which element of the array we are adding */
reg [3:0] counter, next_counter;
reg load_counter;

always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    counter <= 4'd0;
  end
  else begin
    if (load_counter) begin
      counter <= counter + 4'd1;
    end
  end
end

/* Perform the addition */
assign next_result = result + arr[counter];

/* Define the state machine states and state variable */
localparam IDLE = 2'd0;
localparam ADDING = 2'd1;
localparam DONE = 2'd2;
reg [1:0] state, next_state;

/* A register for holding the current state */
always @(posedge clk, negedge rst_L) begin
  if (~rst_L) begin
    state <= IDLE;
  end
  else begin
    state <= next_state;
  end
end

/* The next state and output logic, this will control the addition */
always @(*) begin
  /* Defaults */
  next_state = IDLE;
  load_result = 1'b0;
  load_counter = 1'b0;

  case (state)
    IDLE: begin
      next_state = ADDING; // Start adding now (right away)
    end
    ADDING: begin
      load_result = 1'b1; // Load in the result
      if (counter == 4'd9) begin // If we're on the last element, stop incrementing counter, we are done
        load_counter = 1'b0;
        next_state = DONE;
      end
      else begin // Otherwise, keep adding
        load_counter = 1'b1;
        next_state = ADDING;
      end
    end
    DONE: begin // finished adding, result is in result!
      next_state = DONE;
    end
  endcase
end

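If you have trouble with the concept, there are plenty of resources online that explain FSMs; the point here is that they can be used to implement your basic C-style for loop.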
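For comparison, the same C-style loop (for (i = 0; i < N; i++) sum += data[i];) can be written more compactly with a single clocked block, where a busy flag stands in for the state register. This is a sketch under assumptions: the module serial_sum, its start/done handshake, and the flattened data_flat port are hypothetical and are not part of the answer above.

```verilog
// Sketch only: names, widths, and the handshake are assumptions.
module serial_sum #(
  parameter N     = 64,  // number of elements, e.g. 4 sources * 16 taps
  parameter WIDTH = 8,   // assumed element width
  parameter SUMW  = 16,  // assumed sum width
  parameter IDXW  = 6    // index width; must satisfy 2**IDXW >= N
) (
  input  wire               clk,
  input  wire               rst_L,     // reset, asserted low
  input  wire               start,     // begins a new accumulation when idle
  input  wire [N*WIDTH-1:0] data_flat, // element i is stored in bits [i*WIDTH +: WIDTH]
  output reg  [SUMW-1:0]    sum,
  output reg                done       // high once sum holds the full total
);
  reg [IDXW-1:0] idx;  // loop index
  reg            busy; // 1 while the loop is running

  always @(posedge clk or negedge rst_L) begin
    if (~rst_L) begin
      sum  <= 0;
      idx  <= 0;
      busy <= 1'b0;
      done <= 1'b0;
    end else if (start && !busy) begin
      // Loop initialization: i = 0, sum = 0
      sum  <= 0;
      idx  <= 0;
      busy <= 1'b1;
      done <= 1'b0;
    end else if (busy) begin
      // Loop body: one addition per clock cycle
      sum <= sum + data_flat[idx*WIDTH +: WIDTH];
      if (idx == N-1) begin
        busy <= 1'b0;
        done <= 1'b1;
      end else begin
        idx <= idx + 1'b1;
      end
    end
  end
endmodule
```

This uses a single adder and takes N cycles after start is accepted, which is the area-for-time trade described at the top of this answer.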

Is there a way to sum a multidimensional array in Verilog?
