Synthesizing a Cholesky decomposition in Verilog

Time: 2017-03-19 05:21:05

Tags: python verilog synthesis matrix-decomposition

I am implementing a Cholesky decomposition in Verilog, following the Python code below:

from math import sqrt

def cholesky(A):
    """Return the lower-triangular L such that A = L * L^T."""
    n = len(A)

    L = [[0.0] * n for i in range(n)]

    for i in range(n):
        for j in range(i+1):
            # Dot product of the already-computed parts of rows i and j
            tmp_sum = sum(L[i][k] * L[j][k] for k in range(j))

            if (i == j): # Diagonal element
                L[i][j] = sqrt(A[i][i] - tmp_sum)
            else:
                L[i][j] = (1.0/L[j][j] * (A[i][j] - tmp_sum))
    return L
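In formula form, the loop above computes the following for each entry on or below the diagonal. This is a restatement of the Python code using its 0-based indices, not something taken from another source:

```latex
L_{jj} = \sqrt{A_{jj} - \sum_{k=0}^{j-1} L_{jk}^{2}},
\qquad
L_{ij} = \frac{1}{L_{jj}} \left( A_{ij} - \sum_{k=0}^{j-1} L_{ik} L_{jk} \right)
\quad \text{for } i > j .
```

Each entry depends only on entries of L to its left in the same row and on earlier rows, which is why the rows can be computed in order.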

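I tried a simple version with a 3x3 input. Since it needs division and square root, I also wrote a divider using the standard method (copied from the internet with some modifications) and a sqrt using the Babylonian method (a variant of Newton's method). Here they are: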

module Div(in1, in2, out);
input [23:0] in1, in2;
output reg [23:0] out;
// reg [23:0] remainder;

reg [47:0] scaled_divider, temp_remainder, temp_result;
integer i;

always @ (in1 or in2) begin
    scaled_divider = {1'b0, in2, 23'h0};
    temp_remainder = {24'h0, in1};

    for (i=0; i<24; i=i+1) begin
        temp_result = temp_remainder - scaled_divider;

        if (temp_result[47-i]) begin    // Negative result, quotient set to '0'
            out[23-i] = 1'b0;
        end else begin
            out[23-i] = 1'b1;
            temp_remainder = temp_result;
        end 

        scaled_divider = scaled_divider >> 1;
    end 

    // remainder =  temp_remainder[23:0];
end 

endmodule   
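To check the divider against a software reference, the shift-and-subtract loop above can be mirrored in Python. This is only a sketch: it assumes unsigned 24-bit operands and 48-bit wraparound on the subtraction, and the name div_model is hypothetical.

```python
MASK48 = (1 << 48) - 1  # the Verilog temporaries are 48 bits wide


def div_model(in1, in2):
    """Bit-level model of the Div module: unsigned 24-bit restoring division."""
    scaled_divider = in2 << 23          # {1'b0, in2, 23'h0}
    temp_remainder = in1                # {24'h0, in1}
    out = 0
    for i in range(24):
        # Subtraction wraps modulo 2**48, as it would in hardware
        temp_result = (temp_remainder - scaled_divider) & MASK48
        if (temp_result >> (47 - i)) & 1:
            # Negative result: quotient bit stays 0, remainder is kept
            pass
        else:
            out |= 1 << (23 - i)
            temp_remainder = temp_result
        scaled_divider >>= 1
    return out


if __name__ == "__main__":
    for a, b in [(100, 7), (65025, 255), (12345678, 4096)]:
        print(a, b, div_model(a, b), a // b)
```

For a nonzero divisor the model agrees with Python's integer division, so it can be used to generate expected values for a testbench.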

And the Sqrt:

module Sqrt_newton(in, out);

// 3 iterations
input [23:0] in; 
output reg [23:0] out;

Div div1(in, out, tmp_inout1);
Div div2(in, tmp_inout2, tmp_inout3);
Div div3(in, tmp_inout4, tmp_inout5);


always @ (in)
begin
    out[0] = 1'b1;
    out[1] = 1'b1;
    out[2] = 1'b1;
    out[3] = 1'b1;
    out[4] = 1'b1;
    out[5] = 1'b1;
    out[6] = 1'b1;
    out[7] = 1'b1;
    tmp_inout2 = (out + tmp_inout1) >> 1;
    tmp_inout4 = (tmp_inout2 + tmp_inout3) >> 1;
    out = (tmp_inout4 + tmp_inout5) >> 1;
end 
endmodule
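The intended iteration can be modelled in Python to see what three steps produce. This is a sketch under assumptions: the starting guess is taken to be 0xFF (bits 0 to 7 set, upper bits assumed zero), division is integer division, and the name sqrt_babylonian is hypothetical.

```python
def sqrt_babylonian(value, iterations=3, guess=0xFF):
    """Integer Babylonian iteration: x <- (x + value // x) >> 1."""
    x = guess
    for _ in range(iterations):
        if x == 0:
            # Avoid dividing by zero once the estimate has collapsed to 0
            break
        x = (x + value // x) >> 1
    return x


if __name__ == "__main__":
    for v in (16, 65025):
        print(v, sqrt_babylonian(v), sqrt_babylonian(v, iterations=10))
```

Under these assumptions, three iterations from a fixed guess of 255 only land on the root when the input is close to 255 squared. An input of 16, for example, needs more iterations before the estimate settles at 4.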

Here is my 3x3 Cholesky decomposition code:

module cholesky_template(clk, rst, g_input, e_input, o);
    input clk, rst;
    input [143:0] g_input;
    input e_input;
    output [215:0] o;
    reg [23:0] L [0:2][0:2];
    reg [23:0] A [0:2][0:2] ;

    assign o = {
        L[0][0], L[0][1], L[0][2],
        L[1][0], L[1][1], L[1][2],
        L[2][0], L[2][1], L[2][2]
        };

    reg [23:0] tmp_A00_minus_sum;
    reg [23:0] tmp_A11_minus_sum;
    reg [23:0] tmp_A22_minus_sum;

    reg [23:0] tmp_A10_minus_sum;
    reg [23:0] tmp_A20_minus_sum;
    reg [23:0] tmp_A21_minus_sum;

    reg [23:0] div_1_L00;
    reg [23:0] div_1_L11;

    Sqrt sqrt0(tmp_A00_minus_sum, L[0][0]);
    Div div0(1'b1, L[0][0], div_1_L00);
    Sqrt sqrt1(tmp_A11_minus_sum, L[1][1]);
    Div div1(1'b1, L[1][1], div_1_L11);
    Sqrt sqrt2(tmp_A22_minus_sum, L[2][2]);

    always @ (posedge clk or posedge rst) begin
        if (rst) begin
            L[0][0] = 1'b0;
            L[0][1] = 1'b0;
            L[0][2] = 1'b0;
            L[1][0] = 1'b0;
            L[1][1] = 1'b0;
            L[1][2] = 1'b0;
            L[2][0] = 1'b0;
            L[2][1] = 1'b0;
            L[2][2] = 1'b0;
            tmp_sum = 1'b0;
            A[0][0] ={8'b00000000, g_input[15:0]};
            A[0][1] =24'b0; // will not be used
            A[0][2] =24'b0; // will not be used
            A[1][0] ={8'b00000000, g_input[63:48]};
            A[1][1] ={8'b00000000, g_input[79:64]};
            A[1][2] =24'b0; // will not be used
            A[2][0] ={8'b00000000, g_input[111:96]};
            A[2][1] ={8'b00000000, g_input[127:112]};
            A[2][2] ={8'b00000000, g_input[143:128]};
        end else begin
            tmp_A00_minus_sum = A[0][0] - tmp_sum;

            tmp_A10_minus_sum = A[1][0] - tmp_sum;
            L[1][0] = div_1_L00 * tmp_A10_minus_sum;

            tmp_sum = tmp_sum + L[1][0] * L[1][0];

            tmp_A11_minus_sum = A[1][1] - tmp_sum;

            tmp_A20_minus_sum = A[2][0] - tmp_sum;
            L[2][0] = div_1_L00 * tmp_A20_minus_sum;            

            tmp_sum = tmp_sum + L[2][0] * L[1][0];

            tmp_A21_minus_sum = A[2][1] - tmp_sum;
            L[2][1] = div_1_L11 * tmp_A21_minus_sum;

            tmp_sum = tmp_sum + L[2][0] * L[2][0];
            tmp_sum = tmp_sum + L[2][1] * L[2][1];

            tmp_A22_minus_sum = A[2][2] - tmp_sum;
        end
    end
endmodule

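Some notes on the code: I did not use for loops, so I unrolled them into statements such as tmp_A10_minus_sum = A[1][0] - tmp_sum;. Mapping them back to the Python code should be fairly easy. The 8 zeros prepended to each element of A are there because I plan to "upgrade" the code to use 24 bits so that it can be more accurate. That part is not the problem.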
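As a reference for the unrolled statements, here is a sketch in Python of what the two loops expand to for n = 3. It assumes the running sum is recomputed from scratch for each element, as in the original loop, and the function name cholesky_3x3_unrolled is hypothetical.

```python
from math import sqrt


def cholesky_3x3_unrolled(A):
    """Loop-free 3x3 Cholesky: each line is one (i, j) step of the loop version."""
    L = [[0.0] * 3 for _ in range(3)]

    # Row 0: the sum over k is empty
    L[0][0] = sqrt(A[0][0])

    # Row 1
    L[1][0] = A[1][0] / L[0][0]
    L[1][1] = sqrt(A[1][1] - L[1][0] * L[1][0])

    # Row 2
    L[2][0] = A[2][0] / L[0][0]
    L[2][1] = (A[2][1] - L[2][0] * L[1][0]) / L[1][1]
    L[2][2] = sqrt(A[2][2] - L[2][0] * L[2][0] - L[2][1] * L[2][1])
    return L


if __name__ == "__main__":
    A = [[4.0, 12.0, -16.0],
         [12.0, 37.0, -43.0],
         [-16.0, -43.0, 98.0]]
    for row in cholesky_3x3_unrolled(A):
        print(row)
```

Comparing each hardware intermediate against the matching line here is one way to confirm that every subtraction uses only the products belonging to its own (i, j) entry.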

Tri-state bus warning

The problem is that when I compile it with Synopsys DC, it outputs warnings like the following:

"Warning: In design 'cholesky_template', three-state bus 'tmp_A00_minus_sum[23]' has non three-state driver 'tmp_A00_minus_sum_reg[23]/Q'. (LINT-34)"

This is DC's description of LINT-34:

  

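NAME
    LINT-34 (warning) In design '%s', three-state bus '%s' has non three-state driver '%s'.

DESCRIPTION
    Synopsys libraries contain descriptions of three-state driver pins of components. Synopsys tools classify nets as three-state nets if they are driven by at least one pin that has such a three-state attribute. Usually, if there is more than one driver on such a net, it is assumed that all driver pins should be three-state drivers for proper operation of the three-state bus. This warning message indicates a situation where at least one non three-state driver is present on a three-state net.

WHAT NEXT
    Verify that this is what you intended for the given net. If the non three-state driver pin specified in the message is indeed on a three-state driver in your ASIC technology, verify that the technology library description is correct.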

Why are there three-state attributes in the design? How can I correct them?

Target library contains no replacement for register

This is another warning I get, for example:

  

Warning: Target library contains no replacement for register 'A_reg[1][0][7]' (**FFGEN**). (TRANS-4)

Here is my library code. I wonder whether this is related to the tri-state bus warning? If so, is there a reference on designing suitable cells?

library(HML){
cell(AND)  {
  area: 6;
  pin(A) {
      direction: input;
      capacitance: 1;
  }    
  pin(B) {
      direction: input;
      capacitance: 1;  
    }
  pin(Z) {
    direction: output;
    function: "A B";
    timing() {
        intrinsic_rise: 0.48;
        intrinsic_fall: 0.77;
        rise_resistance: 0.1443;
        fall_resistance: 0.0523;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";   
        }
    timing() {
        intrinsic_rise: 0.48;
        intrinsic_fall: 0.77;
        rise_resistance: 0.1443;
        fall_resistance: 0.0523;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";   
        }
    }
  }
cell(OR) {
  area:  6;
  pin(A) {
    direction: input;
    capacitance: 1;
  }
  pin(B) {
    direction: input;
    capacitance: 1;
  }
  pin(Z) {
    direction: output;
    function: "A+B";
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";   
    }
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";   
    }
  }
}
cell(XOR) {
  area: 0;
  pin(A) {
    direction: input;
    capacitance: 1;
  }
  pin(B) {
    direction: input;
    capacitance: 1;
  }
  pin(Z) {
    direction: output;
    function: "A^B";
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";   
    }
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";   
    }
  }
}
cell(NAND) {
  area: 6;
  pin(A) {
    direction: input;
    capacitance: 1;
  }
  pin(B) {
    direction: input;
    capacitance: 1;
  }
  pin(Z) {
    direction: output;
    function: "(A B)'";
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";   
    }
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";   
    }
  }
}
cell(NOR) {
  area: 6;
  pin(A) {
    direction: input;
    capacitance: 1;
  }
  pin(B) {
    direction: input;
    capacitance: 1;
  }
  pin(Z) {
    direction: output;
    function: "(A+B)'";
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";   
    }
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";   
    }
  }
}

cell(XNOR) {
  area: 6;
  pin(A) {
    direction: input;
    capacitance: 1;
  }
  pin(B) {
    direction: input;
    capacitance: 1;
  }
  pin(Z) {
    direction: output;
    function: "(A^B)'";
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "A";   
    }
    timing() {
        intrinsic_rise: 0.28;
        intrinsic_fall: 0.85;
        rise_resistance: 0.1443;
        fall_resistance: 0.0589;
        slope_rise: 0.0;
        slope_fall: 0.0;
        related_pin: "B";   
    }
  }
}

cell(DFF) {
  area : 9;
  pin(D) {
    direction : input;
    capacitance : 1;
    timing() {
      timing_type : setup_rising;
      intrinsic_rise : 0.85;
      intrinsic_fall : 0.85;
      related_pin : "CLK";
    }
    timing() {
      timing_type : hold_rising;
      intrinsic_rise : 0.4;
      intrinsic_fall : 0.4;
      related_pin : "CLK";
    }
  }
    pin(I) {
    direction : input;
    capacitance : 1;
    timing() {
      timing_type : setup_rising;
      intrinsic_rise : 0.85;
      intrinsic_fall : 0.85;
      related_pin : "CLK";
    }
    timing() {
      timing_type : hold_rising;
      intrinsic_rise : 0.4;
      intrinsic_fall : 0.4;
      related_pin : "CLK";
    }
  }
  pin(CLK) {
    direction : input;
    capacitance : 1;
  }
  pin(RST) {
    direction : input;
    capacitance : 2;
  }

  ff("IQ", "IQN") {
    next_state : "D";
    clocked_on : "CLK";
    clear : "RST (I')";
    preset: "RST I";
    clear_preset_var1: L;
    clear_preset_var2: H;
  }

  pin(Q) {
    direction : output;
    function : "IQ";
    internal_node : "Q";
    timing() {
      timing_type : rising_edge;
      intrinsic_rise : 1.19;
      intrinsic_fall : 1.37;
      rise_resistance : 0.1458;
      fall_resistance : 0.0523;
      related_pin : "CLK";
    }
    timing() {
      timing_type : clear;
      timing_sense : positive_unate;
      intrinsic_fall : 1.29;
      fall_resistance : 0.0516;
      related_pin : "RST";
    }
    timing() {
      timing_type : preset;
      timing_sense : positive_unate;
      intrinsic_fall : 1.29;
      fall_resistance : 0.0516;
      related_pin : "I";
    }
  }
}
cell(IV){
  area:0;
  cell_footprint : "iv";
  pin(A) {
    direction: input;
    capacitance: 1;
  }
  pin(Z) {
    direction: output;
    function : "A'";
    timing() {
      intrinsic_rise : 0.38;
      intrinsic_fall : 0.15;
      rise_resistance : 0.1443;
      fall_resistance : 0.0589;
      slope_rise : 0.0;
      slope_fall     : 0.0;
      related_pin : "A";
    }
  }
}
}

Sorry for the long post. I hope I have asked my question clearly.

1 Answer:

Answer 0: (score: 0)

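This is quite late, but I just came across it. I'm not sure about the tri-state issue, but I just ran into your FFGEN error. The synthesizer compiles the code into a gate-level netlist using the parts available to it. When you describe behavior in VHDL that the library cannot implement (in my case, a flip-flop (FF) with asynchronous reset), the synthesizer does not know which part to use when it gets to (GEN)erating the parts, hence the FFGEN error. The synthesizer will, however, insert a placeholder for that register that describes the element's inputs, outputs, and clock signal. You can see it if you look at the netlist. Mine looks like this: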

  \**FFGEN** \inst_clk_divider/cnt_reg[1] ( .next_state(n299), .clocked_on(clk), .force_00(1'b0), .force_01(RST), .force_10(1'b0), .force_11(1'b0), .Q(\inst_clk_divider/cnt[1] ));