Finding the maximum delay of an FPGA design from VHDL code written with Xilinx software

Time: 2014-12-02 15:30:12

Tags: vhdl xilinx-ise

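I am working on AES code, and my goal is to create an architecture that delivers the fastest possible performance. To do that, I need to determine the delay from the moment an input is applied to the moment the final output is available. The design will be implemented on an FPGA. I need to find this delay from the Xilinx simulation and the design summary, but I don't understand the various reports.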

For model one, here are three reports from the design summary:

  1. Synthesis report
  2. Place and Route report
  3. Static timing report

Static Timing Report

    --------------------------------------------------------------------------------
    Release 9.2i Trace 
    Copyright (c) 1995-2007 Xilinx, Inc.  All rights reserved.
    
    C:\Xilinx92i\bin\nt\trce.exe -ise C:/Xilinx92i/sbox/sbox.ise -intstyle ise -e 3
    -s 5 -xml dynamic5stage dynamic5stage.ncd -o dynamic5stage.twr
    dynamic5stage.pcf
    
    Design file:              dynamic5stage.ncd
    Physical constraint file: dynamic5stage.pcf
    Device,package,speed:     xc3s200,pq208,-5 (PRODUCTION 1.39 2007-04-13)
    Report level:             error report
    
    Environment Variable      Effect 
    --------------------      ------ 
    NONE                      No environment variables were set
    --------------------------------------------------------------------------------
    
    INFO:Timing:2698 - No timing constraints found, doing default enumeration.
    INFO:Timing:2752 - To get complete path coverage, use the unconstrained paths 
       option. All paths that are not constrained will be reported in the 
       unconstrained paths section(s) of the report.
    INFO:Timing:3339 - The clock-to-out numbers in this timing report are based on 
       a 50 Ohm transmission line loading model.  For the details of this model, 
       and for more information on accounting for different loading conditions, 
       please see the device datasheet.
    
    
    
    Data Sheet report:
    -----------------
    All values displayed in nanoseconds (ns)
    
    Setup/Hold to clock SYS_CLK
    ------------+------------+------------+------------------+--------+
                |  Setup to  |  Hold to   |                  | Clock  |
    Source      | clk (edge) | clk (edge) |Internal Clock(s) | Phase  |
    ------------+------------+------------+------------------+--------+
    BYTE_IN<0>  |    2.659(R)|    0.515(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<1>  |    3.216(R)|    0.381(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<2>  |    3.373(R)|    0.453(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<3>  |    3.155(R)|    0.001(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<4>  |    3.419(R)|    0.663(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<5>  |    4.055(R)|    0.118(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<6>  |    3.389(R)|    0.545(R)|SYS_CLK_BUFGP     |   0.000|
    BYTE_IN<7>  |    3.151(R)|    0.389(R)|SYS_CLK_BUFGP     |   0.000|
    RST         |    2.750(R)|    0.970(R)|SYS_CLK_BUFGP     |   0.000|
    s           |    3.140(R)|    0.344(R)|SYS_CLK_BUFGP     |   0.000|
    ------------+------------+------------+------------------+--------+
    
    Clock SYS_CLK to Pad
    ---------------+------------+------------------+--------+
                   | clk (edge) |                  | Clock  |
    Destination    |   to PAD   |Internal Clock(s) | Phase  |
    ---------------+------------+------------------+--------+
    SUB_BYTE_OUT<0>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<1>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<2>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<3>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<4>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<5>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<6>|    6.404(R)|SYS_CLK_BUFGP     |   0.000|
    SUB_BYTE_OUT<7>|    6.403(R)|SYS_CLK_BUFGP     |   0.000|
    ---------------+------------+------------------+--------+
    
    Clock to Setup on destination clock SYS_CLK
    ---------------+---------+---------+---------+---------+
                   | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
    Source Clock   |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
    ---------------+---------+---------+---------+---------+
    SYS_CLK        |    3.612|         |         |         |
    ---------------+---------+---------+---------+---------+
    
    
    Analysis completed Sat Nov 29 11:39:23 2014 
    --------------------------------------------------------------------------------
    
    Trace Settings:
    -------------------------
    Trace Settings 
    
    Peak Memory Usage: 93 MB
    

    Place & Route Report

    Release 9.2i par J.36
    Copyright (c) 1995-2007 Xilinx, Inc.  All rights reserved.
    
    ACER-PC::  Sat Nov 29 11:38:52 2014
    
    par -w -intstyle ise -ol std -t 1 dynamic5stage_map.ncd dynamic5stage.ncd
    dynamic5stage.pcf 
    
    
    Constraints file: dynamic5stage.pcf.
    Loading device for application Rf_Device from file '3s200.nph' in environment C:\Xilinx92i.
       "dynamic5stage" is an NCD, version 3.1, device xc3s200, package pq208, speed -5
    
    Initializing temperature to 85.000 Celsius. (default - Range: 0.000 to 85.000 Celsius)
    Initializing voltage to 1.140 Volts. (default - Range: 1.140 to 1.260 Volts)
    
    INFO:Par:282 - No user timing constraints were detected or you have set the option to ignore timing constraints ("par
       -x"). Place and Route will run in "Performance Evaluation Mode" to automatically improve the performance of all
       internal clocks in this design. The PAR timing summary will list the performance achieved for each clock. Note: For
       the fastest runtime, set the effort level to "std".  For best performance, set the effort level to "high". For a
       balance between the fastest runtime and best performance, set the effort level to "med".
    
    Device speed data version:  "PRODUCTION 1.39 2007-04-13".
    
    
    Device Utilization Summary:
    
       Number of BUFGMUXs                        1 out of 8      12%
       Number of External IOBs                  19 out of 141    13%
          Number of LOCed IOBs                   0 out of 19      0%
    
       Number of Slices                         62 out of 1920    3%
          Number of SLICEMs                      0 out of 960     0%
    
    
    
    Overall effort level (-ol):   Standard 
    Placer effort level (-pl):    High 
    Placer cost table entry (-t): 1
    Router effort level (-rl):    Standard 
    
    
    
    REAL time consumed by placer: 16 secs 
    CPU  time consumed by placer: 10 secs 
    Writing design to file dynamic5stage.ncd
    
    
    Total REAL time to Placer completion: 17 secs 
    Total CPU time to Placer completion: 11 secs 
    
    Starting Router
    
    Phase 1: 482 unrouted;       REAL time: 18 secs 
    
    Phase 2: 436 unrouted;       REAL time: 18 secs 
    
    Phase 3: 178 unrouted;       REAL time: 18 secs 
    
    Phase 4: 178 unrouted; (0)      REAL time: 18 secs 
    
    Phase 5: 180 unrouted; (0)      REAL time: 18 secs 
    
    Phase 6: 0 unrouted; (87)      REAL time: 19 secs 
    
    Phase 7: 0 unrouted; (87)      REAL time: 19 secs 
    
    Updating file: dynamic5stage.ncd with current fully routed design.
    
    Phase 8: 0 unrouted; (0)      REAL time: 20 secs 
    
    Phase 9: 0 unrouted; (0)      REAL time: 20 secs 
    
    
    Total REAL time to Router completion: 20 secs 
    Total CPU time to Router completion: 13 secs 
    
    Partition Implementation Status
    -------------------------------
    
      No Partitions were found in this design.
    
    -------------------------------
    
    Generating "PAR" statistics.
    
    **************************
    Generating Clock Report
    **************************
    
    +---------------------+--------------+------+------+------------+-------------+
    |        Clock Net    |   Resource   |Locked|Fanout|Net Skew(ns)|Max Delay(ns)|
    +---------------------+--------------+------+------+------------+-------------+
    |       SYS_CLK_BUFGP |      BUFGMUX6| No   |   45 |  0.036     |  0.916      |
    +---------------------+--------------+------+------+------------+-------------+
    
    * Net Skew is the difference between the minimum and maximum routing
    only delays for the net. Note this is different from Clock Skew which
    is reported in TRCE timing report. Clock Skew is the difference between
    the minimum and maximum path delays which includes logic delays.
    
    
       The Delay Summary Report
    
    
    The NUMBER OF SIGNALS NOT COMPLETELY ROUTED for this design is: 0
    
       The AVERAGE CONNECTION DELAY for this design is:        0.832
       The MAXIMUM PIN DELAY IS:                               2.272
       The AVERAGE CONNECTION DELAY on the 10 WORST NETS is:   1.786
    
       Listing Pin Delays by value: (nsec)
    
        d < 1.00   < d < 2.00  < d < 3.00  < d < 4.00  < d < 5.00  d >= 5.00
       ---------   ---------   ---------   ---------   ---------   ---------
             337         142           2           0           0           0
    
    Timing Score: 0
    
    Asterisk (*) preceding a constraint indicates it was not met.
       This may be due to a setup or hold violation.
    
    ------------------------------------------------------------------------------------------------------
      Constraint                                |  Check  | Worst Case |  Best Case | Timing |   Timing   
                                                |         |    Slack   | Achievable | Errors |    Score   
    ------------------------------------------------------------------------------------------------------
      Autotimespec constraint for clock net SYS | SETUP   |         N/A|     3.612ns|     N/A|           0
      _CLK_BUFGP                                | HOLD    |     0.702ns|            |       0|           0
    ------------------------------------------------------------------------------------------------------
    
    
    All constraints were met.
    INFO:Timing:2761 - N/A entries in the Constraints list may indicate that the 
       constraint does not cover any paths or that it has no requested value.
    
    
    Generating Pad Report.
    
    All signals are completely routed.
    
    Total REAL time to PAR completion: 21 secs 
    Total CPU time to PAR completion: 15 secs 
    
    Peak Memory Usage:  136 MB
    
    Placement: Completed - No errors found.
    Routing: Completed - No errors found.
    
    Number of error messages: 0
    Number of warning messages: 0
    Number of info messages: 1
    
    Writing design to file dynamic5stage.ncd
    
    
    
    PAR done!
    

    Synthesis Report

    Release 9.2i - xst J.36
    Copyright (c) 1995-2007 Xilinx, Inc.  All rights reserved.
    --> Parameter TMPDIR set to ./xst/projnav.tmp
    CPU : 0.00 / 4.04 s | Elapsed : 0.00 / 4.00 s
    
    --> Parameter xsthdpdir set to ./xst
    CPU : 0.00 / 4.04 s | Elapsed : 0.00 / 4.00 s
    
    --> Reading design: dynamic5stage.prj
    
    
    
    =========================================================================
    *                      Synthesis Options Summary                        *
    =========================================================================
    ---- Source Parameters
    Input File Name                    : "dynamic5stage.prj"
    Input Format                       : mixed
    Ignore Synthesis Constraint File   : NO
    
    ---- Target Parameters
    Output File Name                   : "dynamic5stage"
    Output Format                      : NGC
    Target Device                      : xc3s200-5-pq208
    
    ---- Source Options
    Top Module Name                    : dynamic5stage
    Automatic FSM Extraction           : YES
    FSM Encoding Algorithm             : Auto
    Safe Implementation                : No
    FSM Style                          : lut
    RAM Extraction                     : Yes
    RAM Style                          : Auto
    ROM Extraction                     : Yes
    Mux Style                          : Auto
    Decoder Extraction                 : YES
    Priority Encoder Extraction        : YES
    Shift Register Extraction          : YES
    Logical Shifter Extraction         : YES
    XOR Collapsing                     : YES
    ROM Style                          : Auto
    Mux Extraction                     : YES
    Resource Sharing                   : YES
    Asynchronous To Synchronous        : NO
    Multiplier Style                   : auto
    Automatic Register Balancing       : No
    
    ---- Target Options
    Add IO Buffers                     : YES
    Global Maximum Fanout              : 500
    Add Generic Clock Buffer(BUFG)     : 8
    Register Duplication               : YES
    Slice Packing                      : YES
    Optimize Instantiated Primitives   : NO
    Use Clock Enable                   : Yes
    Use Synchronous Set                : Yes
    Use Synchronous Reset              : Yes
    Pack IO Registers into IOBs        : auto
    Equivalent register Removal        : YES
    
    ---- General Options
    Optimization Goal                  : Speed
    Optimization Effort                : 1
    Library Search Order               : dynamic5stage.lso
    Keep Hierarchy                     : NO
    RTL Output                         : Yes
    Global Optimization                : AllClockNets
    Read Cores                         : YES
    Write Timing Constraints           : NO
    Cross Clock Analysis               : NO
    Hierarchy Separator                : /
    Bus Delimiter                      : <>
    Case Specifier                     : maintain
    Slice Utilization Ratio            : 100
    BRAM Utilization Ratio             : 100
    Verilog 2001                       : YES
    Auto BRAM Packing                  : NO
    Slice Utilization Ratio Delta      : 5
    
    =========================================================================
    
    
    =========================================================================
    *                          HDL Compilation                              *
    =========================================================================
    Compiling vhdl file "C:/Xilinx92i/sbox/dynamic5stage.vhd" in Library work.
    Entity <dynamic5stage> compiled.
    Entity <dynamic5stage> (Architecture <Behavioral>) compiled.
    
    =========================================================================
    *                     Design Hierarchy Analysis                         *
    =========================================================================
    Analyzing hierarchy for entity <dynamic5stage> in library <work> (architecture <Behavioral>).
    
    
    =========================================================================
    *                            HDL Analysis                               *
    =========================================================================
    Analyzing Entity <dynamic5stage> in library <work> (Architecture <Behavioral>).
    INFO:Xst:1561 - "C:/Xilinx92i/sbox/dynamic5stage.vhd" line 278: Mux is complete : default of case is discarded
    Entity <dynamic5stage> analyzed. Unit <dynamic5stage> generated.
    
    =========================================================================
    HDL Synthesis Report
    
    Macro Statistics
    # ROMs                                                 : 1
     16x4-bit ROM                                          : 1
    # Registers                                            : 13
     4-bit register                                        : 12
     8-bit register                                        : 1
    # Xors                                                 : 89
     1-bit xor2                                            : 56
     1-bit xor3                                            : 24
     1-bit xor4                                            : 1
     2-bit xor2                                            : 6
     4-bit xor2                                            : 2
    
    =========================================================================
    
    =========================================================================
    *                       Advanced HDL Synthesis                          *
    =========================================================================
    
    Loading device for application Rf_Device from file '3s200.nph' in environment C:\Xilinx92i.
    INFO:Xst:2506 - Unit <dynamic5stage> : In order to maximize performance and save block RAM resources, the small ROM <Mrom_GALOIS_MUL_INV> will be implemented on LUT. If you want to force its implementation on block, use option/constraint rom_style.
    INFO:Xst:2261 - The FF/Latch <STAGE2_1_3> in Unit <dynamic5stage> is equivalent to the following FF/Latch, which will be removed : <STAGE2_2_1> 
    
    =========================================================================
    Advanced HDL Synthesis Report
    
    Macro Statistics
    # ROMs                                                 : 1
     16x4-bit ROM                                          : 1
    # Registers                                            : 55
     Flip-Flops                                            : 55
    # Xors                                                 : 89
     1-bit xor2                                            : 56
     1-bit xor3                                            : 24
     1-bit xor4                                            : 1
     2-bit xor2                                            : 6
     4-bit xor2                                            : 2
    
    =========================================================================
    
    =========================================================================
    *                         Low Level Synthesis                           *
    =========================================================================
    
    Optimizing unit <dynamic5stage> ...
    
    Mapping all equations...
    Building and optimizing final netlist ...
    Found area constraint ratio of 100 (+ 5) on block dynamic5stage, actual ratio is 3.
    
    Final Macro Processing ...
    
    =========================================================================
    Final Register Report
    
    Macro Statistics
    # Registers                                            : 55
     Flip-Flops                                            : 55
    
    =========================================================================
    
    =========================================================================
    *                          Partition Report                             *
    =========================================================================
    
    Partition Implementation Status
    -------------------------------
    
      No Partitions were found in this design.
    
    -------------------------------
    
    =========================================================================
    *                            Final Report                               *
    =========================================================================
    Final Results
    RTL Top Level Output File Name     : dynamic5stage.ngr
    Top Level Output File Name         : dynamic5stage
    Output Format                      : NGC
    Optimization Goal                  : Speed
    Keep Hierarchy                     : NO
    
    Design Statistics
    # IOs                              : 19
    
    Cell Usage :
    # BELS                             : 114
    #      LUT2                        : 22
    #      LUT2_D                      : 4
    #      LUT2_L                      : 1
    #      LUT3                        : 14
    #      LUT3_L                      : 2
    #      LUT4                        : 49
    #      LUT4_D                      : 3
    #      LUT4_L                      : 12
    #      MUXF5                       : 7
    # FlipFlops/Latches                : 55
    #      FDR                         : 54
    #      FDRS                        : 1
    # Clock Buffers                    : 1
    #      BUFGP                       : 1
    # IO Buffers                       : 18
    #      IBUF                        : 10
    #      OBUF                        : 8
    =========================================================================
    
    Device utilization summary:
    ---------------------------
    
    Selected Device : 3s200pq208-5 
    
     Number of Slices:                      61  out of   1920     3%  
     Number of Slice Flip Flops:            55  out of   3840     1%  
     Number of 4 input LUTs:               107  out of   3840     2%  
     Number of IOs:                         19
     Number of bonded IOBs:                 19  out of    141    13%  
     Number of GCLKs:                        1  out of      8    12%  
    
    ---------------------------
    Partition Resource Summary:
    ---------------------------
    
      No Partitions were found in this design.
    
    ---------------------------
    
    
    =========================================================================
    TIMING REPORT
    
    NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.
          FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT
          GENERATED AFTER PLACE-and-ROUTE.
    
    Clock Information:
    ------------------
    -----------------------------------+------------------------+-------+
    Clock Signal                       | Clock buffer(FF name)  | Load  |
    -----------------------------------+------------------------+-------+
    SYS_CLK                            | BUFGP                  | 55    |
    -----------------------------------+------------------------+-------+
    
    Asynchronous Control Signals Information:
    ----------------------------------------
    No asynchronous control signals found in this design
    
    Timing Summary:
    ---------------
    Speed Grade: -5
    
       Minimum period: 4.822ns (Maximum Frequency: 207.394MHz)
       Minimum input arrival time before clock: 6.639ns
       Maximum output required time after clock: 6.216ns
       Maximum combinational path delay: No path found
    
    Timing Detail:
    --------------
    All values displayed in nanoseconds (ns)
    
    =========================================================================
    Timing constraint: Default period analysis for Clock 'SYS_CLK'
      Clock period: 4.822ns (frequency: 207.394MHz)
      Total number of paths / destination ports: 242 / 43
    -------------------------------------------------------------------------
    Delay:               4.822ns (Levels of Logic = 3)
      Source:            STAGE3_3_0 (FF)
      Destination:       STAGE4_2_3 (FF)
      Source Clock:      SYS_CLK rising
      Destination Clock: SYS_CLK rising
    
      Data Path: STAGE3_3_0 to STAGE4_2_3
                                    Gate     Net
        Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
        ----------------------------------------  ------------
         FDR:C->Q              4   0.626   1.074  STAGE3_3_0 (STAGE3_3_0)
         LUT4_D:I0->O          2   0.479   0.768  Mxor_GAL2_MUL_31_xor0000_xo<1>1 (GAL2_MUL_31_xor0000)
         LUT4:I3->O            1   0.479   0.740  Mxor_OUTPUT1_xor0000_Result<1>11 (N211)
         LUT4:I2->O            1   0.479   0.000  Mxor_OUTPUT1_xor0000_Result<1> (GALOIS_MUL_3<3>)
         FDR:D                     0.176          STAGE4_2_3
        ----------------------------------------
        Total                      4.822ns (2.239ns logic, 2.583ns route)
                                           (46.4% logic, 53.6% route)
    
    =========================================================================
    Timing constraint: Default OFFSET IN BEFORE for Clock 'SYS_CLK'
      Total number of paths / destination ports: 168 / 76
    -------------------------------------------------------------------------
    Offset:              6.639ns (Levels of Logic = 5)
      Source:            BYTE_IN<4> (PAD)
      Destination:       STAGE1_2_1 (FF)
      Destination Clock: SYS_CLK rising
    
      Data Path: BYTE_IN<4> to STAGE1_2_1
                                    Gate     Net
        Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
        ----------------------------------------  ------------
         IBUF:I->O             7   0.715   1.201  BYTE_IN_4_IBUF (BYTE_IN_4_IBUF)
         LUT2:I0->O            2   0.479   0.804  GALOIS_ADD_1<0>31 (GALOIS_ADD_1<0>_bdd5)
         LUT4:I2->O            1   0.479   0.976  GALOIS_ADD_1<0>11 (GALOIS_ADD_1<0>_bdd0)
         LUT3:I0->O            1   0.479   0.851  GALOIS_ADD_1<1>_SW0 (N25)
         LUT4:I1->O            1   0.479   0.000  GALOIS_ADD_1<1> (GALOIS_ADD_1<1>)
         FDR:D                     0.176          STAGE1_2_1
        ----------------------------------------
        Total                      6.639ns (2.807ns logic, 3.832ns route)
                                           (42.3% logic, 57.7% route)
    
    =========================================================================
    Timing constraint: Default OFFSET OUT AFTER for Clock 'SYS_CLK'
      Total number of paths / destination ports: 8 / 8
    -------------------------------------------------------------------------
    Offset:              6.216ns (Levels of Logic = 1)
      Source:            OUTPUT_LATCH_7 (FF)
      Destination:       SUB_BYTE_OUT<7> (PAD)
      Source Clock:      SYS_CLK rising
    
      Data Path: OUTPUT_LATCH_7 to SUB_BYTE_OUT<7>
                                    Gate     Net
        Cell:in->out      fanout   Delay   Delay  Logical Name (Net Name)
        ----------------------------------------  ------------
         FDR:C->Q              1   0.626   0.681  OUTPUT_LATCH_7 (OUTPUT_LATCH_7)
         OBUF:I->O                 4.909          SUB_BYTE_OUT_7_OBUF (SUB_BYTE_OUT<7>)
        ----------------------------------------
        Total                      6.216ns (5.535ns logic, 0.681ns route)
                                           (89.0% logic, 11.0% route)
    
    =========================================================================
    CPU : 29.56 / 34.76 s | Elapsed : 29.00 / 34.00 s
    
    --> 
    
    Total memory usage is 205164 kilobytes
    
    Number of errors   :    0 (   0 filtered)
    Number of warnings :    0 (   0 filtered)
    Number of infos    :    3 (   0 filtered)
    

1 answer:

Answer 0: (score: 0)

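To measure the performance of your AES block, take the autotimespec value of 3.612 ns at the bottom of the place and route report and multiply it by the number of pipeline stages in your system. You wrote that there are currently 5 pipeline stages, so the total time through the system is 5 * 3.612 ns = 18.060 ns. If you add another pipeline stage in the hope of making the system faster, the clock must be able to run with a period shorter than 18.060 ns / 6 = 3.010 ns for the extra stage to actually improve that total time.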
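The arithmetic above can be written as a small Python sketch. The function names `pipeline_latency_ns` and `break_even_period_ns` are made up for illustration, and the sketch assumes (as the answer does) that every stage takes exactly one clock period; the only measured input is the 3.612 ns figure from the place and route report.

```python
def pipeline_latency_ns(clock_period_ns: float, stages: int) -> float:
    """Time for one input to pass through all pipeline stages,
    assuming each stage takes exactly one clock period."""
    return clock_period_ns * stages


def break_even_period_ns(current_period_ns: float,
                         current_stages: int,
                         new_stages: int) -> float:
    """Clock period at which a deeper pipeline has the same total
    latency as the current one. The new design must beat this period
    for the extra stages to reduce latency."""
    return pipeline_latency_ns(current_period_ns, current_stages) / new_stages


period = 3.612  # ns, "Best Case Achievable" from the place and route report
latency = pipeline_latency_ns(period, 5)
target = break_even_period_ns(period, 5, 6)

print(f"Latency with 5 stages: {latency:.3f} ns")            # 18.060 ns
print(f"Break-even period with 6 stages: {target:.3f} ns")   # 3.010 ns
```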

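The tool has calculated a minimum clock period of 3.612 ns, which corresponds to roughly 277 MHz, but you may be able to get a faster result by constraining sys_clk to a period shorter than that.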
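As a quick check on the period-to-frequency conversion, here is a minimal Python sketch. The helper name `max_frequency_mhz` is hypothetical; the only input is the 3.612 ns minimum period reported after place and route.

```python
def max_frequency_mhz(min_period_ns: float) -> float:
    """Convert a minimum clock period in nanoseconds to a maximum
    clock frequency in megahertz (f = 1 / T)."""
    return 1000.0 / min_period_ns


post_par_fmax = max_frequency_mhz(3.612)  # period from the place and route report
print(f"Post place-and-route Fmax: {post_par_fmax:.1f} MHz")  # 276.9 MHz
```

The same conversion applies to the 4.822 ns period in the synthesis report, but that number is only a pre-layout estimate, as the report itself notes.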