Question 1

What is Static Timing Analysis (STA), and why do we use it instead of gate-level (dynamic) simulation to sign off timing?

Accepted Answer

STA is a method of verifying a design's timing by computing signal arrival times along every timing path and checking them against the setup/hold constraints at each endpoint, without applying any input vectors. It is static (topology + timing models, not value-based) and exhaustive (all paths checked). Why over gate-level sim:

Coverage: simulation only exercises the paths your stimulus toggles; STA checks every path regardless of functional activity.
Speed: no need to develop/run vectors; analysis is roughly linear in design size, so it scales to multi-million-gate blocks.
Worst-case guarantee: STA finds the worst path in each timing group; sim would need an impractically large vector set to hit the same worst case.

The trade-off: STA only checks timing, not function, and it can flag paths that are never functionally exercised (false paths) unless you tell it about them.

Question 2

Define a timing path. What are its startpoint and endpoint, and what kinds of points can each be?

Accepted Answer

A timing path is a point-to-point route through the design along which a single launch-to-capture timing check is made. It runs from a startpoint to an endpoint.

Startpoint (where data is launched): either an input port of the design, or the clock pin of a sequential element (FF/latch).
Endpoint (where data is captured/checked): either the data (D) pin of a sequential element, or an output port of the design.

Each path consists of alternating cell arcs (through gates) and net arcs (interconnect), and the tool propagates a delay along the whole chain. Note a path is always single-clock-edge to single-clock-edge; combinational feedback or async crossings are handled specially.

Question 3

What is a timing arc? Distinguish cell arcs from net arcs, and name the main types of cell arc.

Accepted Answer

A timing arc is a directed delay (and constraint) relationship between two pins that STA uses to propagate timing.

Cell arc (internal to a standard cell, from the liberty .lib model):
Delay arcs (combinational): input pin to output pin, e.g. A to Y of a NAND. Characterized as a function of input transition (slew) and output load (capacitance), usually via NLDM/CCS tables. Can be positive/negative unate or non-unate.
Sequential arcs: CLK to Q (clock-to-output, T_cq).
Constraint arcs (checks): setup and hold on D relative to CLK, plus recovery/removal for async pins, and min-pulse-width.

Net arc (interconnect): delay from a driver pin to each receiver pin, derived from extracted RC (Elmore or higher-order).

Net arcs model wire delay; cell arcs model gate delay. STA alternates cell and net arcs along a path.
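To make the "function of input slew and output load" idea concrete, here is a minimal sketch of a table lookup with bilinear interpolation. The table values, axis points, and the function name nldm_lookup are hypothetical, and clamping out-of-range queries to the table edges is an assumption of this sketch; real tools may extrapolate or use a different interpolation scheme.

```python
from bisect import bisect_right


def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a slew x load delay table (queries clamped to the table range)."""

    def bracket(axis, x):
        # Clamp x to the axis range, then locate the enclosing interval.
        x = min(max(x, axis[0]), axis[-1])
        i = min(bisect_right(axis, x) - 1, len(axis) - 2)
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    # Interpolate along the load axis on both bracketing slew rows, then along slew.
    lo = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    hi = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return lo * (1 - ts) + hi * ts


# Hypothetical 3x3 table: rows = input slew (ns), columns = output load (pF), values = delay (ns).
SLEWS = [0.01, 0.05, 0.20]
LOADS = [0.001, 0.010, 0.050]
DELAY = [
    [0.020, 0.045, 0.150],
    [0.030, 0.055, 0.160],
    [0.060, 0.085, 0.190],
]

print(nldm_lookup(SLEWS, LOADS, DELAY, slew=0.05, load=0.010))   # a grid point
print(nldm_lookup(SLEWS, LOADS, DELAY, slew=0.03, load=0.0055))  # between grid points
```

The same lookup shape applies to output-slew tables, which is how slew propagates from one arc to the next.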

Question 4

STA splits checks into the four classic path groups: in2reg, reg2reg, reg2out, in2out. Define each, and say which require I/O constraints.

Accepted Answer

Path groups classify paths by what the startpoint and endpoint are:

reg2reg (register-to-register): start = launch FF clock pin, end = capture FF/D. The internal core paths; bounded entirely by the clock period and constrained automatically.
in2reg (input-to-register): start = input port, end = FF/D. Needs set_input_delay to model how late data arrives at the port relative to the clock.
reg2out (register-to-output): start = FF clock pin, end = output port. Needs set_output_delay to model the external setup/hold the downstream block requires.
in2out (input-to-output): start = input port, end = output port; a purely combinational feedthrough. Needs both set_input_delay and set_output_delay.

So all three I/O groups need (virtual-)clock-referenced I/O constraints; only reg2reg is fully constrained by the clock definition alone. These groups let the tool report and budget each interface separately.
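The classification above is just a lookup on the startpoint and endpoint kinds. A small sketch, where the kind labels ("clock_pin", "input_port", and so on) and both function names are made up for illustration; only the SDC command names come from the answer:

```python
def path_group(start_kind, end_kind):
    """Map (startpoint kind, endpoint kind) to one of the four classic groups."""
    groups = {
        ("clock_pin", "data_pin"): "reg2reg",
        ("input_port", "data_pin"): "in2reg",
        ("clock_pin", "output_port"): "reg2out",
        ("input_port", "output_port"): "in2out",
    }
    return groups[(start_kind, end_kind)]


def needed_io_constraints(group):
    """Return the I/O constraint commands a group needs beyond the clock definition."""
    needs = []
    if group in ("in2reg", "in2out"):
        needs.append("set_input_delay")   # path starts at an input port
    if group in ("reg2out", "in2out"):
        needs.append("set_output_delay")  # path ends at an output port
    return needs


for start, end in [("clock_pin", "data_pin"), ("input_port", "output_port")]:
    g = path_group(start, end)
    print(g, needed_io_constraints(g))
```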

Question 5

Explain arrival time and required time, and how slack is computed for a setup check and for a hold check. Be precise about signs.

Accepted Answer

Arrival time (AT) = the actual time the data signal reaches a pin, found by summing launch clock edge + launch clock latency + clk-to-Q + combinational/net delays along the path. Required time (RT) = the latest (setup) or earliest (hold) time data is allowed to arrive at the endpoint to meet the constraint, derived from the capture clock edge and the setup/hold requirement.

Setup slack = RT_setup − AT, where RT_setup = T_capture_edge + T_capture_clk_latency − T_setup − T_uncertainty. Data must arrive before the required time. Positive = met.
Hold slack = AT − RT_hold, where RT_hold = T_capture_edge + T_capture_clk_latency + T_hold + T_uncertainty_hold. Data must arrive after the hold window, so here it's AT minus RT. Positive = met.

(Here AT already includes the launch-clock latency, so the capture-clock latency in RT captures the skew.) Key sign intuition: setup wants data fast enough (large AT hurts), hold wants data slow enough (small AT hurts). Setup is a max-delay (late) check; hold is a min-delay (early) check.
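The sign conventions above can be written down directly. This is a sketch with illustrative function names and made-up numbers (all in ns), not a model of any particular tool's report:

```python
def setup_slack(launch_edge, launch_latency, t_cq, t_data_max,
                capture_edge, capture_latency, t_setup, uncertainty=0.0):
    """Setup slack = required - arrival, using late (max) data delays."""
    arrival = launch_edge + launch_latency + t_cq + t_data_max
    required = capture_edge + capture_latency - t_setup - uncertainty
    return required - arrival


def hold_slack(launch_edge, launch_latency, t_cq, t_data_min,
               capture_edge, capture_latency, t_hold, uncertainty=0.0):
    """Hold slack = arrival - required, using early (min) data delays."""
    arrival = launch_edge + launch_latency + t_cq + t_data_min
    required = capture_edge + capture_latency + t_hold + uncertainty
    return arrival - required


# Setup: capture edge is one period (1.0 ns) after the launch edge.
s = setup_slack(0.0, 0.20, 0.10, 0.50, 1.0, 0.25, 0.05, uncertainty=0.05)
# Hold: launch and capture are checked against the same edge (time 0).
h = hold_slack(0.0, 0.20, 0.08, 0.10, 0.0, 0.25, 0.03, uncertainty=0.02)
print(f"setup slack = {s:+.3f} ns, hold slack = {h:+.3f} ns")
```

Positive results mean the check is met in both cases; only the order of subtraction differs.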

Question 6

Why is STA called 'exhaustive' and 'vectorless'? What does that buy you and what does it cost you?

Accepted Answer

Vectorless: STA does not need input stimulus. It works purely on the netlist topology, the timing arcs, and the constraints. It assigns worst-case transitions to every arc rather than evaluating logic values.
Exhaustive: because it is value-independent, STA traverses and checks every timing path in the design (within each mode/corner), guaranteeing it finds the true critical path in each group.

What it buys:
Complete timing coverage with no reliance on testbench quality.
Fast, deterministic, repeatable sign-off that scales to huge designs.

What it costs:
It is function-blind: it will time paths that can never be sensitized at the same time (false paths) or that legitimately take multiple cycles (multicycle), so they must be declared with set_false_path / set_multicycle_path or you get pessimism.
It cannot catch functional/logical bugs, glitches, or value-dependent behavior; that still needs simulation.

Question 7

What are the key limitations of STA, and what classes of problems can it NOT find?

Accepted Answer

STA is timing-only and assumption-bound. Its main limitations:

No functional verification: it checks timing, not logical correctness; you still need simulation/formal for that.
False/multicycle paths: being vectorless, it reports paths that are functionally impossible or legitimately multi-cycle unless you constrain them, causing pessimism or wasted effort.
Asynchronous / clock-domain crossings: STA can't validate metastability or synchronizer correctness; CDC needs dedicated tools.
Combinational loops: must be broken (loop-cut) for analysis.
Glitch / dynamic effects: STA assumes a single clean transition per arc; it doesn't model glitches, simultaneous switching noise beyond derate, or detailed signal integrity unless explicitly modeled (SI delay/noise analysis add-ons).
Model accuracy bound: results are only as good as the .lib characterization, RC extraction, constraints (SDC), and the chosen corners/derates. Bad constraints = confidently wrong sign-off.
Doesn't pick vectors: can't tell you which input pattern actually exercises the critical path.

Question 8

Write the basic single-cycle setup and hold inequalities for a reg2reg path, including clock skew and uncertainty, and explain why hold is independent of clock period.

Accepted Answer

Let launch and capture be on the same edge for hold, and one period apart for setup. With T_skew = T_capture_clk_arrival − T_launch_clk_arrival:
Setup (max-delay, data must settle before next capture edge):
T_cq + T_comb_max + T_setup ≤ T_clk + T_skew − T_uncertainty
Hold (min-delay, data must not race through within the same edge):
T_cq + T_comb_min ≥ T_hold + T_skew + T_uncertainty
Notice hold has no T_clk term: both launch and capture are referenced to the same clock edge, so the period cancels out. That's why a hold violation cannot be fixed by slowing the clock; you fix it by adding delay (buffers) on the data path or fixing skew. Setup, in contrast, scales with the period, so a slower clock relaxes setup. Also note positive skew (capture later) helps setup but hurts hold, and vice versa.

Question 9

An interviewer says: 'Your reg2reg setup path has positive slack but the design still fails on silicon at speed.' Give plausible STA-related root causes.

Accepted Answer

Positive STA slack only means the path meets timing under the models and constraints you gave it. Common reasons silicon still fails:

Missing/optimistic corners: you didn't sign off the corner that's actually worst (e.g. a temperature-inversion 'hot vs cold' crossover, or an SI corner). Always close on the right MMMC set.
Wrong constraints: an over-aggressive set_false_path/set_multicycle_path hid the real path, or input/output delays didn't match the true environment.
Insufficient OCV/derate: on-chip variation under-modeled (no AOCV/POCV, or wrong derates), so the silicon path is slower than analysis.
SI not modeled: crosstalk delta-delay from aggressors pushed the path slower than the noise-free number.
Extraction/parasitic mismatch: wrong RC corner, missing coupling cap, or post-route vs estimated RC differences.
Clock issues: real skew/jitter worse than the uncertainty budget, or a duty-cycle/clock-path problem.
Library accuracy: NLDM vs CCS, slew clamp, or characterization not matching the real process.

The lesson: STA slack is a statement about the model, not a guarantee about the die.

Question 10

Define unateness for a timing arc (positive unate, negative unate, non-unate). Why does it matter for STA?

Accepted Answer

Unateness describes how an output transition direction relates to an input transition direction for a delay arc:

Positive unate: a rising input causes a rising output and a falling input a falling output (e.g. the arc through a buffer, AND, or OR). Rise tracks rise.
Negative unate: a rising input causes a falling output and vice versa (e.g. INV, NAND, NOR). Rise inverts to fall.
Non-unate: output direction is not determined by that input's direction alone, so both senses are possible (e.g. XOR/XNOR, and the clock pin of a flop).

Why it matters:
It tells STA which input edge to associate with which output edge when propagating rise vs fall arrival times and slews, which directly affects delay selection (rise and fall delays differ).
For clock paths, unateness determines whether STA treats a stage as inverting; an odd number of inverting (negative-unate) stages flips the effective edge (a positive edge at the source becomes a negative edge at the FF clock pin), which changes which edges launch/capture.
Non-unate clock-network elements force the tool to consider both edge senses, increasing analysis and sometimes pessimism.
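The edge-tracking rule can be sketched as a walk over the unateness of each stage in a chain. The sense labels and the function name propagate_edge are illustrative, not a tool API:

```python
def propagate_edge(source_edge, stage_senses):
    """Return the set of edge directions possible at the end of a chain of arcs."""
    flip = {"rise": "fall", "fall": "rise"}
    edges = {source_edge}
    for sense in stage_senses:
        if sense == "positive":
            continue                           # rise stays rise, fall stays fall
        elif sense == "negative":
            edges = {flip[e] for e in edges}   # each inverting stage flips the edge
        elif sense == "non_unate":
            edges = {"rise", "fall"}           # both senses must be analyzed from here on
        else:
            raise ValueError(f"unknown timing sense: {sense}")
    return edges


# Two inverters and a buffer: an even number of inversions keeps the edge.
print(propagate_edge("rise", ["negative", "positive", "negative"]))
# Three inverters: an odd number of inversions flips it.
print(propagate_edge("rise", ["negative", "negative", "negative"]))
# An XOR in the chain: both edges are possible downstream.
print(propagate_edge("rise", ["positive", "non_unate", "negative"]))
```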

Question 11

What is the difference between path-based and graph-based (block-based) analysis in STA, and when is each used?

Accepted Answer

Graph-based analysis (GBA): the tool propagates a single worst-case arrival (and worst slew) per pin through the timing graph, taking the max (setup) or min (hold) at each merge point. It is fast and pessimistic because the worst slew at a node may not belong to the same path that produced the worst arrival; that 'slew merging' pessimism is carried forward.
Path-based analysis (PBA): the tool re-times specific critical paths end-to-end using the actual slew propagated along that path, removing the cross-path pessimism GBA introduced. It is more accurate but slower, so it's used selectively.

Usage in practice:
Run GBA for full-chip iteration and optimization, accepting conservative numbers.
Run PBA (exhaustive or path-recovery mode) on the failing/near-critical endpoints during signoff to recover pessimism and avoid over-fixing paths that actually pass.

PBA can recover meaningful slack (often tens of ps) on deep paths, which can be the difference between closing timing and adding needless buffers.
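A toy illustration of slew-merging pessimism at a single merge point. The linear delay model, its coefficients, and the candidate numbers are assumptions made up for this sketch; a real tool uses library tables and merges across the whole graph.

```python
def stage_delay(input_slew, base=0.10, k=0.5):
    """Hypothetical downstream cell delay that grows linearly with input slew (ns)."""
    return base + k * input_slew


def gba_arrival(candidates):
    """GBA: pair the worst arrival with the worst slew, even if they come from different paths."""
    worst_arrival = max(a for a, _ in candidates)
    worst_slew = max(s for _, s in candidates)
    return worst_arrival + stage_delay(worst_slew)


def pba_arrival(candidates):
    """PBA: time each path with its own slew, then take the worst result."""
    return max(a + stage_delay(s) for a, s in candidates)


# (arrival ns, slew ns) of two paths converging on the same pin:
# one arrives late with a sharp edge, the other early with a slow edge.
candidates = [(1.00, 0.05), (0.80, 0.30)]

gba = gba_arrival(candidates)
pba = pba_arrival(candidates)
print(f"GBA = {gba:.3f} ns, PBA = {pba:.3f} ns, recovered = {gba - pba:.3f} ns")
```

GBA can never be more optimistic than PBA in this model, since it takes the maximum of each quantity separately.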

Question 12

On a setup path, the capture clock arrives later than the launch clock (positive skew). Does that help or hurt setup, and what about hold? State it generally.

Accepted Answer

Define T_skew = T_capture_arrival − T_launch_arrival (positive = capture clock is later).

Setup: positive skew helps. The capture edge effectively moves later, giving the data more time: T_cq + T_comb + T_setup ≤ T_clk + T_skew − T_uncertainty. A larger T_skew relaxes the bound.
Hold: positive skew hurts. The same-edge hold check tightens: T_cq + T_comb_min ≥ T_hold + T_skew + T_uncertainty. A larger T_skew makes the requirement harder.

General statement: skew toward the capture flop (positive) borrows time for setup but eats into hold margin; skew toward the launch flop (negative) does the opposite. This is exactly why useful-skew optimization trades setup vs hold across stages, and why aggressive setup-driven skewing can create hold violations downstream. Uncertainty (jitter) always subtracts from setup margin and adds to the hold requirement, regardless of skew sign.
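Solving the two inequalities for T_skew gives the window of skew values that satisfies both checks at once, which is the room useful-skew optimization has to work with. A sketch with made-up numbers (ns); it uses one T_cq for both checks as the inequalities above do, whereas real analysis would use separate max and min values:

```python
def skew_window(t_clk, t_cq, t_comb_max, t_comb_min, t_setup, t_hold, t_uncertainty):
    """Return (lowest, highest) T_skew that meets both setup and hold."""
    # Setup: T_cq + T_comb_max + T_setup <= T_clk + T_skew - T_uncertainty
    lowest = t_cq + t_comb_max + t_setup + t_uncertainty - t_clk
    # Hold:  T_cq + T_comb_min >= T_hold + T_skew + T_uncertainty
    highest = t_cq + t_comb_min - t_hold - t_uncertainty
    return lowest, highest


lo, hi = skew_window(t_clk=1.0, t_cq=0.10, t_comb_max=0.85, t_comb_min=0.10,
                     t_setup=0.05, t_hold=0.04, t_uncertainty=0.05)
if lo <= hi:
    print(f"both checks pass for {lo:.2f} ns <= T_skew <= {hi:.2f} ns")
else:
    print("no skew value closes both checks; the data path itself must change")
```

In this example zero skew fails setup, and a small positive skew fixes it without yet breaking hold.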

Question 13

What are the four canonical timing path groups in STA? Define each by its start point and end point.

Accepted Answer

STA always analyzes a path from a defined start point to a defined end point. Start points are input ports or clock pins of sequential cells; end points are output ports or data (D) pins of sequential cells. The four classic groups are:

reg-to-reg (FF to FF): launch flop clock pin to capture flop D pin. The dominant, most common group; bounded by both setup and hold.
in-to-reg (input to FF): primary input port to a capture flop D pin. Constrained by set_input_delay.
reg-to-out (FF to output): launch flop clock pin to primary output port. Constrained by set_output_delay.
in-to-out (input to output): primary input through pure combinational logic to a primary output. A feedthrough/combinational path constrained by both input and output delay.

Latch-based designs add latch-to-latch paths where time borrowing applies.

Question 14

Differentiate the data path from the clock path in a reg-to-reg timing arc. Why is the clock path traversed twice in one setup check?

Accepted Answer

Data path: the logic the launched data travels through, from the launch flop's Q pin, through combinational logic, to the capture flop's D pin. Its delay is what we try to fit inside the clock period.
Clock path (clock network): from the clock source/port through the clock tree (buffers, gates) to the CK pin of each flop. It determines when each flop actually sees its edge.

In a single setup check the clock network is traversed twice, computing two separate insertion delays: the launch clock path to the launch flop's CK pin, and the capture clock path to the capture flop's CK pin. The difference between these two is the clock skew. Treating them independently is essential: on-chip variation (OCV/AOCV) derates the launch and capture clock paths in opposite directions, and CPPR/CRPR then credits back the pessimism on the shared (common) portion of the two clock paths so it is not double-penalized.

Question 15

Write the fundamental setup (max-delay) inequality for a reg-to-reg path and define every term.

Accepted Answer

Setup requires that data arrives at the capture D pin before the setup window of the capturing edge:
T_launch + T_cq + T_comb ≤ T_period + T_capture − T_setup − T_uncertainty
Rearranged into the textbook form (clean clocks, skew = capture − launch insertion):
T_cq + T_comb + T_setup ≤ T_clk + T_skew − T_uncertainty

T_cq: clock-to-Q delay of the launch flop.
T_comb: combinational data-path delay (the data path).
T_setup: setup time of the capture flop (data must be stable before the edge).
T_clk: clock period.
T_skew: capture insertion delay − launch insertion delay. Positive skew helps setup (capture edge arrives later).
T_uncertainty: jitter + margin (pre-CTS also models skew). It is subtracted for setup.

Setup uses late data arrival vs early required time; data-path cells use the max/slow library corner.

Question 16

Write the hold (min-delay) inequality and explain why hold violations cannot be fixed by slowing the clock.

Accepted Answer

Hold requires that newly launched data does NOT arrive at the capture D pin too soon — it must stay stable for the hold window after the same capturing edge that latched the previous data:
T_cq + T_comb ≥ T_hold + T_skew + T_uncertainty
i.e. even the fastest data path must take long enough to outlast the hold requirement at the capture flop. Equivalently T_launch + T_cq + T_comb ≥ T_capture + T_hold for the same clock edge (no period term).

Because hold is a same-edge check, the clock period T_clk cancels out entirely. Slowing the clock does nothing for hold.
Hold uses the min/fast data corner (shortest delay) and is the worst case when the capture clock arrives late relative to launch (positive skew hurts hold).
Fix hold by adding delay in the data path (buffers/delay cells) or rebalancing clock skew.

Question 17

Define launch and capture flip-flops. For a setup check on a single-cycle path, which clock edges launch and capture, and what changes for hold?

Accepted Answer

Launch flop: the source register whose CK edge sends data into the data path (data leaves on its Q after T_cq).
Capture flop: the destination register whose CK edge samples the data at its D pin (governed by setup/hold).

Setup, single-cycle: data is launched by the edge at time 0 (launch edge) and must be captured by the next edge at time T_clk (capture edge). The available time is one full period.
Hold: uses the same edge for launch and capture. The data launched at edge n must not corrupt the data being captured at edge n. Available time is essentially zero, hence the min-delay requirement.

For multicycle paths the setup capture edge is moved out to N periods (set_multicycle_path N -setup), and by default the hold check moves with it to the edge just before the new capture edge, i.e. N − 1 periods after launch. To pull the hold check back to the launch edge, the hold multiplier is usually specified explicitly as N − 1 (set_multicycle_path N−1 -hold).
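The edge bookkeeping for multicycle paths can be sketched as follows. This assumes a same-clock, single-frequency path with the launch edge at time 0 and the usual SDC reading of the two multipliers; the function name is illustrative.

```python
def mcp_check_edges(t_clk, setup_mult=1, hold_mult=0):
    """Return (setup capture edge time, hold check edge time) relative to launch at 0."""
    setup_edge = setup_mult * t_clk
    # By default (hold_mult = 0) the hold check sits one edge before the setup
    # capture edge; each unit of hold multiplier moves it back by one period.
    hold_edge = (setup_mult - 1 - hold_mult) * t_clk
    return setup_edge, hold_edge


print(mcp_check_edges(1.0))                             # single-cycle default
print(mcp_check_edges(1.0, setup_mult=3))               # setup moved, hold follows it
print(mcp_check_edges(1.0, setup_mult=3, hold_mult=2))  # hold pulled back to the launch edge
```

The middle case is the one that surprises people: without the explicit hold multiplier, the data path would need at least two periods of minimum delay.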

Question 18

Walk through computing the data arrival time and the data required time for a setup check, then define slack.

Accepted Answer

Data arrival time at the capture D pin (using late/max numbers): arrival = T_launch_clk + T_cq + T_comb, where T_launch_clk is the launch-clock insertion delay to the launch CK pin.
Data required time at the same D pin for setup: required = T_clk + T_capture_clk − T_setup − T_uncertainty, where T_capture_clk is the capture-clock insertion delay.
Setup slack: slack = required − arrival. Positive slack = timing met; negative = violation.

For hold, the sense flips: slack = arrival − required, using early/min data and the same-edge required time required = T_capture_clk + T_hold + T_uncertainty. The reported critical path is simply the path with the most negative (worst) slack in its group.

Question 19

What exactly is the 'critical path'? Is the longest combinational path always the critical path? Justify.

Accepted Answer

The critical path is the timing path with the worst (most negative, or smallest positive) slack in the design — it limits the maximum operating frequency for setup analysis. It is defined by slack, not by raw delay. No , the longest combinational path is not necessarily critical. Slack depends on the full equation: clock period, both clock insertion delays (skew), setup time, uncertainty, and the data delay. A shorter data path on a faster (tighter-period) clock domain, or one with negative skew (capture clock arriving early), or one ending at a flop with a large setup time, can have worse slack than a physically longer path. Different path groups / clock domains can have different periods, so 'longest delay' across groups is meaningless without normalizing by slack. Designers also track WNS (worst negative slack — the single critical path) and TNS (total negative slack — sum over all violating endpoints) to gauge overall closure effort.
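A small sketch of "critical is defined by slack, not delay", together with WNS and TNS. The path names, delays, and budgets are invented; each entry stands for one endpoint, and "budget" is shorthand for the available time after clock, skew, setup, and uncertainty are accounted for.

```python
paths = {
    "slow_clk/long_path":  {"delay": 1.80, "budget": 2.00},
    "fast_clk/short_path": {"delay": 0.95, "budget": 0.90},
    "fast_clk/other_path": {"delay": 0.92, "budget": 0.90},
}


def summarize(paths):
    """Return (critical path, longest path, WNS, TNS) for a set of endpoint slacks."""
    slacks = {name: p["budget"] - p["delay"] for name, p in paths.items()}
    critical = min(slacks, key=slacks.get)                    # worst slack
    longest = max(paths, key=lambda n: paths[n]["delay"])     # largest raw delay
    wns = slacks[critical]
    tns = sum(s for s in slacks.values() if s < 0)            # only violators count
    return critical, longest, wns, tns


critical, longest, wns, tns = summarize(paths)
print(f"critical = {critical}, longest = {longest}")
print(f"WNS = {wns:+.2f} ns, TNS = {tns:+.2f} ns")
```

Here the longest path has comfortable positive slack, while a much shorter path on the tighter clock is the critical one.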

Question 20

Why does positive clock skew help setup but hurt hold? Give the sign convention and the intuition.

Accepted Answer

Define skew = capture insertion delay − launch insertion delay. Positive skew means the capture clock edge arrives later than the launch edge.

Setup: a later capture edge gives the data path more time to propagate. In T_cq + T_comb + T_setup ≤ T_clk + T_skew, positive skew adds to the RHS budget. So positive skew helps setup.
Hold: a later capture edge means the capture flop samples later, giving the next launched data more chance to race through and corrupt the value. In T_cq + T_comb ≥ T_hold + T_skew, positive skew increases the RHS requirement. So positive skew hurts hold.

This is the classic skew tradeoff: deliberate useful skew can buy setup margin on a critical path, but it must be paid back as hold margin (extra data-path delay) on the same and neighboring paths.

Question 21

Given T_cq = 0.15 ns, T_comb = 0.80 ns, T_setup = 0.10 ns, T_clk = 1.00 ns, capture insertion = 0.30 ns, launch insertion = 0.25 ns, clock uncertainty = 0.05 ns. Compute the setup slack. Then state the max frequency if skew and uncertainty were zero.

Accepted Answer

Skew = capture − launch insertion = 0.30 − 0.25 = +0.05 ns.
Data arrival (relative, using launch insertion) = launch_ins + T_cq + T_comb = 0.25 + 0.15 + 0.80 = 1.20 ns.
Data required = T_clk + capture_ins − T_setup − uncertainty = 1.00 + 0.30 − 0.10 − 0.05 = 1.15 ns.
Setup slack = required − arrival = 1.15 − 1.20 = −0.05 ns (VIOLATION).
Sanity check with the compact form: slack = T_clk + skew − uncertainty − (T_cq + T_comb + T_setup) = 1.00 + 0.05 − 0.05 − 1.05 = −0.05 ns. Consistent.
With skew = 0 and uncertainty = 0, the minimum period is T_cq + T_comb + T_setup = 0.15 + 0.80 + 0.10 = 1.05 ns, so f_max = 1 / 1.05 ns ≈ 952 MHz.
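The arithmetic can be checked with a few lines of Python. The variable names are illustrative; the numbers are the ones given in the question (ns).

```python
t_cq, t_comb, t_setup = 0.15, 0.80, 0.10
t_clk = 1.00
capture_ins, launch_ins = 0.30, 0.25
uncertainty = 0.05

# Full form: arrival and required both include their own clock insertion delay.
arrival = launch_ins + t_cq + t_comb
required = t_clk + capture_ins - t_setup - uncertainty
slack = required - arrival

# Compact form: fold the two insertion delays into a single skew term.
skew = capture_ins - launch_ins
slack_compact = t_clk + skew - uncertainty - (t_cq + t_comb + t_setup)

# Ideal-clock limit: no skew, no uncertainty.
min_period = t_cq + t_comb + t_setup
f_max_mhz = 1e3 / min_period  # period in ns -> frequency in MHz

print(f"slack = {slack:+.2f} ns (compact form: {slack_compact:+.2f} ns)")
print(f"min period = {min_period:.2f} ns, f_max = {f_max_mhz:.0f} MHz")
```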

Question 22

On a pure input-to-output (combinational feedthrough) path with no flops, how is timing constrained, and what is the arrival/required computation?

Accepted Answer

There is no launch or capture flop, so the path is bounded by the I/O delay budget defined relative to a virtual (or real) clock:

Start: set_input_delay models the external delay from the upstream (off-chip/block) launch flop to the input port.
End: set_output_delay models the external setup requirement of the downstream capture flop seen at the output port.

Setup at the output port:
arrival = T_input_delay + T_comb
required = T_clk − T_output_delay (− uncertainty)
slack = required − arrival
So the on-chip combinational budget is effectively T_clk − T_input_delay − T_output_delay. If no I/O delays are set, the port is unconstrained and STA may report it as no path / unconstrained endpoint — a common interview gotcha. A min-delay (hold) check uses the early input delay and min output delay analogously.
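A quick numeric sketch of the feedthrough budget, using invented values (ns) and illustrative function names:

```python
def in2out_setup_slack(t_clk, input_delay, output_delay, t_comb, uncertainty=0.0):
    """Setup slack at the output port of a pure combinational feedthrough."""
    arrival = input_delay + t_comb
    required = t_clk - output_delay - uncertainty
    return required - arrival


def comb_budget(t_clk, input_delay, output_delay, uncertainty=0.0):
    """Largest on-chip combinational delay that still meets the check."""
    return t_clk - input_delay - output_delay - uncertainty


slack = in2out_setup_slack(t_clk=2.0, input_delay=0.6, output_delay=0.5,
                           t_comb=0.7, uncertainty=0.1)
budget = comb_budget(t_clk=2.0, input_delay=0.6, output_delay=0.5, uncertainty=0.1)
print(f"slack = {slack:+.2f} ns, on-chip budget = {budget:.2f} ns")
```

The slack is simply the budget minus the actual combinational delay, which is why over-generous external delays can make a feedthrough impossible to close.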

Question 23

Draw and explain the full reg-to-reg path schematic and annotate where clock-to-Q, combinational delay, and setup come from physically.

Accepted Answer

Data flows: launch flop CK → (after T_cq) launch Q → combinational cloud (T_comb) → capture flop D, sampled by the capture CK edge subject to T_setup.

T_cq (clock-to-Q): the launch flop's internal propagation from its CK edge to a valid Q; a property of the launch flop and its output load/slew.
T_comb: the sum of cell + net delays through the data-path logic (the data path), evaluated at the slow corner for setup, fast corner for hold.
T_setup: a constraint of the capture flop; how long D must be stable before the capture CK edge.

The two CK pins are driven by separate branches of the clock tree, so their insertion delays differ; that difference is the skew applied to the check. The setup budget is T_clk + T_skew − uncertainty ≥ T_cq + T_comb + T_setup.

Question 24

Senior-level: in latch-based timing, how do time borrowing and the concept of paths-through-latches change the basic setup equation, and what is the risk?

Accepted Answer

Latches are level-sensitive, so data can pass through while the clock is in its transparent phase. This enables time borrowing (a.k.a. cycle stealing): if data arrives after the latch's opening edge but while the latch is still transparent, it propagates through immediately, effectively letting a slow stage borrow time from the next stage.

The setup check moves from 'arrive before the opening edge' to 'arrive before the latch closes', adding the transparency window to the budget: borrow up to (pulse width − T_setup).
Borrowing chains across multiple latch stages, so STA must analyze paths through latches, not just simple reg-to-reg arcs, and the borrowed amount propagates as an adjusted required time downstream.

Risks: borrowing can cascade and mask a real frequency problem; excessive borrow at one latch tightens the next stage and can worsen hold (data is live during the transparent phase, raising race risk). Tools cap borrow at the available window and flag when the borrow demand exceeds it.
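The borrow arithmetic for a single latch can be sketched like this. It is a simplified model under stated assumptions: one latch, times in ns, latch D-to-Q delay ignored, and the function name latch_borrow is illustrative.

```python
def latch_borrow(arrival, t_open, pulse_width, t_setup):
    """Return (time borrowed, margin to the latest allowed arrival) at one latch."""
    max_borrow = pulse_width - t_setup       # window available for borrowing
    borrow = max(0.0, arrival - t_open)      # nothing is borrowed if data beats the opening edge
    margin = (t_open + max_borrow) - arrival # negative: data misses the closing edge
    return borrow, margin


T_OPEN, PULSE, T_SETUP = 1.0, 0.5, 0.05

for arrival in (0.9, 1.2, 1.5):
    borrow, margin = latch_borrow(arrival, T_OPEN, PULSE, T_SETUP)
    status = "ok" if margin >= 0 else "VIOLATION"
    print(f"arrival {arrival:.2f}: borrow {borrow:.2f}, margin {margin:+.2f} ({status})")
```

Whatever is borrowed here is lost by the next stage, whose data effectively launches at the late arrival time instead of at the opening edge.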

Question 25

Define setup time and hold time for a flip-flop. What physically goes wrong if each is violated?

Accepted Answer

Setup time (T_setup): the minimum interval before the active clock edge during which the data input D must be stable.
Hold time (T_hold): the minimum interval after the active clock edge during which D must remain stable.

Together they define a forbidden window around the edge in which data may not transition.

Setup violation: data arrives too late, so the new value misses the capturing edge. The flop may capture the old value, the wrong value, or go metastable. A setup failure means the path is too slow for the clock period.
Hold violation: data changes too soon after the edge (it races through combinational logic and corrupts the value being captured). The flop captures a value that should have been latched on the next cycle. A hold failure is a logic-correctness problem independent of how slow you run the clock.

Question 26

Write the setup timing equation for a single-cycle flop-to-flop path and explain every term, including the sign of skew.

Accepted Answer

Setup constraint: T_clk ≥ T_cq + T_comb + T_setup − T_skew
Equivalently, data arrival must beat data required time: T_launch_clk + T_cq + T_comb ≤ T_clk + T_capture_clk − T_setup (the skew is already contained in the two clock arrival terms, so it does not appear separately).

T_clk: clock period (the time budget for one cycle).
T_cq: clock-to-Q of the launch flop (propagation delay from launch edge to data out).
T_comb: combinational delay through the data path between the two flops.
T_setup: setup requirement of the capture flop; it eats into the budget, hence it is subtracted from the required time.
T_skew = T_capture_clk − T_launch_clk: if the capture clock arrives later than the launch clock, skew is positive and it helps setup (more time), so it is subtracted on the right of the first form, i.e. added to the budget.

In practice the budget is further reduced by clock uncertainty (jitter + margin) and OCV/derate is applied, but the core relation is the one above.

Question 27

Write the hold timing equation and explain why hold checks are independent of clock frequency.

Accepted Answer

Hold constraint: T_cq + T_comb ≥ T_hold + T_skew
Read as: the fastest data arrival at the capture flop must still be later than its hold requirement. The new data launched by an edge must not arrive at the capture flop so fast that it violates the hold window of that same edge.

Both sides reference the same active clock edge: the launch and the capture-flop hold check happen on edge N, not edge N and edge N+1. The clock period T_clk never appears. Because T_clk cancels out, hold is purely a relationship between data-path delay, hold requirement, and skew, all of which are fixed by the silicon and layout, not by how fast you clock.

Consequence: you cannot fix a hold violation by changing frequency. Slowing the clock does nothing; speeding it up does nothing. Hold must be fixed structurally (buffers/delay on the data path or skew adjustment). Here T_skew is the same definition (capture − launch); positive skew (capture later) hurts hold because it gives the racing data more time to corrupt the capture.

Question 28

A path has T_cq = 120 ps, T_comb = 700 ps, T_setup = 80 ps, and the clock period is 1 ns with zero skew. What is the setup slack? Now the same path has T_hold = 90 ps and fast-corner T_cq = 60 ps, T_comb = 50 ps — is hold met?

Accepted Answer

Setup slack = required − arrival = (T_clk + T_skew − T_setup) − (T_cq + T_comb)
= (1000 + 0 − 80) − (120 + 700) = 920 − 820 = +100 ps → setup MET with 100 ps margin.

Hold slack = arrival − required = (T_cq + T_comb) − (T_hold + T_skew), evaluated at the fast (min) corner
= (60 + 50) − (90 + 0) = 110 − 90 = +20 ps → hold MET with 20 ps margin.

Key analysis subtlety: setup is checked with max (slow) delays, hold with min (fast) delays. That is why T_cq/T_comb differ between the two checks even on the same physical path.

Question 29

Explain metastability. How does it relate to setup/hold, and how is MTBF affected by it?

Accepted Answer

Metastability is the condition where a flop's output hovers at an indeterminate voltage (between valid 0 and 1) for an unbounded time after the clock edge, eventually resolving randomly to 0 or 1. It happens when D changes inside the setup/hold window, so the internal cross-coupled latch is driven near its unstable balance point.

Violating setup or hold does not guarantee a wrong value; it creates a probability of metastability and a resolution time that can exceed the available slack. The output may oscillate or settle late, and a downstream flop can then sample a half-resolved value, propagating the failure.

MTBF (mean time between failures) for a synchronizer: MTBF = e^(T_r / τ) / (T_w · f_clk · f_data), where T_r is the resolution time available, τ is the flop's resolution time constant, T_w the metastability window, f_clk the sampling clock, f_data the data toggle rate.

Mitigation: multi-flop synchronizers (2 or 3 FFs) for asynchronous crossings. Each extra stage gives another full cycle of resolution time, raising MTBF exponentially. Synchronizers do not eliminate metastability; they make it astronomically improbable.
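The exponential dependence on resolution time is easiest to appreciate numerically. A sketch of the formula above; the τ, T_w, and frequency values are placeholders chosen for illustration, not characterized data for any real flop.

```python
import math


def synchronizer_mtbf(t_r, tau, t_w, f_clk, f_data):
    """MTBF in seconds: exp(T_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_r / tau) / (t_w * f_clk * f_data)


TAU = 20e-12      # assumed resolution time constant (s)
T_W = 30e-12      # assumed metastability window (s)
F_CLK = 500e6     # sampling clock (Hz), period = 2 ns
F_DATA = 50e6     # data toggle rate (Hz)

one_stage = synchronizer_mtbf(1.5e-9, TAU, T_W, F_CLK, F_DATA)
# An extra synchronizer stage adds one more clock period (2 ns) of resolution time.
two_stage = synchronizer_mtbf(1.5e-9 + 2e-9, TAU, T_W, F_CLK, F_DATA)

print(f"MTBF gain from one extra cycle: x{two_stage / one_stage:.3e}")
```

The gain is exp(extra time / τ), independent of T_w and the two frequencies, which is why adding a stage is so much more effective than slowing the data rate.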

Question 30

List the techniques to fix a setup violation, and separately the techniques to fix a hold violation. Why can fixing one create the other?

Accepted Answer

Setup fixes (need the data faster, or more time):

Reduce T_comb: upsize cells, restructure/buffer long nets, reduce logic depth (logic restructuring, retiming).
Reduce load/fanout, fix poor placement of the path.
Add useful skew: make the capture clock arrive later (positive skew) to borrow time.
Relax the period (lower frequency): last resort, since setup is frequency dependent.

Hold fixes (need the data slower):

Insert delay/hold-buffer cells on the short data path (the standard fix).
Downsize fast cells or use higher-Vt (slower) cells on the path.
Reduce positive clock skew at the capture flop, or balance the clock tree.

Why they conflict: adding delay to fix hold slows the path and erodes setup margin; speeding the path or adding positive skew to fix setup makes data race faster / capture later and erodes hold. Useful skew especially trades setup margin on one stage for hold margin loss on the next, so fixes must respect both windows simultaneously.

Question 31

Where do clock uncertainty, jitter, and OCV derate apply — to setup, to hold, or both? Be precise about the sign.

Accepted Answer

Clock uncertainty models jitter + margin and is subtracted from the available time, so it always makes the check harder: Setup: uncertainty reduces the effective period → T_clk − T_uncertainty on the required side. Both jitter and setup margin apply. Hold: typically only the margin component of uncertainty applies (often a smaller hold-uncertainty), and it adds to the required hold → harder. Cycle-to-cycle jitter largely cancels for hold because launch and capture use the same edge, so many flows set hold uncertainty lower than setup uncertainty. OCV / derate (on-chip variation): models that launch and capture paths see different process/voltage conditions. Setup (max analysis): late-derate the data/launch-clock path (slower) and early-derate the capture-clock path (faster), the worst case for catching the edge. Hold (min analysis): early-derate the data/launch path (faster) and late-derate the capture-clock path (slower), the worst case for the race. Advanced flows replace flat derate with AOCV/POCV (depth- and distance-dependent, or statistical) to reduce pessimism. On the common clock path (up to the divergence point) the same cells receive the late derate for launch and the early derate for capture, which is physically impossible; CRPR credits back that over-pessimism on the shared portion.

Question 32

Walk me through a complete setup/hold waveform for a launch flop, a combinational path, and a capture flop in the same cycle. Where exactly are the two checks made?

Accepted Answer

Sequence: At launch edge (cycle N), after T_cq the launch Q updates and the new value propagates through T_comb, arriving at the capture flop's D as D_arrival. Setup check is made against the next capture edge (N+1): D_arrival must settle at least T_setup before that edge. Required = T_clk + T_skew − T_setup. Hold check is made against the same edge (N): the new data must not arrive earlier than T_hold after that edge, i.e. it must not corrupt what edge N is trying to capture. Required = T_hold + T_skew. So setup and hold checks for a single-cycle path are made against two different edges separated by one period, which is precisely why setup involves T_clk and hold does not.

Question 33

How does clock skew affect setup and hold? Explain useful (intentional) skew and its limit.

Accepted Answer

Define T_skew = T_capture_clk − T_launch_clk (positive = capture clock arrives later). Setup: positive skew helps. The capture edge moves later, giving the data more time: T_clk ≥ T_cq + T_comb + T_setup − T_skew. Hold: positive skew hurts. The racing data has more time to reach the capture flop before its (delayed) hold window closes: T_cq + T_comb ≥ T_hold + T_skew. Useful (clock) skew is intentionally adjusting clock arrival to borrow time across a pipeline: delay the capture clock of a critical stage to fix its setup, which simultaneously delays the launch of the next stage. Limit: the borrowed time is taken from the neighboring stage, and the added positive skew degrades hold on the borrowing stage. So skew is bounded by the next stage's setup margin and the current stage's hold margin: it redistributes slack, it does not create it.

Question 34

A block passes setup at signoff but fails hold on silicon. The clock was already slowed in test. What is your debug approach?

Accepted Answer

The fact that slowing the clock did not help confirms it is a hold problem: hold is frequency independent. Confirm the corner: hold fails at the fast corner (low temp / high V / fast process for most nodes, though temperature inversion can make hot the worst hold corner at advanced nodes). Re-check that signoff covered the true min-delay corner, including the right RC (min) extraction. Check skew / CTS: excess positive skew at the capture flop is a classic cause. Look for clock-tree imbalance, over-credited CRPR, or a short common path (early divergence) that left real skew. Check derate/OCV: were min-delay derates and POCV applied on the data path and late-derate on the capture clock? Optimistic derate hides hold fails. Check for missed/false hold arcs: clock-gating, multicycle, or async paths mis-constrained; hold buffers that were dropped in ECO or eaten by routing. Check IR drop / noise: dynamic voltage rise or crosstalk can speed the data path beyond the static min model, opening a real hold race not seen in nominal STA. Fix: insert delay cells on the offending min paths via ECO and re-run min-corner timing across all PVT + OCV; verify no setup regression.

Question 35

Two flops are driven by the same clock but the data path between them is a single inverter (very short). Setup is fine. Why is this the riskiest hold scenario, and what is the worst-case condition?

Accepted Answer

A near-zero-delay data path is the canonical hold danger: the new data launched on edge N reaches the capture flop almost immediately, easily arriving within the capture flop's hold window of that same edge N.

Hold margin = (T_cq + T_comb) − (T_hold + T_skew). With tiny T_comb, the left side is small, so any positive skew or fast corner pushes it negative.
Worst case: fast/min corner (smallest T_cq and T_comb) plus positive skew at the capture flop (capture clock later than launch). This combination minimizes data arrival relative to the hold requirement.
Because T_clk is absent, you cannot test your way out of it on the bench by changing frequency — it is a structural race.

Fix: insert hold/delay buffers on the data path so T_cq + T_comb exceeds T_hold + T_skew with margin, and balance the clock tree to remove the harmful positive skew.

Question 36

What is a setup/hold borrow (time borrowing) and why does it apply to latches but not edge-triggered flops?

Accepted Answer

Time borrowing is when a level-sensitive latch lets data that arrives late (while the latch is transparent) pass through and propagate, effectively borrowing time from the next pipeline stage. A latch is transparent for the full active phase of the clock, so data arriving after the opening edge but before the closing edge still flows through. The slack 'borrowed' is bounded by the transparency window (roughly the active phase minus the setup of the closing edge). Edge-triggered flops sample only at the edge (an instant), so there is no transparency window: data must meet setup before that edge or it is lost. No borrowing is possible. Latch-based (time-borrowing) design averages slack across stages and tolerates imbalance, which is why high-performance CPUs use it, at the cost of much harder STA (level-sensitive checks, borrow limits, more complex hold analysis since the latch can be transparent when new data arrives).

Question 37

Define clock period and frequency, and write the relationship between them. If a design must run at 1.25 GHz, what is the clock period in ps?

Accepted Answer

Clock period T_clk is the time for one full cycle of the clock; frequency f is the number of cycles per second. They are reciprocals: f = 1 / T_clk . For a 1.25 GHz target: T_clk = 1 / (1.25e9) = 0.8 ns = 800 ps . Why it matters in STA: the period is the budget the setup check must close into. The fundamental single-cycle setup constraint is T_clk ≥ T_cq + T_comb + T_setup - T_skew + T_uncertainty , so improving frequency means shrinking T_comb or buying margin from T_skew /borrow. Hold has no T_clk term — it is a same-edge check, which is why hold violations cannot be fixed by slowing the clock.
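The reciprocal relationship and the setup budget above can be checked with a few lines of Python. This is only a sketch: the helper names are made up for illustration, and the path numbers (in ps) are hypothetical.

```python
def period_ps(freq_hz):
    """Clock period in picoseconds for a frequency given in Hz."""
    return 1e12 / freq_hz

def min_period_ps(t_cq, t_comb, t_setup, t_skew=0, t_uncertainty=0):
    """Smallest T_clk that satisfies
    T_clk >= T_cq + T_comb + T_setup - T_skew + T_uncertainty."""
    return t_cq + t_comb + t_setup - t_skew + t_uncertainty

target = period_ps(1.25e9)

# Hypothetical path numbers in ps, chosen only for illustration.
needed = min_period_ps(t_cq=80, t_comb=600, t_setup=50,
                       t_skew=30, t_uncertainty=60)

print(f"target period: {target:.0f} ps")
print(f"path needs   : {needed} ps")
print(f"fits         : {needed <= target}")
```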

Question 38

Distinguish source latency from network latency. Which one models the off-chip path, and how does each affect setup and hold timing?

Accepted Answer

Source latency (a.k.a. insertion delay outside the clock-tree boundary) is the delay from the clock's true origin — typically the board oscillator / PLL output — to the clock definition point (the port or pin where create_clock is applied). It models the off-chip / pre-port path. Network latency is the delay from the clock definition point through the on-chip clock network (clock tree) to the register clock pins. Before CTS (ideal clocks), network latency is estimated via set_clock_latency and source latency via set_clock_latency -source . After CTS (propagated clocks), network latency becomes the real, per-endpoint tree delay; source latency usually stays as a fixed model. Key point: when source latency is applied equally to launch and capture clocks of the same clock, it cancels out in single-clock setup/hold checks and does not change slack. It only matters for cross-clock checks or when launch and capture clocks have different source latencies.

Question 39

What is clock skew? Define positive and negative skew precisely with respect to the launch and capture flops, and state the sign convention you are using.

Accepted Answer

Clock skew is the difference in clock arrival time between two related registers. Define it as T_skew = T_capture - T_launch (capture-clock arrival minus launch-clock arrival). Positive skew: T_skew > 0 — the capture clock arrives later than the launch clock. The capture flop is given extra time, so positive skew helps setup and hurts hold . Negative skew: T_skew < 0 — the capture clock arrives earlier than the launch clock. This hurts setup and helps hold . In the equations: Setup: T_cq + T_comb + T_setup ≤ T_clk + T_skew Hold: T_cq + T_comb ≥ T_hold + T_skew Always state the convention first — many interviewers use launch - capture , which flips the signs. The physics (later capture helps setup) never changes; only the label does.

Question 40

What is useful (intentional) skew? Give the setup and hold inequalities for a launch-to-capture path and explain how a tool 'borrows' time across stages with useful skew.

Accepted Answer

Useful skew is intentionally engineering the clock arrival times (rather than minimizing skew to zero) to balance slack between adjacent pipeline stages. By delaying the capture clock of a critical stage, you lend it time stolen from the next stage. For a path with positive skew T_skew = T_capture - T_launch : Setup: T_cq + T_comb + T_setup ≤ T_clk + T_skew — extra T_skew relaxes the slow stage. Hold: T_cq + T_comb ≥ T_hold + T_skew — but the same skew tightens hold. The borrow: delaying flop B's clock helps the A→B path but the same delay is now part of B→C's launch, stealing from the B→C budget. So useful skew is zero-sum across the cycle — you move slack from a stage that has margin to one that does not. Only works when the donor stage has positive setup slack to give. Always recheck hold on the helped path and setup on the donor path. It is a real, committed clock-tree delay (unlike time borrowing in latches, which is a same-cycle phase effect).
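The zero-sum nature of the borrow can be sketched numerically for a three-flop pipeline A→B→C. All delays below are hypothetical values in ps, and the clock arrival at each flop is modeled as a single number; only flop B's clock is moved.

```python
def setup_slack(t_clk, t_cq, t_comb, t_setup, clk_launch, clk_capture):
    t_skew = clk_capture - clk_launch          # capture minus launch
    return (t_clk + t_skew - t_setup) - (t_cq + t_comb)

def hold_slack(t_cq, t_comb_min, t_hold, clk_launch, clk_capture):
    t_skew = clk_capture - clk_launch
    return (t_cq + t_comb_min) - (t_hold + t_skew)

def pipeline_slacks(clk_b_delay):
    """Return (A->B setup, B->C setup, A->B hold) with flop B's clock delayed."""
    T_CLK, T_CQ, T_SETUP, T_HOLD = 1000, 100, 50, 40   # hypothetical, ps
    clk_a, clk_b, clk_c = 0, clk_b_delay, 0
    ab = setup_slack(T_CLK, T_CQ, 900, T_SETUP, clk_a, clk_b)   # slow stage
    bc = setup_slack(T_CLK, T_CQ, 700, T_SETUP, clk_b, clk_c)   # donor stage
    ab_hold = hold_slack(T_CQ, 200, T_HOLD, clk_a, clk_b)       # min path A->B
    return ab, bc, ab_hold

for delay in (0, 100):
    ab, bc, ab_hold = pipeline_slacks(delay)
    print(f"B clock +{delay:3d} ps: A->B setup {ab:+d}, "
          f"B->C setup {bc:+d}, sum {ab + bc:+d}, A->B hold {ab_hold:+d}")
```

The sum of the two setup slacks is unchanged by the shift, while the A→B hold slack drops by exactly the amount of skew added.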

Question 41

What is clock jitter? Distinguish period jitter from cycle-to-cycle jitter, and explain how jitter is modeled in STA.

Accepted Answer

Jitter is the short-term, random variation of clock edges from their ideal positions, caused mainly by the PLL, supply noise, and the clock source. Unlike skew (spatial, deterministic, between two points), jitter is temporal and statistical at a single point. Period jitter: deviation of any single period from the ideal period — relevant to single-cycle setup margin. Cycle-to-cycle jitter: difference between two consecutive periods — relevant to back-to-back-edge effects. In STA jitter is not simulated edge-by-edge; it is folded into the clock uncertainty applied to the capture edge via set_clock_uncertainty . For setup, jitter effectively shortens the available period (the capture edge may come early), so it is subtracted from the setup side. For hold, since it is a same-edge check on a single clock, ideal-clock jitter largely common-modes out and is typically excluded or set much smaller — that is why setup and hold uncertainty are specified separately.

Question 42

How does clock uncertainty enter the setup check versus the hold check? Why are the two values usually different and why does setup uncertainty shrink the period while hold uncertainty does not?

Accepted Answer

Clock uncertainty (set_clock_uncertainty) is a lumped margin covering jitter, estimated skew (pre-CTS), and guardband. STA applies it pessimistically: it always makes the relevant check harder. Setup: the capture edge is assumed to arrive early by the uncertainty, shrinking the usable period: T_cq + T_comb + T_setup ≤ T_clk - U_setup + T_skew. Hold: hold is a same-edge check (launch and capture on the same edge), so there is no T_clk to shrink. Instead the uncertainty widens the required separation; it is added to the hold requirement: T_cq + T_comb ≥ T_hold + U_hold + T_skew. Why different values: setup uncertainty includes jitter (which accumulates over a full period and pushes edges in either direction), so it is larger. Hold happens on the same edge, so jitter is highly common-mode and cancels; hold uncertainty captures only residual skew/OCV margin, so it is smaller (often just a small guardband once clocks are propagated post-CTS). Using one value for both would either over-pessimize hold or under-protect setup.

Question 43

A flop has T_cq = 80 ps, combinational delay 600 ps, capture-flop setup 50 ps, hold 40 ps, clock period 800 ps, setup uncertainty 60 ps, hold uncertainty 20 ps, and clock skew T_skew = +30 ps (capture later). Compute setup and hold slack. Does the path pass?

Accepted Answer

Using T_skew = T_capture - T_launch = +30 ps. Setup slack = required - arrival = (T_clk + T_skew - U_setup - T_setup) - (T_cq + T_comb) = (800 + 30 - 60 - 50) - (80 + 600) = 720 - 680 = +40 ps → setup PASSES with 40 ps margin. Hold slack = arrival - required = (T_cq + T_comb) - (T_hold + U_hold + T_skew) = (80 + 600) - (40 + 20 + 30) = 680 - 90 = +590 ps → hold PASSES easily. Insight: the positive skew added 30 ps to setup but cost 30 ps on hold. Here both pass, but if the hold path were a short bypass (small T_comb ), that +30 ps skew is exactly what creates hold violations — which is why CTS-induced positive skew often shows up as new hold fails.
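The arithmetic above can be re-checked with a short script. It uses the numbers from the question; the final "short bypass" case with T_comb = 0 is a hypothetical added only to illustrate the insight about hold.

```python
def setup_slack(t_clk, t_cq, t_comb, t_setup, u_setup, t_skew):
    required = t_clk + t_skew - u_setup - t_setup
    arrival = t_cq + t_comb
    return required - arrival

def hold_slack(t_cq, t_comb, t_hold, u_hold, t_skew):
    arrival = t_cq + t_comb
    required = t_hold + u_hold + t_skew
    return arrival - required

# Values from the question, all in ps; T_skew = T_capture - T_launch.
path = dict(t_cq=80, t_comb=600, t_skew=30)
s = setup_slack(t_clk=800, t_setup=50, u_setup=60, **path)
h = hold_slack(t_hold=40, u_hold=20, **path)
print(f"setup slack = {s:+d} ps, hold slack = {h:+d} ps")

# Hypothetical short bypass: same flops and skew, but no logic in between.
short = hold_slack(t_cq=80, t_comb=0, t_hold=40, u_hold=20, t_skew=30)
print(f"hold slack on a zero-logic bypass = {short:+d} ps")
```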

Question 44

Explain a generated clock and specifically a divide-by-2 clock. How do you constrain it in SDC, and what happens to the timing relationship between the source and divided clocks?

Accepted Answer

A generated clock is a clock derived on-chip from an existing (master) clock, by a divider, multiplier, MUX, or gating cell, rather than defined at a port. You constrain it with create_generated_clock, referencing the master via -source so STA keeps their phase relationship. Divide-by-2 example on a flop output that toggles on every rising master edge: create_generated_clock -name clk_div2 -source [get_pins pll/CLK] -divide_by 2 [get_pins div_ff/Q]. The divided clock has 2× the period (half the frequency) of the master. STA derives the launch/capture edges of clk_div2 from the master's edges, so the source → divider insertion delay is automatically inherited; you do not redefine latency from scratch. Paths between the master clock domain and clk_div2 become multicycle-like related checks: because edges align only at multiples of the master period, the tool expands edges over the common period (LCM of the two periods) and picks the tightest setup and hold edge pair. Getting -source wrong (or using create_clock on the Q pin) breaks the phase relationship and the inherited insertion delay, giving wrong cross-domain slack.
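The edge-expansion step can be sketched as follows. This is a simplified model, not what any particular tool implements: it assumes both clocks have a rising edge at time 0, considers rising edges only, ignores latency and uncertainty, and uses integer periods in ps. The function name is made up for illustration, and only the setup relation is computed.

```python
from math import gcd

def tightest_setup_relation(launch_period, capture_period):
    """Smallest positive launch-to-capture edge separation over the common period."""
    common = launch_period * capture_period // gcd(launch_period, capture_period)
    launches = range(0, common, launch_period)
    # Include the capture edge at the end of the common period.
    captures = range(0, common + capture_period, capture_period)
    return min(c - l for l in launches for c in captures if c > l)

MASTER, DIV2 = 800, 1600   # ps; divide-by-2 doubles the period

print("master -> clk_div2:", tightest_setup_relation(MASTER, DIV2), "ps")
print("clk_div2 -> master:", tightest_setup_relation(DIV2, MASTER), "ps")
print("1000 ps -> 1500 ps:", tightest_setup_relation(1000, 1500), "ps")
```

For the divide-by-2 case the tightest setup relation in either direction is one master period, which is why these paths are timed at the fast clock's period unless a multicycle exception says otherwise.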

Question 45

What is a gated clock and why is it used? From an STA standpoint, what must be checked on the clock-gating cell itself?

Accepted Answer

A gated clock is a clock whose toggling is conditionally disabled by an enable signal, almost always via an integrated clock-gating cell (ICG) — typically a latch-based AND/OR gate that suppresses the clock when idle. The purpose is dynamic power reduction : shutting off the clock to idle registers eliminates their clock-tree and flop switching power, and it replaces enable-recirculation MUXes (which keep the clock free-running) with true clock disable. STA checks on the ICG: Setup/hold of the enable relative to the clock at the gating cell — the enable must be stable around the gating point so the clock is never chopped into a glitch or runt pulse. A latch-based ICG samples the enable on the clock's low phase so the gate switches only while the clock is low. Clock-gating checks ( set_clock_gating_check / inferred) enforce this enable-vs-clock relationship. The gating cell is part of the clock network , so its insertion delay and OCV derate count toward downstream skew; CTS must balance through it. Missing the enable timing is a classic source of functional glitches that pure functional sim can hide but STA must catch.

Question 46

After clock tree synthesis, you see setup slack improve on many paths but a wave of new hold violations on short paths. Explain the mechanism in terms of skew, and how it is fixed.

Accepted Answer

Mechanism: CTS replaces the ideal (zero-delay) clock with a real tree. The tree introduces real skew between launch and capture flops. Where the tree happens to deliver the capture clock later than the launch clock (positive skew), setup improves — but the same positive skew directly attacks hold: hold slack = T_cq + T_comb - T_hold - U_hold - T_skew On a short path (small T_comb , e.g. a flop-to-adjacent-flop or shift register), there is little data delay to absorb that skew, so the skew term drives slack negative. Why short paths specifically: hold is a same-edge race between clock skew and data propagation. Long paths have plenty of T_comb margin; short paths do not. Fix: Hold buffers / delay cells inserted on the data path to raise T_comb — the standard fix, since it does not touch setup-critical timing. Reduce the offending skew by rebalancing the clock tree at those leaves. Apply useful-skew adjustments cautiously, re-checking both corners. Always fix hold at the fast/hold corner and re-verify setup at the slow corner afterward.

Question 47

In OCV / AOCV analysis, derate is applied to launch and capture clock paths. For a setup check, which clock path is slowed and which is sped up, and why does this make the analysis pessimistic?

Accepted Answer

On-chip variation (OCV) models the fact that the launch and capture clock paths see different local process/voltage/temperature, so their delays differ even for the same nominal clock. STA applies derate in the pessimistic direction for each check.

Setup check (want capture as early, launch+data as late as possible):

Launch clock path: slowed (late derate, >1.0) — pushes the data launch later.
Capture clock path: sped up (early derate, <1.0) — pulls the capture edge earlier.
Data path: slowed (max delay).

Hold check (the mirror image):

Launch clock path: sped up; capture clock path: slowed; data path: min delay.

This maximizes the harmful skew for each check (effective negative skew for setup, effective positive skew for hold), so slack is computed against the worst credible variation, hence pessimistic. Because launch and capture share common clock-tree segments up to the divergence point, that shared delay should not be derated both late and early; CRPR (clock reconvergence pessimism removal) credits the difference back. AOCV refines OCV by scaling derate with path depth and distance, removing some of that pessimism more accurately than flat OCV.
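A small numeric sketch of the setup case, with and without CRPR. It assumes flat derates of 1.05 (late) and 0.95 (early) and a clock tree split into one common segment plus a launch and a capture branch; all delays are hypothetical values in ps, and the `data` argument lumps T_cq and T_comb together.

```python
def setup_slack_ocv(t_clk, common, launch_branch, capture_branch, data,
                    t_setup, late=1.05, early=0.95, crpr=True):
    """Setup slack with flat OCV derates (simplified model)."""
    launch_clk = (common + launch_branch) * late      # launch clock slowed
    capture_clk = (common + capture_branch) * early   # capture clock sped up
    arrival = launch_clk + data * late                # data path slowed
    required = t_clk + capture_clk - t_setup
    # The common segment cannot be both late and early: credit the difference.
    credit = common * (late - early) if crpr else 0.0
    return required - arrival + credit

args = dict(t_clk=1000, common=400, launch_branch=100, capture_branch=100,
            data=700, t_setup=50)
without_crpr = setup_slack_ocv(crpr=False, **args)
with_crpr = setup_slack_ocv(crpr=True, **args)

print(f"setup slack without CRPR: {without_crpr:.1f} ps")
print(f"setup slack with CRPR   : {with_crpr:.1f} ps")
print(f"pessimism removed       : {with_crpr - without_crpr:.1f} ps")
```

The credit scales with the length of the common segment, so paths whose launch and capture clocks diverge late in the tree recover the most.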

Question 48

Walk through what changes between an ideal clock and a propagated clock in STA, and which quantities (latency, skew, uncertainty) you'd expect to adjust at each stage of the flow.

Accepted Answer

Ideal clock (pre-CTS): the clock network is assumed to have zero delay; there is no physical tree yet. You model what the tree will do using estimates: set_clock_latency — estimated network insertion delay; -source for off-chip latency. set_clock_uncertainty — large, because it must cover estimated skew + jitter + margin (real skew is unknown). Propagated clock (post-CTS): set_propagated_clock makes STA compute real per-pin clock arrival through the built tree. Latency is now the actual tree delay, no longer an estimate. Skew becomes real and measured (capture - launch per path). Uncertainty is reduced — the skew portion is now real, so uncertainty should mostly carry just jitter + a small guardband ; OCV/AOCV derate now models clock-path variation explicitly. Net: pre-CTS you budget pessimistically with big uncertainty and estimated latency; post-CTS you replace estimates with measured tree behavior and shrink uncertainty so you don't double-count skew that is now physically accounted for.

Question 49

Define slack. How is it computed for a setup (max) check versus a hold (min) check, and what does the sign mean?

Accepted Answer

Slack is the margin by which a timing check passes or fails. It is always the difference between the required time and the actual arrival time, but the operand order flips between setup and hold. Setup (max-delay) check: Slack = Required - Arrival, where Required = T_capture_clk_edge - T_setup - T_uncertainty at the capture flop's D pin. Setup is checked against the next clock edge, so a slow (large) data arrival hurts it. Hold (min-delay) check: Slack = Arrival - Required, where Required = T_capture_clk_edge + T_hold + T_uncertainty against the same (current) edge. A fast (small) data arrival hurts it. Sign convention: Positive slack = check met, margin to spare. Zero = exactly on the edge (critical). Negative slack = violation. Note the operands swap so that in both cases positive means good.

Question 50

What are WNS and TNS, and why do we report both? How do they differ from a simple slack number?

Accepted Answer

WNS (Worst Negative Slack) is the single most negative slack across all endpoints in a corner/mode. It tells you the depth of the worst violation and is what bounds the achievable clock period. TNS (Total Negative Slack) is the sum of all negative slacks (positive-slack endpoints contribute 0): TNS = Σ min(slack_i, 0). It tells you the breadth of the problem. Why both: WNS = -50ps with TNS = -50ps means one failing endpoint: a targeted fix (resize, buffer, useful skew). WNS = -50ps with TNS = -5000ps means at least 100 failing endpoints: a systemic issue (clock period too tight, congestion, bad constraints). A single endpoint slack is local; WNS/TNS aggregate the whole design and are the standard convergence metrics PD engineers track per ECO. By convention both are 0 when the design is clean (no negative endpoints). Tools also report FEP (failing endpoint count) alongside, which further distinguishes depth from breadth.
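As a minimal sketch, the three metrics can be computed from a list of endpoint slacks as below. The function name and the slack lists are hypothetical; real tools report these per path group, mode, and corner.

```python
def wns_tns_fep(endpoint_slacks):
    """Return (WNS, TNS, FEP) for a list of endpoint slacks in ps.

    By the convention described above, WNS and TNS are 0 for a clean design.
    """
    negatives = [s for s in endpoint_slacks if s < 0]
    wns = min(negatives, default=0)
    tns = sum(negatives)
    return wns, tns, len(negatives)

one_bad_path = [-50, 10, 35, 120]
systemic = [-50] * 100 + [15, 40]
clean = [5, 20, 300]

for name, slacks in [("one bad path", one_bad_path),
                     ("systemic", systemic),
                     ("clean", clean)]:
    wns, tns, fep = wns_tns_fep(slacks)
    print(f"{name:12s}: WNS={wns} ps, TNS={tns} ps, FEP={fep}")
```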

Question 51

Walk me through a setup report_timing path: what are the columns, and how does arrival accumulate into a final slack?

Accepted Answer

A report_timing path has a data (launch) path and a clock (capture) path , with arrival accumulating cell+net delays. Data arrival builds up as: Clock network delay to the launch flop CK + T_cq (clk-to-Q of launch flop) + each combinational cell delay and net delay along the path = Data Arrival Time at the capture D pin Data required builds up as: Clock network delay to capture flop CK + T_clk_period (the next edge) - library T_setup - clock uncertainty (jitter + margin) +/- CRPR credit for the common clock segment = Data Required Time Slack = Required - Arrival. The report shows incremental and cumulative columns so you can see exactly which stage dominates delay. The key insight: launch and capture share a common clock root , so only the divergent portion of skew is real.
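The accumulation can be mimicked with a toy table of incremental delays. This is only an illustration of how the incremental and cumulative columns relate; the point names and delays (in ps) are hypothetical, and the layout does not match any specific tool's report.

```python
from itertools import accumulate

def trace(points):
    """Turn (name, incr) pairs into (name, incr, cumulative) rows."""
    totals = accumulate(incr for _, incr in points)
    return [(name, incr, total) for (name, incr), total in zip(points, totals)]

arrival_rows = trace([
    ("clock network delay (launch)", 100),
    ("launch flop CK->Q", 80),
    ("U1 cell", 150), ("net n1", 50),
    ("U2 cell", 200), ("net n2", 50),
    ("U3 cell", 150),
])
required_rows = trace([
    ("clock edge (one period)", 800),
    ("clock network delay (capture)", 130),
    ("library setup time", -50),
    ("clock uncertainty", -60),
    ("CRPR credit", 0),
])

for title, rows in [("data arrival", arrival_rows),
                    ("data required", required_rows)]:
    print(title)
    for name, incr, total in rows:
        print(f"  {name:32s} {incr:+6d} {total:6d}")

arrival = arrival_rows[-1][2]
required = required_rows[-1][2]
slack = required - arrival
print(f"slack = {required} - {arrival} = {slack:+d} ps")
```

Scanning the incremental column for the largest entries is usually the fastest way to find the stage worth fixing.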

Question 52

Distinguish slew from transition. What does transition time physically measure, and what thresholds are used?

Accepted Answer

Slew and transition time are used interchangeably in STA: both mean the time a signal takes to switch between two voltage thresholds. Physically it is the time for the node to charge/discharge through the driver's output resistance into the load capacitance, i.e. roughly t_trans ≈ k · R_drive · C_load. A weak driver or heavy load gives a slow (large) slew. Thresholds: measured between library-defined points, commonly 10%-90% or, more often in modern libs, 20%-80% or 30%-70% of VDD. The library's slew_lower_threshold_pct / slew_upper_threshold_pct define them, and the slew_derate_from_library attribute scales between the measured (e.g. 20%-80%) value and the full-swing equivalent the tool uses internally. Key points: Rise and fall slews can differ (asymmetric drive strength). Input slew is an input to delay calculation; output slew is an output that propagates to the next stage. set_max_transition is a design rule constraint (DRC), independent of slack but a precondition for valid delay calc.

Question 53

How does input slew affect a cell's delay? Why does a slower input slew not just add a fixed delay?

Accepted Answer

Input slew is one of the two axes of every NLDM delay table, so delay is a nonlinear function of it, not an additive offset. Mechanism: a slower input transition means the input crosses the gate's switching threshold later and the transistor turns on more gradually, so both the cell delay and the output slew increase. Output slew then feeds the next stage's delay lookup, so a degraded slew propagates and compounds down the path. Why not fixed: Delay is measured from the input threshold crossing to the output threshold crossing; a slower input shifts the input crossing AND changes the device operating region, so the increment depends on the current slew/load operating point. The relationship is captured as a 2D surface delay = f(input_slew, output_load) with interpolation between table points. Practical consequence: a high-fanout or long net with no buffering degrades slew, which inflates downstream delays. Fixing slew (buffer insertion, upsizing) often recovers more slack than it appears to on the local stage alone. This is why set_max_transition exists as a guardrail.

Question 54

Explain the NLDM delay model. What are its inputs, what is interpolated, and where does it lose accuracy?

Accepted Answer

NLDM (Non-Linear Delay Model) stores characterized delay and output slew in 2D lookup tables indexed by input transition (slew) and output load capacitance: cell_delay = f(input_slew, C_load), output_slew = g(input_slew, C_load). Operation: the timer looks up the cell's table at the actual input slew and load, interpolating (typically bilinear) between the nearest characterized grid points; outside the grid it extrapolates (lower confidence). The output slew becomes the next stage's input slew, chaining stage by stage. Where it loses accuracy: It models the load as a single lumped C, so it cannot capture resistive shielding of long RC interconnect (the driver effectively sees less than the total wire-plus-pin capacitance). It produces a delay/slew number, not a current waveform, so it is weak on RC interconnect delay and noise/coupling effects. For accuracy, a receiver pin model + Elmore/AWE reduction approximates the net, but at advanced nodes NLDM is replaced by CCS (current-based) or ECSM for better interconnect and waveform fidelity.
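The table lookup can be sketched as a plain bilinear interpolation. The table values, axis points, and function name below are hypothetical; real libraries use their own grids and units, and timers may use different interpolation or extrapolation rules.

```python
from bisect import bisect_right

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation in a 2D table indexed [slew][load].

    Outside the grid the nearest segment is extended linearly (extrapolation).
    """
    def bracket(axis, x):
        i = bisect_right(axis, x) - 1
        i = max(0, min(i, len(axis) - 2))          # clamp to a valid segment
        t = (x - axis[i]) / (axis[i + 1] - axis[i])
        return i, t

    i, ts = bracket(slews, slew)
    j, tl = bracket(loads, load)
    d00, d01 = table[i][j], table[i][j + 1]
    d10, d11 = table[i + 1][j], table[i + 1][j + 1]
    return (d00 * (1 - ts) * (1 - tl) + d01 * (1 - ts) * tl
            + d10 * ts * (1 - tl) + d11 * ts * tl)

# Hypothetical characterization grid: input slew in ps, load in fF, delay in ps.
SLEWS = [10, 50, 100]
LOADS = [1, 4, 16]
DELAY = [[20, 35, 80],
         [28, 45, 95],
         [40, 60, 115]]

print(nldm_lookup(SLEWS, LOADS, DELAY, 50, 4))     # on a grid point
print(nldm_lookup(SLEWS, LOADS, DELAY, 30, 2.5))   # between grid points
print(nldm_lookup(SLEWS, LOADS, DELAY, 150, 20))   # extrapolated, lower confidence
```

The same routine applied to the output-slew table gives the slew handed to the next stage.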

Question 55

What is CCS, and why did the industry move from NLDM to CCS at advanced nodes?

Accepted Answer

CCS (Composite Current Source) models the cell's output as a time- and voltage-dependent current source rather than as a single delay/slew number. The library stores current waveforms as a function of input slew and output load, plus a receiver capacitance model (often split into C1 before the threshold and C2 after). Why the move: Interpolation accuracy: NLDM interpolates scalar delay; at sub-40nm the actual waveform is non-saturated and NLDM error grows. CCS drives the actual RC network with a real current waveform, giving accurate interconnect delay and slew. Receiver model: NLDM treats the receiver as a fixed pin cap; CCS captures the nonlinear, voltage-dependent input cap (Miller effect), which matters for accurate driver loading. Waveform propagation: CCS preserves the real voltage waveform shape, important for noise/SI analysis and for non-monotonic waveforms. Cost: CCS libraries are several times larger and slower to read. ECSM (Cadence) is the voltage-based analog. Both are "current/waveform-based" successors to NLDM and standard at 16nm and below.

Question 56

How do load capacitance and fanout influence delay and slew? Walk through what happens when you double the fanout on a net.

Accepted Answer

Both delay and slew are monotonically increasing in output load to first order. Doubling fanout (driving twice as many receiver pins) roughly doubles the pin-capacitance component of the load: Driver delay increases - the driver must charge more C through its output resistance (delay ∝ R_drive · C_load to first order), so the NLDM/CCS lookup at higher C returns a larger delay. Output slew degrades (gets slower) - same RC charging argument. That degraded slew propagates to every receiver, increasing their delays too (the input-slew axis effect from the earlier question). Total load = pin caps + wire cap. At advanced nodes wire cap and resistance often dominate, and long wires add RC delay independent of fanout count. Resistive shielding means the driver does not see the full far-end cap. Fixes: insert buffers to split the fanout (each driver sees less load and slew recovers), upsize the driver (lower R), or use set_max_fanout / set_max_capacitance as DRC guardrails. This is the core of fanout/load optimization in synthesis and PD.
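A first-order sketch of the doubling argument, assuming a single-pole RC model where the 50% delay is ln(2) · R · C. The resistance and capacitance values are hypothetical, and wire resistance and slew effects are ignored, so this only shows the trend.

```python
import math

def rc_delay_ps(r_drive_kohm, c_wire_ff, c_pin_ff, fanout):
    """50% delay of a single-pole RC stage; kOhm * fF gives ps."""
    c_load = c_wire_ff + fanout * c_pin_ff
    return math.log(2) * r_drive_kohm * c_load

# Hypothetical driver and net: 2 kOhm driver, 4 fF of wire, 1 fF per pin.
base = rc_delay_ps(2, 4, 1, fanout=4)
doubled = rc_delay_ps(2, 4, 1, fanout=8)
print(f"fanout 4: {base:.2f} ps, fanout 8: {doubled:.2f} ps, "
      f"ratio {doubled / base:.2f}")

# With no wire capacitance the load is pure pin cap, so delay doubles.
pin_only_ratio = rc_delay_ps(2, 0, 1, 8) / rc_delay_ps(2, 0, 1, 4)
print(f"pin-cap-only ratio: {pin_only_ratio:.2f}")
```

Doubling fanout doubles only the pin-capacitance part of the load, so the delay ratio sits between 1 and 2 depending on how much of the load is wire.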

Question 57

A setup path has a slack of -80ps. Walk me through how you would attack it. Now the same path also has a hold violation — does fixing one help or hurt the other?

Accepted Answer

Setup -80ps (path too slow), fix by reducing data arrival or relaxing required: Upsize slow cells / use higher-drive or LVT (faster) cells on the critical stages. Buffer / restructure high-fanout or long nets to fix slew (often the hidden cause). Reduce logic depth (logic restructuring, better mapping). Useful skew: delay the capture clock / advance the launch clock to borrow time. Check constraints: false/multicycle paths, over-tight uncertainty. Setup vs hold interaction: they pull in opposite directions . Setup wants the data path faster ; hold wants short paths slower . Making the cell faster (upsizing/LVT) improves setup but worsens hold on that same path. Adding hold-fix delay buffers worsens setup . Useful skew is doubly dangerous: delaying the capture clock helps setup but directly eats hold margin (hold uses the same/current edge), and vice versa. The clean approach: fix setup first with cell/skew changes, then close hold with min-delay buffers on the short paths only , which adds delay without touching the setup-critical paths. Always re-check both checks at all corners after any ECO.

Question 58

How do clock uncertainty, jitter, and OCV/derate enter the slack equation — and crucially, do they apply to setup, hold, or both?

Accepted Answer

Clock uncertainty ( set_clock_uncertainty ) is a pessimism margin subtracted from the available time. For setup it tightens required time; for hold it also tightens (pushes required later relative to data). You can set separate setup and hold uncertainty via -setup / -hold . Setup uncertainty includes clock jitter + setup margin (period is uncertain). Hold uncertainty is usually smaller - jitter largely cancels because launch and capture see the same edge ; mostly it covers skew margin. OCV / derate models on-die variation by scaling delays: Setup (max) check: derate the launch + data path slow (e.g. ×1.05) and the capture clock fast (e.g. ×0.95) - worst case for setup. Hold (min) check: the opposite - launch/data fast , capture clock slow - worst case for hold. CRPR (Clock Reconvergence Pessimism Removal) credits back derate applied to the common clock segment , since one physical path cannot be simultaneously fast and slow. Modern flows use AOCV/POCV (depth/distance- and statistically-based derates) instead of a flat factor. Net effect on slack: uncertainty and derate reduce margin on both checks , but with opposite directionality, which is why both must be analyzed.

Question 59

A path's setup slack is fine in isolation, but the report shows a large slew at the capture flop's D pin. Why might this still cause a problem, and is slew a part of slack?

Accepted Answer

Slew is not directly a term in the slack equation, but it affects slack indirectly and is also checked as a separate DRC.

Why a large D-pin slew is still a problem. Setup time grows with input slew: library T_setup is characterized as a function of data slew (and clock slew), so a degraded D-pin slew increases the required setup time and eats slack even though the arrival looked fine. Max-transition DRC violation: if the slew exceeds set_max_transition, the design is not signoff-clean regardless of slack, and delay-calculation accuracy degrades. Hold risk: slow slews change hold time too and increase SI/noise susceptibility.

Is slew part of slack? Indirectly: slew is an input to delay and to setup/hold constraint values, both of which feed slack. But it is reported and constrained separately as a design rule. A common interview trap is assuming a clean arrival number means a clean check: if the slew is bad, the characterized setup/hold itself shifts and the slack you computed by hand will be wrong. Always fix transition violations before trusting timing.
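To see how slew leaks into slack, the sketch below interpolates a setup time from a slew-indexed table and then recomputes slack. The table values, the 150 ps transition limit, and the function names are all hypothetical; real library constraint tables are two-dimensional (data slew and clock slew), which this 1-D version ignores.

```python
from bisect import bisect_right

# Hypothetical 1-D constraint table: (data slew in ps, setup time in ps).
SETUP_TABLE = [(20, 40), (100, 60), (300, 120)]


def setup_time_ps(slew):
    """Linearly interpolate setup time for a given D-pin slew, clamping at the ends."""
    xs = [s for s, _ in SETUP_TABLE]
    if slew <= xs[0]:
        return float(SETUP_TABLE[0][1])
    if slew >= xs[-1]:
        return float(SETUP_TABLE[-1][1])
    i = bisect_right(xs, slew)
    (x0, y0), (x1, y1) = SETUP_TABLE[i - 1], SETUP_TABLE[i]
    return y0 + (y1 - y0) * (slew - x0) / (x1 - x0)


def check_d_pin(arrival, t_clk, slew, max_transition=150):
    """Return (setup_slack, transition_ok) for an ideal-clock, zero-skew path."""
    slack = (t_clk - setup_time_ps(slew)) - arrival
    return slack, slew <= max_transition


# Same arrival time, two different D-pin slews.
print(check_d_pin(930, 1000, 60))    # (20.0, True)
print(check_d_pin(930, 1000, 200))   # (-20.0, False)
```

The arrival time is identical in both calls; only the slew changed, yet the slack flips sign and the transition rule fails as well.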

Question 60

Senior-level: a register-to-register path passes setup by +5ps at the slow corner but the same path fails hold by -30ps at the fast corner. Explain how a single physical path produces opposite results across corners, and how this constrains your fix.

Accepted Answer

This is normal multi-corner behavior; setup and hold are checked at different corners because they have opposite worst cases.

Why opposite. Setup is worst at the slow corner (slow process, low voltage, and historically high temperature): cells are slowest, so data arrives latest and it is hardest to meet the next edge. Slack_setup = Required - Arrival is tightest here. Hold is worst at the fast corner (fast process, high voltage): cells are fastest, so data races through and arrives too early, violating the same-edge hold check. Slack_hold = Arrival - Required is tightest here. In addition, temperature inversion at advanced nodes can flip which temperature is worst, so signoff sweeps multiple PVT corners.

How it constrains the fix. The hold fix must add delay at the fast corner without breaking the +5ps setup margin at the slow corner, which is a tight window. Ideally you would insert min-delay buffers whose slow-corner delay contribution stays under 5ps while their fast-corner delay closes the 30ps gap. Because a buffer's slow-corner delay is larger than its fast-corner delay, you may not be able to add 30ps of fast-corner delay without blowing the 5ps setup budget. If the window is infeasible, prefer useful skew or restructuring, and re-verify both corners (and CRPR) after the ECO.

Takeaway: never close hold at the fast corner without re-checking setup at the slow corner. They share the same cells, so every delay element trades one against the other.
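The feasibility window can be estimated before touching the netlist. The following is a rough sketch with a hypothetical delay buffer (10 ps at the fast corner, 18 ps at the slow corner); the function name and numbers are assumptions, and it ignores slew and load changes caused by the inserted cells.

```python
import math


def hold_fix_window(hold_slack_fast, setup_slack_slow, buf_fast, buf_slow):
    """Estimate a buffer-based hold fix across two corners (all values in ps).

    Returns (buffers_needed, hold_slack_after, setup_slack_after, feasible).
    buf_fast / buf_slow are one buffer's delay at the fast and slow corners.
    """
    deficit = max(0, -hold_slack_fast)
    n = math.ceil(deficit / buf_fast)
    hold_after = hold_slack_fast + n * buf_fast
    setup_after = setup_slack_slow - n * buf_slow
    return n, hold_after, setup_after, setup_after >= 0


# The case from the question: +5 ps setup at slow, -30 ps hold at fast.
print(hold_fix_window(-30, 5, 10, 18))    # (3, 0, -49, False)
# The same hold fix is only safe with far more setup margin.
print(hold_fix_window(-30, 60, 10, 18))   # (3, 0, 6, True)
```

With only 5 ps of slow-corner margin, three buffers close hold but cost 54 ps of setup, so this path needs useful skew or restructuring rather than plain delay insertion.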

Question 61

What is a multicycle path (MCP), and why would you declare one?

Accepted Answer

A multicycle path is a path the STA tool is told to evaluate over more than one clock cycle instead of the default single cycle. By default STA assumes data launched at edge N must be captured at edge N+1 (one cycle). A multicycle path relaxes that: you tell the tool the data only needs to be stable after, say, 2 or 3 cycles.

You declare one when the design genuinely allows extra cycles for the data to settle, for example: a slow combinational block (large multiplier, divider) whose result is only sampled every Nth cycle by enable/clock-gating logic; or a path between a clock and its integer-divided version where the receiver only loads on certain edges.

The command sets the setup multiplier: set_multicycle_path 3 -setup -from FF1 -to FF2. Critically, the hold check must be adjusted too (see the off-by-one question). The risk: if the hardware does not actually hold off the capture, the MCP is a lie and the chip fails on silicon despite a clean STA report.

Question 62

Walk me through a setup-multicycle-path of 3 on a waveform. Where do the launch and capture edges land, and what is the resulting setup check?

Accepted Answer

With set_multicycle_path 3 -setup, the data is launched at edge 0 as usual, but the setup capture edge moves out from edge 1 (default) to edge 3. So the available time becomes three clock periods instead of one.

Setup requirement becomes:
T_cq + T_comb + T_setup ≤ 3·T_clk - T_setup_unc

Key points an interviewer wants: The launch edge does not move; only the setup capture edge moves later by (N-1) cycles. Setup uncertainty (jitter + margin) is still subtracted once, at the capture edge. The relaxation is on the setup side only; hold is a separate, dangerous story. In the diagram, default capture is edge 1; the MCP-3 setup capture is edge 3, giving three periods of available time.

Question 63

This is the classic one: when you set a multicycle of 3 for setup, what must you set for hold, and why? Show the off-by-one trap.

Accepted Answer

When you declare set_multicycle_path 3 -setup, the tool by default puts the hold check one edge before the (moved) setup capture edge, i.e. at edge 2. That is almost always wrong and creates an impossibly tight (and bogus) hold requirement, because the hold edge is now two cycles away from the launch edge. You must pull the hold check back with:

set_multicycle_path 2 -hold -from FF1 -to FF2

The rule: for a setup multicycle of N, set a hold multicycle of N-1, so the hold check returns to the same edge as the launch edge (edge 0), exactly as in a normal single-cycle path. Setup MCP = N moves the setup capture edge forward by N-1. Hold MCP = N-1 moves the hold capture edge back by N-1, restoring it to edge 0.

The off-by-one trap: if you forget the hold MCP, the hold check is referenced to edge 2 (hold launch at edge 0, capture at edge 2), demanding that the data not change for ~2 cycles after launch. This is a massive, false hold violation that engineers then 'fix' by inserting huge delay buffers, breaking real timing. Always pair setup MCP N with hold MCP N-1 (for same-edge integer-ratio clocks).
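The edge bookkeeping is easy to get wrong by one, so here is a minimal sketch of it. It assumes a same-clock path with the launch edge numbered 0 and follows the SDC convention that the hold multiplier moves the hold edge back from its default position one cycle before the setup capture edge; the function name is hypothetical.

```python
def mcp_edges(setup_mult=1, hold_mult=0):
    """Return (setup_capture_edge, hold_check_edge), with the launch edge at 0.

    Default hold edge is one cycle before the setup capture edge;
    the hold multiplier pulls it back by hold_mult further cycles.
    """
    setup_edge = setup_mult
    hold_edge = setup_edge - 1 - hold_mult
    return setup_edge, hold_edge


print(mcp_edges())        # (1, 0)  single-cycle default
print(mcp_edges(3))       # (3, 2)  setup MCP only: hold stranded at edge 2
print(mcp_edges(3, 2))    # (3, 0)  setup 3 / hold 2: hold back at the launch edge
```

The middle call is the trap: the setup edge moved as intended, but the hold check followed it out to edge 2.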

Question 64

Derive the general setup and hold equations for a multicycle path with setup multiplier N and hold multiplier M.

Accepted Answer

Let the launching edge be at time 0 and the clock period be T_clk.

Setup capture edge is at N·T_clk:
T_launch_clk + T_cq + T_comb ≤ N·T_clk + T_capture_clk - T_setup - T_setup_unc
where T_launch_clk / T_capture_clk are the clock network latencies (skew shows up as their difference).

By default the hold check sits one cycle before the setup capture edge, and the hold multiplier M moves it back by a further M cycles, so the hold capture edge is at (N - 1 - M)·T_clk:
T_launch_clk + T_cq + T_comb ≥ (N - 1 - M)·T_clk + T_capture_clk + T_hold + T_hold_unc

The crucial relationship: for two flops on the same clock, you want the hold check referenced to the launching edge itself, i.e. edge 0. With M = 0 (no hold multicycle) the hold edge stays at (N-1)·T_clk, one cycle before the moved setup capture edge, which is the bogus check. Choosing M = N - 1 makes (N - 1 - M) = 0 and places the hold capture edge at edge 0, the launch edge. This recovers the standard single-cycle hold check:
T_cq + T_comb ≥ T_hold + T_hold_unc + (T_capture_clk - T_launch_clk), where the last term is the clock skew.

Note the uncertainty signs: setup subtracts T_setup_unc (makes the requirement harder); hold adds T_hold_unc (also makes it harder). They are applied once, not multiplied by N.
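Plugging numbers into the two inequalities shows the effect of the hold multiplier directly. The sketch below uses the SDC convention that the hold check edge is (N - 1 - M) periods after launch; the function name and every delay value are hypothetical, and data_max / data_min stand for T_cq + T_comb at the max and min delay conditions.

```python
def mcp_slacks_ps(n, m, t_clk, data_max, data_min, t_setup, t_hold,
                  launch_lat=0, capture_lat=0, setup_unc=0, hold_unc=0):
    """Return (setup_slack, hold_slack) in ps for setup multiplier n, hold multiplier m."""
    setup_required = n * t_clk + capture_lat - t_setup - setup_unc
    hold_required = (n - 1 - m) * t_clk + capture_lat + t_hold + hold_unc
    setup_slack = setup_required - (launch_lat + data_max)
    hold_slack = (launch_lat + data_min) - hold_required
    return setup_slack, hold_slack


common = dict(t_clk=1000, data_max=2400, data_min=1500,
              t_setup=50, t_hold=40, setup_unc=60, hold_unc=20)

print(mcp_slacks_ps(1, 0, **common))   # (-1510, 1440)  single cycle: setup fails
print(mcp_slacks_ps(3, 0, **common))   # (490, -560)    setup MCP only: bogus hold failure
print(mcp_slacks_ps(3, 2, **common))   # (490, 1440)    setup 3 / hold 2: both pass
```

Each uncertainty appears exactly once in its own check, and only the N·T_clk and (N - 1 - M)·T_clk terms change between the three cases.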

1. STA Fundamentals (12 questions)

Q1. What is Static Timing Analysis (STA), and why do we use it instead of gate-level (dynamic) simulation to sign off timing?

Q2. Define a timing path. What are its startpoint and endpoint, and what kinds of points can each be?

Q3. What is a timing arc? Distinguish cell arcs from net arcs, and name the main types of cell arc.

Q4. STA splits checks into the four classic path groups: in2reg, reg2reg, reg2out, in2out. Define each, and say which require I/O constraints.

Q5. Explain arrival time and required time, and how slack is computed for a setup check and for a hold check. Be precise about signs.

Q6. Why is STA called 'exhaustive' and 'vectorless'? What does that buy you and what does it cost you?

Q7. What are the key limitations of STA, and what classes of problems can it NOT find?

Q8. Write the basic single-cycle setup and hold inequalities for a reg2reg path, including clock skew and uncertainty, and explain why hold is independent of clock period.

Q9. An interviewer says: 'Your reg2reg setup path has positive slack but the design still fails on silicon at speed.' Give plausible STA-related root causes.

Q10. Define unateness for a timing arc (positive unate, negative unate, non-unate). Why does it matter for STA?

Q11. What is the difference between path-based and graph-based (block-based) analysis in STA, and when is each used?

Q12. On a setup path, the capture clock arrives later than the launch clock (positive skew). Does that help or hurt setup, and what about hold? State it generally.

2. Timing Paths & Path Types (12 questions)

Q13. What are the four canonical timing path groups in STA? Define each by its start point and end point.

Q14. Differentiate the data path from the clock path in a reg-to-reg timing arc. Why is the clock path traversed twice in one setup check?

Q15. Write the fundamental setup (max-delay) inequality for a reg-to-reg path and define every term.

Q16. Write the hold (min-delay) inequality and explain why hold violations cannot be fixed by slowing the clock.

Q17. Define launch and capture flip-flops. For a setup check on a single-cycle path, which clock edges launch and capture, and what changes for hold?

Q18. Walk through computing the data arrival time and the data required time for a setup check, then define slack.

Q19. What exactly is the 'critical path'? Is the longest combinational path always the critical path? Justify.

Q20. Why does positive clock skew help setup but hurt hold? Give the sign convention and the intuition.

Q21. Given T_cq = 0.15 ns, T_comb = 0.80 ns, T_setup = 0.10 ns, T_clk = 1.00 ns, capture insertion = 0.30 ns, launch insertion = 0.25 ns, clock uncertainty = 0.05 ns. Compute the setup slack. Then state the max frequency if skew and uncertainty were zero.

Q22. On a pure input-to-output (combinational feedthrough) path with no flops, how is timing constrained, and what is the arrival/required computation?

Q23. Draw and explain the full reg-to-reg path schematic and annotate where clock-to-Q, combinational delay, and setup come from physically.

Q24. Senior-level: in latch-based timing, how do time borrowing and the concept of paths-through-latches change the basic setup equation, and what is the risk?

3. Setup & Hold Checks (12 questions)

Q25. Define setup time and hold time for a flip-flop. What physically goes wrong if each is violated?

Q26. Write the setup timing equation for a single-cycle flop-to-flop path and explain every term, including the sign of skew.

Q27. Write the hold timing equation and explain why hold checks are independent of clock frequency.

Q28. A path has T_cq = 120 ps, T_comb = 700 ps, T_setup = 80 ps, and the clock period is 1 ns with zero skew. What is the setup slack? Now the same path has T_hold = 90 ps and fast-corner T_cq = 60 ps, T_comb = 50 ps. Is hold met?

Q29. Explain metastability. How does it relate to setup/hold, and how is MTBF affected by it?

Q30. List the techniques to fix a setup violation, and separately the techniques to fix a hold violation. Why can fixing one create the other?

Q31. Where do clock uncertainty, jitter, and OCV derate apply: to setup, to hold, or both? Be precise about the sign.

Q32. Walk me through a complete setup/hold waveform for a launch flop, a combinational path, and a capture flop in the same cycle. Where exactly are the two checks made?

Q33. How does clock skew affect setup and hold? Explain useful (intentional) skew and its limit.

Q34. A block passes setup at signoff but fails hold on silicon. The clock was already slowed in test. What is your debug approach?

Q35. Two flops are driven by the same clock but the data path between them is a single inverter (very short). Setup is fine. Why is this the riskiest hold scenario, and what is the worst-case condition?

Q36. What is a setup/hold borrow (time borrowing) and why does it apply to latches but not edge-triggered flops?

4. Clocks: Skew, Jitter, Latency, Uncertainty (12 questions)

Q37. Define clock period and frequency, and write the relationship between them. If a design must run at 1.25 GHz, what is the clock period in ps?

Q38. Distinguish source latency from network latency. Which one models the off-chip path, and how does each affect setup and hold timing?

Q39. What is clock skew? Define positive and negative skew precisely with respect to the launch and capture flops, and state the sign convention you are using.

Q40. What is useful (intentional) skew? Give the setup and hold inequalities for a launch-to-capture path and explain how a tool 'borrows' time across stages with useful skew.

Q41. What is clock jitter? Distinguish period jitter from cycle-to-cycle jitter, and explain how jitter is modeled in STA.

Q42. How does clock uncertainty enter the setup check versus the hold check? Why are the two values usually different and why does setup uncertainty shrink the period while hold uncertainty does not?

Q43. A flop has T_cq = 80 ps, combinational delay 600 ps, capture-flop setup 50 ps, hold 40 ps, clock period 800 ps, setup uncertainty 60 ps, hold uncertainty 20 ps, and clock skew T_skew = +30 ps (capture later). Compute setup and hold slack. Does the path pass?

Q44. Explain a generated clock and specifically a divide-by-2 clock. How do you constrain it in SDC, and what happens to the timing relationship between the source and divided clocks?

Q45. What is a gated clock and why is it used? From an STA standpoint, what must be checked on the clock-gating cell itself?

Q46. After clock tree synthesis, you see setup slack improve on many paths but a wave of new hold violations on short paths. Explain the mechanism in terms of skew, and how it is fixed.

Q47. In OCV / AOCV analysis, derate is applied to launch and capture clock paths. For a setup check, which clock path is slowed and which is sped up, and why does this make the analysis pessimistic?

Q48. Walk through what changes between an ideal clock and a propagated clock in STA, and which quantities (latency, skew, uncertainty) you'd expect to adjust at each stage of the flow.

5. Slack, Slew & Transition (12 questions)

Q49. Define slack. How is it computed for a setup (max) check versus a hold (min) check, and what does the sign mean?

Q50. What are WNS and TNS, and why do we report both? How do they differ from a simple slack number?

Q51. Walk me through a setup report_timing path: what are the columns, and how does arrival accumulate into a final slack?

Q52. Distinguish slew from transition. What does transition time physically measure, and what thresholds are used?

Q53. How does input slew affect a cell's delay? Why does a slower input slew not just add a fixed delay?

Q54. Explain the NLDM delay model. What are its inputs, what is interpolated, and where does it lose accuracy?

Q55. What is CCS, and why did the industry move from NLDM to CCS at advanced nodes?

Q56. How do load capacitance and fanout influence delay and slew? Walk through what happens when you double the fanout on a net.

Q57. A setup path has a slack of -80ps. Walk me through how you would attack it. Now the same path also has a hold violation. Does fixing one help or hurt the other?

Q58. How do clock uncertainty, jitter, and OCV/derate enter the slack equation, and crucially, do they apply to setup, hold, or both?

Q59. A path's setup slack is fine in isolation, but the report shows a large slew at the capture flop's D pin. Why might this still cause a problem, and is slew a part of slack?

Q60. Senior-level: a register-to-register path passes setup by +5ps at the slow corner but the same path fails hold by -30ps at the fast corner. Explain how a single physical path produces opposite results across corners, and how this constrains your fix.

6. Timing Exceptions (12 questions)

Q61. What is a multicycle path (MCP), and why would you declare one?

Q62. Walk me through a setup-multicycle-path of 3 on a waveform. Where do the launch and capture edges land, and what is the resulting setup check?

Q63. This is the classic one: when you set a multicycle of 3 for setup, what must you set for hold, and why? Show the off-by-one trap.

Q64. Derive the general setup and hold equations for a multicycle path with setup multiplier N and hold multiplier M.

Q65. What is a false path? Give two realistic examples and the command to declare one.

Q66. Explain set_max_delay and set_min_delay. When would you use them instead of a multicycle or false path?

Q67. What does set_disable_timing do, and how is it different from a false path?

Q68. What is case analysis (set_case_analysis), and how does it interact with timing exceptions?

Q69. A junior engineer is closing timing and decides to set_false_path every CDC violation he sees to clean up the report. What is wrong with this, and what should he do instead?

Q70. Explain the priority/precedence order among timing exceptions when more than one applies to the same path.

Q71. Senior-level: what is over-constraining, how does over-constraining with exceptions specifically hurt you, and how do you audit an SDC for bad exceptions?

Q72. On a half-cycle (negedge-capture) path versus a multicycle path, how do the launch/capture edges and hold checks differ?

7. OCV, AOCV, POCV & Derating (11 questions)

Q73. What is On-Chip Variation (OCV) and why do we model it in STA?