Static Timing Analysis - Interview Questions

128 real STA interview questions with worked answers and timing diagrams, across 11 topics - from fundamentals to signoff. Everything here is free.

128 questions · 11 topics · 40 diagrams

1. STA Fundamentals (12 questions)

Q1. What is Static Timing Analysis (STA), and why do we use it instead of gate-level (dynamic) simulation to sign off timing?

STA is a method of verifying a design's timing by computing signal arrival times along every timing path and checking them against the setup/hold constraints at each endpoint, without applying any input vectors. It is static (topology + timing models, not value-based) and exhaustive (all paths checked).
Why over gate-level sim:
  • Coverage: simulation only exercises the paths your stimulus toggles; STA checks every path regardless of functional activity.
  • Speed: no need to develop/run vectors; analysis is roughly linear in design size, so it scales to multi-million-gate blocks.
  • Worst-case guarantee: STA finds the worst path in each timing group; sim would need an impractically large vector set to hit the same worst case.
The trade-off: STA only checks timing, not function, and it can flag paths that are never functionally exercised (false paths) unless you tell it about them.

Q2. Define a timing path. What are its startpoint and endpoint, and what kinds of points can each be?

A timing path is a point-to-point route through the design along which a single launch-to-capture timing check is made. It runs from a startpoint to an endpoint.
Startpoint (where data is launched): either an input port of the design, or the clock pin of a sequential element (FF/latch).
Endpoint (where data is captured/checked): either the data (D) pin of a sequential element, or an output port of the design.
Each path consists of alternating cell arcs (through gates) and net arcs (interconnect), and the tool propagates a delay along the whole chain. Note a path is always single-clock-edge to single-clock-edge; combinational feedback or async crossings are handled specially.
A reg-to-reg timing path: launched at FF1/CLK, captured at FF2/D.

Q3. What is a timing arc? Distinguish cell arcs from net arcs, and name the main types of cell arc.

A timing arc is a directed delay (and constraint) relationship between two pins that STA uses to propagate timing.
Cell arc (internal to a standard cell, from the liberty .lib model):
  • Delay arcs (combinational): input pin to output pin, e.g. A to Y of a NAND. Characterized as a function of input transition (slew) and output load (capacitance), usually via NLDM/CCS tables. Can be positive/negative unate or non-unate.
  • Sequential arcs: CLK to Q (clock-to-output, T_cq).
  • Constraint arcs (checks): setup and hold on D relative to CLK, plus recovery/removal for async pins, and min-pulse-width.
Net arc (interconnect): delay from a driver pin to each receiver pin, derived from extracted RC (Elmore or higher-order). Net arcs model wire delay; cell arcs model gate delay. STA alternates cell and net arcs along a path.

Q4. STA splits checks into the four classic path groups: in2reg, reg2reg, reg2out, in2out. Define each, and say which require I/O constraints.

Path groups classify paths by what the startpoint and endpoint are:
  • reg2reg (register-to-register): start = launch FF clock pin, end = capture FF/D. The internal core paths; constrained automatically once the clock is defined, with setup bounded by the clock period.
  • in2reg (input-to-register): start = input port, end = FF/D. Needs set_input_delay to model how late data arrives at the port relative to the clock.
  • reg2out (register-to-output): start = FF clock pin, end = output port. Needs set_output_delay to model the external setup/hold the downstream block requires.
  • in2out (input-to-output): start = input port, end = output port; a purely combinational feedthrough. Needs both set_input_delay and set_output_delay.
So all three I/O groups need (virtual-)clock-referenced I/O constraints; only reg2reg is fully constrained by the clock definition alone. These groups let the tool report and budget each interface separately.
The four path groups relative to the block boundary and internal registers.
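The classification above can be sketched in a few lines. This is a minimal illustration; the function names and the string labels for startpoint/endpoint kinds are hypothetical, not any tool's API.

```python
def path_group(start_kind: str, end_kind: str) -> str:
    """Classify a path by its startpoint and endpoint kind.

    start_kind: "input_port" or "clock_pin" (hypothetical labels)
    end_kind:   "data_pin" or "output_port" (hypothetical labels)
    """
    start = {"input_port": "in", "clock_pin": "reg"}[start_kind]
    end = {"data_pin": "reg", "output_port": "out"}[end_kind]
    return f"{start}2{end}"


def needed_io_constraints(group: str) -> list[str]:
    """Return the SDC I/O constraints a path group depends on."""
    needs = []
    if group.startswith("in"):
        needs.append("set_input_delay")
    if group.endswith("out"):
        needs.append("set_output_delay")
    return needs


for start in ("input_port", "clock_pin"):
    for end in ("data_pin", "output_port"):
        group = path_group(start, end)
        print(group, needed_io_constraints(group))
```

Only reg2reg comes back with an empty list, matching the statement that it is the one group constrained by the clock definition alone.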

Q5. Explain arrival time and required time, and how slack is computed for a setup check and for a hold check. Be precise about signs.

Arrival time (AT) = the actual time the data signal reaches a pin, found by summing launch clock edge + clk-to-Q + combinational/net delays along the path.
Required time (RT) = the latest (setup) or earliest (hold) time data is allowed to arrive at the endpoint to meet the constraint, derived from the capture clock edge and the setup/hold requirement.
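Slack is defined so that positive means met for both checks: RT − AT for setup (data must arrive before the required time) and AT − RT for hold (data must arrive after it):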
  • Setup slack = RT_setup − AT where RT_setup = T_capture_edge + T_capture_clk_latency − T_setup − T_uncertainty. Positive = met.
  • Hold slack = AT − RT_hold where RT_hold = T_capture_edge + T_capture_clk_latency + T_hold + T_uncertainty_hold. Data must arrive after the hold window, so here it's AT minus RT. Positive = met.
(Here AT already includes the launch-clock latency, so the capture-clock latency in RT captures the skew.) Key sign intuition: setup wants data fast enough (large AT hurts), hold wants data slow enough (small AT hurts). Setup is a max-delay (late) check; hold is a min-delay (early) check.
Setup bounds the latest arrival; hold bounds the earliest. Data must land outside the keep-out window.
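As a minimal sketch of the two slack definitions, assuming all times share one unit (picoseconds here) and that the arrival time already includes the launch-clock latency. The function and parameter names are illustrative only.

```python
def setup_slack(arrival, capture_edge, capture_latency, t_setup, uncertainty=0):
    """Setup slack = RT_setup - AT, using the late (max) arrival."""
    required = capture_edge + capture_latency - t_setup - uncertainty
    return required - arrival


def hold_slack(arrival, capture_edge, capture_latency, t_hold, uncertainty=0):
    """Hold slack = AT - RT_hold, using the early (min) arrival."""
    required = capture_edge + capture_latency + t_hold + uncertainty
    return arrival - required


# Setup is checked against the next edge (capture_edge = one period = 1000 ps).
print(setup_slack(arrival=900, capture_edge=1000, capture_latency=100,
                  t_setup=50, uncertainty=50))
# Hold is checked against the same edge (capture_edge = 0).
print(hold_slack(arrival=250, capture_edge=0, capture_latency=100,
                 t_hold=50, uncertainty=20))
```

In both functions a positive return value means the check is met, which is the sign convention used in the answer above.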

Q6. Why is STA called 'exhaustive' and 'vectorless'? What does that buy you and what does it cost you?

Vectorless: STA does not need input stimulus. It works purely on the netlist topology, the timing arcs, and the constraints. It assigns worst-case transitions to every arc rather than evaluating logic values.
Exhaustive: because it is value-independent, STA traverses and checks every timing path in the design (within each mode/corner), guaranteeing it finds the true critical path in each group.
What it buys:
  • Complete timing coverage with no reliance on testbench quality.
  • Fast, deterministic, repeatable sign-off that scales to huge designs.
What it costs:
  • It is function-blind: it will time paths that can never be sensitized at the same time (false paths) or that legitimately take multiple cycles (multicycle), so they must be declared with set_false_path/set_multicycle_path or you get pessimism.
  • It cannot catch functional/logical bugs, glitches, or value-dependent behavior; that still needs simulation.

Q7. What are the key limitations of STA, and what classes of problems can it NOT find?

STA is timing-only and assumption-bound. Its main limitations:
  • No functional verification: it checks timing, not logical correctness; you still need simulation/formal for that.
  • False/multicycle paths: being vectorless, it reports paths that are functionally impossible or legitimately multi-cycle unless you constrain them, causing pessimism or wasted effort.
  • Asynchronous / clock-domain crossings: STA can't validate metastability or synchronizer correctness; CDC needs dedicated tools.
  • Combinational loops: must be broken (loop-cut) for analysis.
  • Glitch / dynamic effects: STA assumes a single clean transition per arc; it doesn't model glitches, simultaneous switching noise beyond derate, or detailed signal integrity unless explicitly modeled (SI delay/noise analysis add-ons).
  • Model accuracy bound: results are only as good as the .lib characterization, RC extraction, constraints (SDC), and the chosen corners/derates. Bad constraints = confidently wrong sign-off.
  • Doesn't pick vectors: can't tell you which input pattern actually exercises the critical path.

Q8. Write the basic single-cycle setup and hold inequalities for a reg2reg path, including clock skew and uncertainty, and explain why hold is independent of clock period.

Let launch and capture be on the same edge for hold, and one period apart for setup. With T_skew = T_capture_clk_arrival − T_launch_clk_arrival:
Setup (max-delay, data must settle before next capture edge):
T_cq + T_comb_max + T_setup ≤ T_clk + T_skew − T_uncertainty
Hold (min-delay, data must not race through within the same edge):
T_cq + T_comb_min ≥ T_hold + T_skew + T_uncertainty
Notice hold has no T_clk term: both launch and capture are referenced to the same clock edge, so the period cancels out. That's why a hold violation cannot be fixed by slowing the clock; you fix it by adding delay (buffers) on the data path or fixing skew. Setup, in contrast, scales with the period, so a slower clock relaxes setup. Also note positive skew (capture later) helps setup but hurts hold, and vice versa.
Setup spans one period; hold is referenced to the same edge, so T_clk cancels.
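The period independence of hold can be seen by coding the two inequalities directly. This is a sketch with made-up numbers in picoseconds; the function names are illustrative.

```python
def setup_ok(t_clk, t_cq, t_comb_max, t_setup, t_skew=0, t_unc=0):
    """T_cq + T_comb_max + T_setup <= T_clk + T_skew - T_uncertainty"""
    return t_cq + t_comb_max + t_setup <= t_clk + t_skew - t_unc


def hold_ok(t_cq, t_comb_min, t_hold, t_skew=0, t_unc=0):
    """T_cq + T_comb_min >= T_hold + T_skew + T_uncertainty (no T_clk term)"""
    return t_cq + t_comb_min >= t_hold + t_skew + t_unc


# Sweep the clock period: setup changes from fail to pass, hold never moves.
for t_clk in (900, 1000, 2000):
    s = setup_ok(t_clk, t_cq=100, t_comb_max=800, t_setup=50)
    h = hold_ok(t_cq=100, t_comb_min=20, t_hold=150)
    print(f"T_clk={t_clk} ps  setup_ok={s}  hold_ok={h}")
```

Note that `hold_ok` does not even take a period argument, so no choice of `t_clk` can rescue the failing hold check in the sweep.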

Q9. An interviewer says: 'Your reg2reg setup path has positive slack but the design still fails on silicon at speed.' Give plausible STA-related root causes.

Positive STA slack only means the path meets timing under the models and constraints you gave it. Common reasons silicon still fails:
  • Missing/optimistic corners: you didn't sign off the corner that's actually worst (e.g. a temperature-inversion 'hot vs cold' crossover, or an SI corner). Always close on the right MMMC set.
  • Wrong constraints: an over-aggressive set_false_path/set_multicycle_path hid the real path, or input/output delays didn't match the true environment.
  • Insufficient OCV/derate: on-chip variation under-modeled (no AOCV/POCV, or wrong derates), so the silicon path is slower than analysis.
  • SI not modeled: crosstalk delta-delay from aggressors pushed the path slower than the noise-free number.
  • Extraction/parasitic mismatch: wrong RC corner, missing coupling cap, or post-route vs estimated RC differences.
  • Clock issues: real skew/jitter worse than the uncertainty budget, or a duty-cycle/clock-path problem.
  • Library accuracy: NLDM vs CCS, slew clamp, or characterization not matching the real process.
The lesson: STA slack is a statement about the model, not a guarantee about the die.

Q10. Define unateness for a timing arc (positive unate, negative unate, non-unate). Why does it matter for STA?

Unateness describes how an output transition direction relates to an input transition direction for a delay arc:
  • Positive unate: a rising input causes a rising output and a falling input a falling output (e.g. the arc through a buffer, AND, or OR). Rise tracks rise.
  • Negative unate: a rising input causes a falling output and vice versa (e.g. INV, NAND, NOR). Rise inverts to fall.
  • Non-unate: output direction is not determined by that input's direction alone, so both senses are possible (e.g. XOR/XNOR, and the clock pin of a flop).
Why it matters:
  • It tells STA which input edge to associate with which output edge when propagating rise vs fall arrival times and slews, which directly affects delay selection (rise and fall delays differ).
  • For clock paths, unateness determines whether STA treats a stage as inverting; an odd number of inverting (negative-unate) stages flips the effective edge (a positive edge at the source becomes a negative edge at the FF clock pin), which changes which edges launch/capture.
  • Non-unate clock-network elements force the tool to consider both edge senses, increasing analysis and sometimes pessimism.
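The edge-flipping rule for a clock path can be sketched as a parity count over the stage senses. The function name and the 'positive'/'negative' labels below are illustrative assumptions, and non-unate stages are rejected rather than modeled.

```python
def edge_at_sink(source_edge: str, stage_senses: list[str]) -> str:
    """Propagate a 'rise' or 'fall' edge through a chain of unate stages.

    Each stage sense is 'positive' (buffer-like) or 'negative' (inverter-like).
    """
    flip = {"rise": "fall", "fall": "rise"}
    edge = source_edge
    for sense in stage_senses:
        if sense == "negative":
            edge = flip[edge]
        elif sense != "positive":
            # A non-unate stage would require propagating both edges.
            raise ValueError(f"unsupported sense: {sense}")
    return edge


# One inverter in the clock path: a rising source edge reaches the flop as a fall.
print(edge_at_sink("rise", ["positive", "negative", "positive"]))
# Two inverters cancel out.
print(edge_at_sink("rise", ["negative", "negative"]))
```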

Q11. What is the difference between path-based and graph-based (block-based) analysis in STA, and when is each used?

Graph-based analysis (GBA): the tool propagates a single worst-case arrival (and worst slew) per pin through the timing graph, taking the max (setup) or min (hold) at each merge point. It is fast but pessimistic, because the worst slew at a node may not belong to the same path that produced the worst arrival; that 'slew merging' pessimism is carried forward.
Path-based analysis (PBA): the tool re-times specific critical paths end-to-end using the actual slew propagated along that path, removing the cross-path pessimism GBA introduced. It is more accurate but slower, so it's used selectively.
Usage in practice:
  • Run GBA for full-chip iteration and optimization, accepting conservative numbers.
  • Run PBA (exhaustive or path-recovery mode) on the failing/near-critical endpoints during signoff to recover pessimism and avoid over-fixing paths that actually pass.
PBA can recover meaningful slack (often tens of ps) on deep paths, which can be the difference between closing timing and adding needless buffers.
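A toy example of slew-merging pessimism, assuming a hypothetical linear gate-delay model (real libraries use NLDM/CCS tables) and two paths converging on one gate input. All names and numbers are made up for illustration.

```python
def gate_delay(input_slew_ps: float) -> float:
    """Hypothetical delay model: a fixed part plus a slew-dependent part."""
    return 50.0 + 0.5 * input_slew_ps


def gba_arrival(inputs):
    """GBA: pair the worst arrival with the worst slew, even across paths."""
    worst_arrival = max(arrival for arrival, _ in inputs)
    worst_slew = max(slew for _, slew in inputs)
    return worst_arrival + gate_delay(worst_slew)


def pba_arrival(inputs):
    """PBA: time each path with its own slew, then take the worst."""
    return max(arrival + gate_delay(slew) for arrival, slew in inputs)


# (arrival_ps, slew_ps) of two paths merging at the same pin.
merging = [(500.0, 20.0), (480.0, 80.0)]
gba = gba_arrival(merging)
pba = pba_arrival(merging)
print(f"GBA={gba} ps  PBA={pba} ps  recovered={gba - pba} ps")
```

GBA combines the late arrival of the first path with the slow slew of the second, a combination that no single real path has, which is the pessimism PBA removes.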

Q12. On a setup path, the capture clock arrives later than the launch clock (positive skew). Does that help or hurt setup, and what about hold? State it generally.

Define T_skew = T_capture_arrival − T_launch_arrival (positive = capture clock is later).
Setup: positive skew helps. The capture edge effectively moves later, giving the data more time:
T_cq + T_comb + T_setup ≤ T_clk + T_skew − T_uncertainty — a larger T_skew relaxes the bound.
Hold: positive skew hurts. The same-edge hold check tightens:
T_cq + T_comb_min ≥ T_hold + T_skew + T_uncertainty — a larger T_skew makes the requirement harder.
General statement: skew toward the capture flop (positive) borrows time for setup but eats into hold margin; skew toward the launch flop (negative) does the opposite. This is exactly why useful-skew optimization trades setup vs hold across stages, and why aggressive setup-driven skewing can create hold violations downstream. Uncertainty (jitter) always subtracts from setup margin and adds to the hold requirement, regardless of skew sign.
Positive skew shifts the capture edge later: more setup margin, less hold margin.

2. Timing Paths & Path Types (12 questions)

Q13. What are the four canonical timing path groups in STA? Define each by its start point and end point.

STA always analyzes a path from a defined start point to a defined end point. Start points are input ports or clock pins of sequential cells; end points are output ports or data (D) pins of sequential cells. The four classic groups are:
  • reg-to-reg (FF to FF): launch flop clock pin to capture flop D pin. The most common group; checked for both setup and hold.
  • in-to-reg (input to FF): primary input port to a capture flop D pin. Constrained by set_input_delay.
  • reg-to-out (FF to output): launch flop clock pin to primary output port. Constrained by set_output_delay.
  • in-to-out (input to output): primary input through pure combinational logic to a primary output. A feedthrough/combinational path constrained by both input and output delay.
Latch-based designs add latch-to-latch paths where time borrowing applies.
The four timing path groups by start/end point.

Q14. Differentiate the data path from the clock path in a reg-to-reg timing arc. Why is the clock path traversed twice in one setup check?

Data path: the logic the launched data travels through — from the launch flop's Q pin, through combinational logic, to the capture flop's D pin. Its delay is what we try to fit inside the clock period.
Clock path (clock network): from the clock source/port through the clock tree (buffers, gates) to the CK pin of each flop. It determines when each flop actually sees its edge.
In a single setup check the clock network is traversed twice, computing two separate insertion delays:
  • the launch clock path to the launch flop's CK pin, and
  • the capture clock path to the capture flop's CK pin.
The difference between these two is the clock skew. Treating them independently is essential: on-chip variation (OCV/AOCV) derates are applied to each clock path separately, and CPPR/CRPR then removes the pessimism on the shared (common) portion of the two clock paths so the skew is not double-penalized.

Q15. Write the fundamental setup (max-delay) inequality for a reg-to-reg path and define every term.

Setup requires that data arrives at the capture D pin before the setup window of the capturing edge:
T_launch + T_cq + T_comb ≤ T_clk + T_capture − T_setup − T_uncertainty
where T_launch and T_capture are the launch and capture clock insertion delays. Rearranged into the textbook form (clean clocks, T_skew = T_capture − T_launch):
T_cq + T_comb + T_setup ≤ T_clk + T_skew − T_uncertainty
  • T_cq: clock-to-Q delay of the launch flop.
  • T_comb: combinational data-path delay (the data path).
  • T_setup: setup time of the capture flop (data must be stable before the edge).
  • T_clk: clock period.
  • T_skew: capture insertion delay − launch insertion delay. Positive skew helps setup (capture edge arrives later).
  • T_uncertainty: jitter + margin (pre-CTS also models skew). It is subtracted for setup.
Setup uses late data arrival vs early required time; data-path cells use the max/slow library corner.

Q16. Write the hold (min-delay) inequality and explain why hold violations cannot be fixed by slowing the clock.

Hold requires that newly launched data does NOT arrive at the capture D pin too soon — it must stay stable for the hold window after the same capturing edge that latched the previous data:
T_cq + T_comb ≥ T_hold + T_skew + T_uncertainty
i.e. even the fastest data path must take longer than the hold requirement plus skew and uncertainty at the capture flop. Equivalently, T_launch + T_cq + T_comb ≥ T_capture + T_hold + T_uncertainty for the same clock edge (no period term).
  • Because hold is a same-edge check, the clock period T_clk cancels out entirely. Slowing the clock does nothing for hold.
  • Hold uses the min/fast data corner (shortest delay) and is the worst case when the capture clock arrives late relative to launch (positive skew hurts hold).
  • Fix hold by adding delay in the data path (buffers/delay cells) or rebalancing clock skew.
Hold is checked against the same clock edge; data must stay stable through the hold window.

Q17. Define launch and capture flip-flops. For a setup check on a single-cycle path, which clock edges launch and capture, and what changes for hold?

Launch flop: the source register whose CK edge sends data into the data path (data leaves on its Q after T_cq). Capture flop: the destination register whose CK edge samples the data at its D pin (governed by setup/hold).
  • Setup, single-cycle: data is launched by edge at time 0 (launch edge) and must be captured by the next edge at time T_clk (capture edge). The available time is one full period.
  • Hold: uses the same edge for launch and capture — the data launched at edge n must not corrupt the data being captured at edge n. Available time is essentially zero, hence the min-delay requirement.
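For multicycle paths, set_multicycle_path N -setup moves the setup capture edge out to the Nth edge after launch. The hold check follows it by default: with the default hold multiplier of 0 it sits one capture edge before the new setup edge, i.e. N − 1 periods after launch. To pull the hold check back to the launch edge you normally add set_multicycle_path N−1 -hold.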
Single-cycle setup: launch at t=0, capture one period later.
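For the multicycle case, the edge arithmetic can be sketched as below. It assumes a single clock with launch at t = 0 and the usual SDC convention that the hold check sits one capture edge before the setup capture edge, moved earlier by the hold multiplier. The function name is illustrative.

```python
def check_edges(t_clk, setup_mult=1, hold_mult=0):
    """Return (setup_capture_time, hold_capture_time) relative to launch at 0.

    setup_mult: value given to set_multicycle_path -setup (default 1)
    hold_mult:  value given to set_multicycle_path -hold  (default 0)
    """
    setup_edge = setup_mult * t_clk
    hold_edge = (setup_mult - 1 - hold_mult) * t_clk
    return setup_edge, hold_edge


print(check_edges(10))          # single-cycle default
print(check_edges(10, 3))       # -setup 3 only: hold check follows to 2 periods
print(check_edges(10, 3, 2))    # -setup 3 with -hold 2: hold back at the launch edge
```

The middle case is the common trap: a setup-only multicycle of 3 leaves a hold check two periods out, which usually demands far more data-path delay than intended.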

Q18. Walk through computing the data arrival time and the data required time for a setup check, then define slack.

Data arrival time at the capture D pin (using late/max numbers):
arrival = T_launch_clk + T_cq + T_comb
where T_launch_clk is the launch-clock insertion delay to the launch CK pin.
Data required time at the same D pin for setup:
required = T_clk + T_capture_clk − T_setup − T_uncertainty
where T_capture_clk is the capture-clock insertion delay.
Setup slack:
slack = required − arrival
Positive slack = timing met; negative = violation. For hold, the sense flips: slack = arrival − required, using early/min data and the same-edge required time required = T_capture_clk + T_hold + T_uncertainty. The reported critical path is simply the path with the most negative (worst) slack in its group.

Q19. What exactly is the 'critical path'? Is the longest combinational path always the critical path? Justify.

The critical path is the timing path with the worst (most negative, or smallest positive) slack in the design — it limits the maximum operating frequency for setup analysis. It is defined by slack, not by raw delay.
  • No, the longest combinational path is not necessarily critical. Slack depends on the full equation: clock period, both clock insertion delays (skew), setup time, uncertainty, and the data delay.
  • A shorter data path on a faster (tighter-period) clock domain, or one with negative skew (capture clock arriving early), or one ending at a flop with a large setup time, can have worse slack than a physically longer path.
  • Different path groups / clock domains can have different periods, so 'longest delay' across groups is meaningless without normalizing by slack.
Designers also track WNS (worst negative slack — the single critical path) and TNS (total negative slack — sum over all violating endpoints) to gauge overall closure effort.
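A small sketch of WNS/TNS bookkeeping, which also shows a shorter path being critical because it sits in a faster clock domain. Endpoint names and numbers are invented, skew and uncertainty are taken as zero, and WNS is reported here simply as the minimum slack (some tools clamp it at zero when nothing violates).

```python
def slack_ps(t_clk, t_cq, t_comb, t_setup):
    """Setup slack with zero skew and zero uncertainty (simplifying assumption)."""
    return t_clk - (t_cq + t_comb + t_setup)


def wns_tns(endpoint_slacks):
    """Return (WNS, TNS) from a mapping of endpoint name to worst slack in ps."""
    wns = min(endpoint_slacks.values())
    tns = sum(s for s in endpoint_slacks.values() if s < 0)
    return wns, tns


slacks = {
    # Longest data path, but on a slow 2000 ps clock.
    "slow_domain/ff_a/D": slack_ps(t_clk=2000, t_cq=100, t_comb=1500, t_setup=50),
    # Much shorter data paths on an 800 ps clock.
    "fast_domain/ff_b/D": slack_ps(t_clk=800, t_cq=100, t_comb=700, t_setup=50),
    "fast_domain/ff_c/D": slack_ps(t_clk=800, t_cq=100, t_comb=660, t_setup=50),
}

critical = min(slacks, key=slacks.get)
print(slacks)
print("critical endpoint:", critical)
print("WNS, TNS:", wns_tns(slacks))
```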

Q20. Why does positive clock skew help setup but hurt hold? Give the sign convention and the intuition.

Define skew = capture insertion delay − launch insertion delay. Positive skew means the capture clock edge arrives later than the launch edge.
  • Setup: a later capture edge gives the data path more time to propagate. In T_cq + T_comb + T_setup ≤ T_clk + T_skew, positive skew adds to the RHS budget. So positive skew helps setup.
  • Hold: a later capture edge means the capture flop samples later, giving the next launched data more chance to race through and corrupt the value. In T_cq + T_comb ≥ T_hold + T_skew, positive skew increases the RHS requirement. So positive skew hurts hold.
This is the classic skew tradeoff: deliberate useful skew can buy setup margin on a critical path, but it must be paid back as hold margin (extra data-path delay) on the same and neighboring paths.
Positive skew: capture edge lags launch, adding setup budget and eroding hold margin.
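Solving both inequalities for T_skew gives the window of skew values that satisfies setup and hold at once. The sketch below is one way to compute it; the function name and the example numbers (in ps) are assumptions for illustration, and uncertainty defaults to zero as in the two inequalities above.

```python
def skew_window(t_clk, t_cq_max, t_comb_max, t_setup,
                t_cq_min, t_comb_min, t_hold,
                t_unc_setup=0, t_unc_hold=0):
    """Return (lowest, highest) skew that meets both checks.

    Setup: T_skew >= T_cq + T_comb_max + T_setup + T_unc - T_clk
    Hold:  T_skew <= T_cq + T_comb_min - T_hold - T_unc
    """
    lowest = t_cq_max + t_comb_max + t_setup + t_unc_setup - t_clk
    highest = t_cq_min + t_comb_min - t_hold - t_unc_hold
    return lowest, highest


lo, hi = skew_window(t_clk=1000, t_cq_max=150, t_comb_max=900, t_setup=50,
                     t_cq_min=80, t_comb_min=120, t_hold=40)
print(f"skew must lie in [{lo}, {hi}] ps; feasible={lo <= hi}")

# A faster short path shrinks the hold side until no skew value works.
lo, hi = skew_window(t_clk=1000, t_cq_max=150, t_comb_max=900, t_setup=50,
                     t_cq_min=80, t_comb_min=20, t_hold=40)
print(f"skew must lie in [{lo}, {hi}] ps; feasible={lo <= hi}")
```

When the window is empty, useful skew alone cannot close the path and the data path itself has to change.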

Q21. Given T_cq = 0.15 ns, T_comb = 0.80 ns, T_setup = 0.10 ns, T_clk = 1.00 ns, capture insertion = 0.30 ns, launch insertion = 0.25 ns, clock uncertainty = 0.05 ns. Compute the setup slack. Then state the max frequency if skew and uncertainty were zero.

Skew = capture − launch insertion = 0.30 − 0.25 = +0.05 ns.
Data arrival (relative, using launch insertion) = launch_ins + T_cq + T_comb = 0.25 + 0.15 + 0.80 = 1.20 ns.
Data required = T_clk + capture_ins − T_setup − uncertainty = 1.00 + 0.30 − 0.10 − 0.05 = 1.15 ns.
Setup slack = required − arrival = 1.15 − 1.20 = −0.05 ns (VIOLATION).

Sanity check with the compact form: slack = T_clk + skew − uncertainty − (T_cq + T_comb + T_setup) = 1.00 + 0.05 − 0.05 − 1.05 = −0.05 ns. Consistent.

With skew = 0 and uncertainty = 0, the minimum period is T_cq + T_comb + T_setup = 0.15 + 0.80 + 0.10 = 1.05 ns, so f_max = 1 / 1.05 ns ≈ 952 MHz.
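The arithmetic above can be re-checked with a few lines of Python. The values are converted to integer picoseconds to avoid floating-point rounding; the variable names are illustrative.

```python
# All times in picoseconds, taken from the question.
T_CQ, T_COMB, T_SETUP = 150, 800, 100
T_CLK, CAPTURE_INS, LAUNCH_INS, UNCERTAINTY = 1000, 300, 250, 50

arrival = LAUNCH_INS + T_CQ + T_COMB
required = T_CLK + CAPTURE_INS - T_SETUP - UNCERTAINTY
slack = required - arrival

# With zero skew and zero uncertainty the period is bounded by the path itself.
min_period_ps = T_CQ + T_COMB + T_SETUP
f_max_mhz = 1e6 / min_period_ps  # 1 / ps expressed in MHz

print(f"arrival={arrival} ps, required={required} ps, slack={slack} ps")
print(f"min period={min_period_ps} ps, f_max ~ {f_max_mhz:.0f} MHz")
```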

Q22. On a pure input-to-output (combinational feedthrough) path with no flops, how is timing constrained, and what is the arrival/required computation?

There is no launch or capture flop, so the path is bounded by the I/O delay budget defined relative to a virtual (or real) clock:
  • Start: set_input_delay models the external delay from the upstream (off-chip/block) launch flop to the input port.
  • End: set_output_delay models the external setup requirement of the downstream capture flop seen at the output port.
Setup at the output port:
arrival = T_input_delay + T_comb
required = T_clk − T_output_delay (− uncertainty)
slack = required − arrival
So the on-chip combinational budget is effectively T_clk − T_input_delay − T_output_delay. If no I/O delays are set, the port is unconstrained and STA may report it as no path / unconstrained endpoint — a common interview gotcha. A min-delay (hold) check uses the early input delay and min output delay analogously.
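A minimal sketch of the feedthrough budget, assuming both I/O delays reference the same clock of period T_clk and using hypothetical numbers in ps. The function names are illustrative.

```python
def in2out_setup_slack(t_clk, input_delay_max, t_comb_max,
                       output_delay_max, uncertainty=0):
    """Setup slack at the output port of a pure combinational path."""
    arrival = input_delay_max + t_comb_max
    required = t_clk - output_delay_max - uncertainty
    return required - arrival


def comb_budget(t_clk, input_delay_max, output_delay_max, uncertainty=0):
    """Delay left for on-chip logic once the external delays are subtracted."""
    return t_clk - input_delay_max - output_delay_max - uncertainty


print(comb_budget(t_clk=2000, input_delay_max=600, output_delay_max=700))
print(in2out_setup_slack(t_clk=2000, input_delay_max=600, t_comb_max=500,
                         output_delay_max=700))
```

The slack is simply the budget minus the actual combinational delay, so an aggressive pair of I/O delays can leave a feedthrough path with almost no on-chip allowance.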

Q23. Draw and explain the full reg-to-reg path schematic and annotate where clock-to-Q, combinational delay, and setup come from physically.

Data flows: launch flop CK → (after T_cq) launch Q → combinational cloud (T_comb) → capture flop D, sampled by the capture CK edge subject to T_setup.
  • T_cq (clock-to-Q): the launch flop's internal propagation from its CK edge to a valid Q — a property of the launch flop and its output load/slew.
  • T_comb: the sum of cell and net delays through the data-path logic, evaluated at the slow corner for setup and the fast corner for hold.
  • T_setup: a constraint of the capture flop — how long D must be stable before the capture CK edge.
The two CK pins are driven by separate branches of the clock tree, so their insertion delays differ — that difference is the skew applied to the check. The setup budget is T_clk + T_skew − uncertainty ≥ T_cq + T_comb + T_setup.
Reg-to-reg path: T_cq at launch Q, T_comb through the logic cloud, T_setup at capture D.

Q24. Senior-level: in latch-based timing, how do time borrowing and the concept of paths-through-latches change the basic setup equation, and what is the risk?

Latches are level-sensitive, so data can pass through while the clock is in its transparent phase. This enables time borrowing (a.k.a. cycle stealing): if data arrives after the latch's opening edge but while the latch is still transparent, it propagates through immediately, effectively letting a slow stage borrow time from the next stage.
  • The setup check moves from 'arrive before the opening edge' to 'arrive before the latch closes', adding the transparency window to the budget: borrow up to (pulse width − T_setup).
  • Borrowing chains across multiple latch stages, so STA must analyze paths through latches, not just simple reg-to-reg arcs, and the borrowed amount propagates as an adjusted required time downstream.
Risks: borrowing can cascade and mask a real frequency problem; excessive borrow at one latch tightens the next stage and can worsen hold (data is live during the transparent phase, raising race risk). Tools cap borrow at the available window and flag when the borrow demand exceeds it.
Latch time borrowing: data arriving in the transparent phase borrows from the next stage.
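A simplified model of borrowing at a single latch, assuming an ideal clock, zero D-to-Q delay, and a maximum borrow of (pulse width − T_setup) as stated above. Real tools track more detail; the function name and returned fields are illustrative only.

```python
def latch_borrow(arrival, t_open, pulse_width, t_setup):
    """Return (borrow, margin_to_close, launch_time_for_next_stage) in ps.

    arrival:     data arrival time at the latch D pin
    t_open:      time of the opening clock edge
    pulse_width: length of the transparent phase
    """
    max_borrow = pulse_width - t_setup
    borrow = max(0, arrival - t_open)
    margin_to_close = max_borrow - borrow      # negative means a violation
    launch_next = max(arrival, t_open)         # next stage starts this late
    return borrow, margin_to_close, launch_next


print(latch_borrow(arrival=900, t_open=1000, pulse_width=500, t_setup=50))
print(latch_borrow(arrival=1100, t_open=1000, pulse_width=500, t_setup=50))
print(latch_borrow(arrival=1500, t_open=1000, pulse_width=500, t_setup=50))
```

In the second case the 100 ps that was borrowed shows up as a later launch time for the next stage, which is how borrowing tightens downstream paths.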

3. Setup & Hold Checks (12 questions)

Q25. Define setup time and hold time for a flip-flop. What physically goes wrong if each is violated?

Setup time (T_setup): the minimum interval before the active clock edge during which the data input D must be stable.
Hold time (T_hold): the minimum interval after the active clock edge during which D must remain stable.
Together they define a forbidden window around the edge in which data may not transition.
  • Setup violation: data arrives too late, so the new value misses the capturing edge. The flop may capture the old value, the wrong value, or go metastable. A setup failure means the path is too slow for the clock period.
  • Hold violation: data changes too soon after the edge (it races through combinational logic and corrupts the value being captured). The flop captures a value that should have been latched on the next cycle. A hold failure is a logic-correctness problem independent of how slow you run the clock.
Setup/hold window around the capturing clock edge; D must be stable inside it.

Q26. Write the setup timing equation for a single-cycle flop-to-flop path and explain every term, including the sign of skew.

Setup constraint: T_clk ≥ T_cq + T_comb + T_setup − T_skew
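Equivalently, data arrival must beat data required time: T_launch + T_cq + T_comb ≤ T_capture + T_clk − T_setup, where T_launch and T_capture are the clock arrival times at the two flops, so the skew T_capture − T_launch is already included and must not be added again.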
  • T_clk — clock period (the time budget for one cycle).
  • T_cq — clock-to-Q of the launch flop (propagation delay from launch edge to data out).
  • T_comb — combinational delay through the data path between the two flops.
  • T_setup — setup requirement of the capture flop; it eats into the budget, hence it is subtracted.
  • T_skew = T_capture_clk − T_launch_clk — if the capture clock arrives later than the launch clock, skew is positive and it helps setup (more time), so it is subtracted on the right / added to the budget.
In practice the right side also subtracts clock uncertainty (jitter + margin) and applies OCV/derate, but the core relation is the one above.

Q27. Write the hold timing equation and explain why hold checks are independent of clock frequency.

Hold constraint: T_cq + T_comb ≥ T_hold + T_skew
Read as: the fastest data arrival at the capture flop must still be later than its hold requirement. The new data launched by an edge must not arrive at the capture flop so fast that it violates the hold window of that same edge.
  • Both sides reference the same active clock edge — the launch and the capture-flop hold check happen on edge N, not edge N and edge N+1. The clock period T_clk never appears.
  • Because T_clk cancels out, hold is purely a relationship between data-path delay, hold requirement, and skew — all of which are fixed by the silicon and layout, not by how fast you clock.
  • Consequence: you cannot fix a hold violation by changing frequency. Slowing the clock does nothing; speeding it up does nothing. Hold must be fixed structurally (buffers/delay on the data path or skew adjustment).
Here T_skew is the same definition (capture − launch); positive skew (capture later) hurts hold because it gives the racing data more time to corrupt the capture.

Q28. A path has T_cq = 120 ps, T_comb = 700 ps, T_setup = 80 ps, and the clock period is 1 ns with zero skew. What is the setup slack? Now the same path has T_hold = 90 ps and fast-corner T_cq = 60 ps, T_comb = 50 ps. Is hold met?

Setup slack = required − arrival = (T_clk + T_skew − T_setup) − (T_cq + T_comb).
= (1000 + 0 − 80) − (120 + 700) = 920 − 820 = +100 ps. Setup is MET with 100 ps margin.
Hold slack = arrival − required = (T_cq + T_comb) − (T_hold + T_skew), evaluated at the fast (min) corner.
= (60 + 50) − (90 + 0) = 110 − 90 = +20 ps. Hold is MET with 20 ps margin.
  • Key analysis subtlety: setup is checked with max (slow) delays, hold with min (fast) delays — that is why T_cq/T_comb differ between the two checks even on the same physical path.

Q29. Explain metastability. How does it relate to setup/hold, and how is MTBF affected by it?

Metastability is the condition where a flop's output hovers at an indeterminate voltage (between valid 0 and 1) for an unbounded time after the clock edge, eventually resolving randomly to 0 or 1. It happens when D changes inside the setup/hold window, so the internal cross-coupled latch is driven near its unstable balance point.
  • Violating setup or hold does not guarantee a wrong value — it creates a probability of metastability and a resolution time that can exceed the available slack.
  • The output may oscillate or settle late, and a downstream flop can then sample a half-resolved value, propagating the failure.
  • MTBF (mean time between failures) for a synchronizer: MTBF = e^(T_r / τ) / (T_w · f_clk · f_data), where T_r is the resolution time available, τ is the flop's resolution time constant, T_w the metastability window, f_clk the sampling clock, f_data the data toggle rate.
  • Mitigation: multi-flop synchronizers (2 or 3 FFs) for asynchronous crossings — each extra stage gives another full cycle of resolution time, raising MTBF exponentially. Synchronizers do not eliminate metastability; they make it astronomically improbable.
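The exponential dependence on resolution time is easy to see numerically. The sketch below evaluates the MTBF formula with made-up device parameters (τ, T_w and the frequencies are assumptions, not data for any real flop).

```python
import math


def synchronizer_mtbf(t_r, tau, t_w, f_clk, f_data):
    """MTBF in seconds: e^(T_r / tau) / (T_w * f_clk * f_data)."""
    return math.exp(t_r / tau) / (t_w * f_clk * f_data)


# Hypothetical values: 20 ps time constant, 50 ps window, 500 MHz clock.
TAU, T_W = 20e-12, 50e-12
F_CLK, F_DATA = 500e6, 50e6
T_CLK = 1 / F_CLK

one_stage = synchronizer_mtbf(1.5e-9, TAU, T_W, F_CLK, F_DATA)
# An extra flop adds roughly one more clock period of resolution time.
two_stage = synchronizer_mtbf(1.5e-9 + T_CLK, TAU, T_W, F_CLK, F_DATA)

print(f"MTBF with T_r = 1.5 ns:         {one_stage:.3e} s")
print(f"MTBF with one more clock cycle: {two_stage:.3e} s")
print(f"improvement factor:             {two_stage / one_stage:.3e}")
```

The improvement factor equals e^(T_clk / τ), which is why adding a stage helps far more than shaving the data toggle rate.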

Q30. List the techniques to fix a setup violation, and separately the techniques to fix a hold violation. Why can fixing one create the other?

Setup fixes (need the data faster, or more time):
  • Reduce T_comb: upsize cells, restructure/buffer long nets, reduce logic depth (logic restructuring, retiming).
  • Reduce load/fanout, fix poor placement of the path.
  • Add useful skew: make the capture clock arrive later (positive skew) to borrow time.
  • Relax the period (lower frequency) — last resort, since setup is frequency dependent.
Hold fixes (need the data slower):
  • Insert delay/hold-buffer cells on the short data path (the standard fix).
  • Downsize fast cells or use higher-Vt (slower) cells on the path.
  • Reduce positive clock skew at the capture flop, or balance the clock tree.
Why they conflict: adding delay to fix hold slows the path and erodes setup margin; speeding the path or adding positive skew to fix setup makes data race faster / capture later and erodes hold. Useful skew especially trades setup margin on one stage for hold margin loss on the next — fixes must respect both windows simultaneously.

Q31Where do clock uncertainty, jitter, and OCV derate apply — to setup, to hold, or both? Be precise about the sign.

Clock uncertainty models jitter + margin and is subtracted from the available time so it always makes the check harder:
  • Setup: uncertainty reduces the effective period → T_clk − T_uncertainty on the required side. Both jitter and setup margin apply.
  • Hold: typically only the margin component of uncertainty applies (often a smaller hold-uncertainty), and it adds to the required hold → harder. Cycle-to-cycle jitter largely cancels for hold because launch and capture use the same edge, so many flows set hold uncertainty lower than setup uncertainty.
OCV / derate (on-chip variation): models that launch and capture paths see different process/voltage conditions.
  • Setup (max analysis): late-derate the data/launch-clock path (slower) and early-derate the capture-clock path (faster) — worst case for catching the edge.
  • Hold (min analysis): early-derate the data/launch path (faster) and late-derate the capture-clock path (slower) — worst case for the race.
Advanced flows replace flat derate with AOCV/POCV (depth- and distance-dependent, or statistical) to reduce pessimism. The common clock path (up to the divergence point) is derated consistently — CRPR credits back the over-pessimism on the shared portion.

Q32Walk me through a complete setup/hold waveform for a launch flop, a combinational path, and a capture flop in the same cycle. Where exactly are the two checks made?

Sequence:
  • At launch edge (cycle N), after T_cq the launch Q updates and the new value propagates through T_comb, arriving at the capture flop's D as D_arrival.
  • Setup check is made against the next capture edge (N+1): D_arrival must settle at least T_setup before that edge. Required = T_clk + T_skew − T_setup.
  • Hold check is made against the same edge (N): the new data must not arrive earlier than T_hold after that edge, i.e. it must not corrupt what edge N is trying to capture. Required = T_hold + T_skew.
So setup and hold checks for a single-cycle path are made against two different edges separated by one period — that is precisely why setup involves T_clk and hold does not.
Single-cycle path: hold checked against edge N, setup against edge N+1.

Q33How does clock skew affect setup and hold? Explain useful (intentional) skew and its limit.

Define T_skew = T_capture_clk − T_launch_clk (positive = capture clock arrives later).
  • Setup: positive skew helps — the capture edge moves later, giving the data more time: T_clk ≥ T_cq + T_comb + T_setup − T_skew.
  • Hold: positive skew hurts — the racing data has more time to reach the capture flop before its (delayed) hold window closes: T_cq + T_comb ≥ T_hold + T_skew.
Useful (clock) skew is intentionally adjusting clock arrival to borrow time across a pipeline: delay the capture clock of a critical stage to fix its setup, which simultaneously delays the launch of the next stage.
  • Limit: the borrowed time is taken from the neighboring stage, and the added positive skew degrades hold on the borrowing stage. So skew is bounded by the next stage's setup margin and the current stage's hold margin — it redistributes slack, it does not create it.
Positive skew delays the capture edge: more setup budget, less hold budget.

Q34A block passes setup at signoff but fails hold on silicon. The clock was already slowed in test. What is your debug approach?

Slowing the clock confirms it is a hold problem (hold is frequency independent — slowing did not help), which is consistent with the symptom.
  • Confirm the corner: hold fails at the fast corner (low temp / high V / fast process for most nodes, though temperature inversion can make hot the worst hold corner at advanced nodes). Re-check that signoff covered the true min-delay corner, including the right RC (min) extraction.
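  • Check skew / CTS: excess positive skew at the capture flop is a classic cause. Look for clock-tree imbalance, CRPR credit that was too generous (missing credit only adds pessimism and cannot hide a real fail), or a clock path that diverges earlier than modeled and leaves real skew.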
  • Check derate/OCV: were min-delay derates and POCV applied on the data path and late-derate on capture clock? Optimistic derate hides hold fails.
  • Check for missed/false hold arcs: clock-gating, multicycle, or async paths mis-constrained; missing hold buffers that were dropped in ECO or eaten by routing.
  • Check IR drop / noise: dynamic voltage rise or crosstalk can speed the data path beyond the static min model, opening a real hold race not seen in nominal STA.
  • Fix: insert delay cells on the offending min paths via ECO and re-run min-corner timing across all PVT + OCV; verify no setup regression.

Q35Two flops are driven by the same clock but the data path between them is a single inverter (very short). Setup is fine. Why is this the riskiest hold scenario, and what is the worst-case condition?

A near-zero-delay data path is the canonical hold danger: the new data launched on edge N reaches the capture flop almost immediately, easily arriving within the capture flop's hold window of that same edge N.
  • Hold margin = (T_cq + T_comb) − (T_hold + T_skew). With tiny T_comb, the data-arrival term (T_cq + T_comb) is small, so any positive skew or fast corner pushes the margin negative.
  • Worst case: fast/min corner (smallest T_cq and T_comb) plus positive skew at the capture flop (capture clock later than launch). This combination minimizes data arrival relative to the hold requirement.
  • Because T_clk is absent, you cannot test your way out of it on the bench by changing frequency — it is a structural race.
Fix: insert hold/delay buffers on the data path so T_cq + T_comb exceeds T_hold + T_skew with margin, and balance the clock tree to remove the harmful positive skew.
Short data path with positive skew at fast corner: the classic hold race.
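
A minimal sketch of the hold-margin arithmetic above, assuming a fixed per-buffer delay. The helper names, the 25 ps buffer delay, the 10 ps target margin, and the fast-corner numbers are all illustrative assumptions; a real ECO flow picks cells per path and re-times after each insertion.

```python
import math

def hold_slack_ps(t_cq, t_comb, t_hold, t_skew):
    """Hold margin = (T_cq + T_comb) - (T_hold + T_skew), all in ps."""
    return (t_cq + t_comb) - (t_hold + t_skew)

def hold_buffers_needed(slack_ps, margin_ps, buf_delay_ps):
    """Number of identical delay buffers needed to reach slack >= margin."""
    shortfall = margin_ps - slack_ps
    if shortfall <= 0:
        return 0
    return math.ceil(shortfall / buf_delay_ps)

# Hypothetical fast-corner values for a single-inverter path with positive skew.
slack = hold_slack_ps(t_cq=40, t_comb=10, t_hold=30, t_skew=50)
buffers = hold_buffers_needed(slack, margin_ps=10, buf_delay_ps=25)

print(f"hold slack = {slack} ps, delay buffers needed = {buffers}")
```

The count is computed with the buffer delay at the fast corner, since that is where the inserted cells contribute the least delay.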

Q36What is a setup/hold borrow (time borrowing) and why does it apply to latches but not edge-triggered flops?

Time borrowing is when a level-sensitive latch lets data that arrives late (while the latch is transparent) pass through and propagate, effectively borrowing time from the next pipeline stage.
  • A latch is transparent for the full active phase of the clock, so data arriving after the opening edge but before the closing edge still flows through. The slack 'borrowed' is bounded by the transparency window (roughly the active phase minus the setup of the closing edge).
  • Edge-triggered flops sample only at the edge (an instant), so there is no transparency window — data must meet setup before that edge or it is lost. No borrowing is possible.
  • Latch-based (time-borrowing) design averages slack across stages and tolerates imbalance, which is why high-performance CPUs use it — at the cost of much harder STA (level-sensitive checks, borrow limits, more complex hold analysis since the latch can be transparent when new data arrives).
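
The borrow limit described above can be sketched as follows, under a deliberately simplified model: a single latch, ideal clock edges, and no uncertainty or latency terms. The function name and the example numbers are hypothetical.

```python
def latch_borrow_ps(arrival, open_edge, close_edge, t_setup):
    """Return (borrowed, max_borrow) in ps for data arriving at a latch D pin.

    The latch is transparent from open_edge to close_edge. Data arriving
    before open_edge borrows nothing; data arriving later borrows the
    difference, up to the transparency window minus the closing-edge setup.
    """
    max_borrow = (close_edge - open_edge) - t_setup
    borrowed = max(0, arrival - open_edge)
    if borrowed > max_borrow:
        raise ValueError("arrival misses setup to the closing edge: violation")
    return borrowed, max_borrow

# Hypothetical example: latch transparent from 400 ps to 800 ps, 50 ps setup.
print(latch_borrow_ps(arrival=450, open_edge=400, close_edge=800, t_setup=50))
print(latch_borrow_ps(arrival=380, open_edge=400, close_edge=800, t_setup=50))
```

Whatever is borrowed here is subtracted from the time available to the next stage, which is why the analysis has to follow the borrow from latch to latch.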

4. Clocks: Skew, Jitter, Latency, Uncertainty 12 questions

Q37Define clock period and frequency, and write the relationship between them. If a design must run at 1.25 GHz, what is the clock period in ps?

Clock period T_clk is the time for one full cycle of the clock; frequency f is the number of cycles per second. They are reciprocals: f = 1 / T_clk.

For a 1.25 GHz target: T_clk = 1 / (1.25e9) = 0.8 ns = 800 ps.

Why it matters in STA: the period is the budget the setup check must close into. The fundamental single-cycle setup constraint is T_clk ≥ T_cq + T_comb + T_setup - T_skew + T_uncertainty, so improving frequency means shrinking T_comb or buying margin from T_skew/borrow. Hold has no T_clk term — it is a same-edge check, which is why hold violations cannot be fixed by slowing the clock.
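
A short sketch of the period/frequency conversion and of the setup constraint rearranged as a minimum period. The function names are hypothetical, and the path numbers are illustrative only.

```python
def period_ps(freq_hz):
    """Clock period in ps for a frequency in Hz (T_clk = 1 / f)."""
    return 1e12 / freq_hz

def min_period_ps(t_cq, t_comb, t_setup, t_skew=0.0, t_uncertainty=0.0):
    """Smallest T_clk satisfying
    T_clk >= T_cq + T_comb + T_setup - T_skew + T_uncertainty (all in ps)."""
    return t_cq + t_comb + t_setup - t_skew + t_uncertainty

def fmax_ghz(t_min_ps):
    """Highest frequency in GHz for a given minimum period in ps."""
    return 1e3 / t_min_ps

target = period_ps(1.25e9)
t_min = min_period_ps(t_cq=80, t_comb=600, t_setup=50, t_skew=30, t_uncertainty=60)

print(f"target period = {target} ps")
print(f"minimum period for this path = {t_min} ps, fmax = {fmax_ghz(t_min):.3f} GHz")
```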

Q38Distinguish source latency from network latency. Which one models the off-chip path, and how does each affect setup and hold timing?

Source latency (a.k.a. insertion delay outside the clock-tree boundary) is the delay from the clock's true origin — typically the board oscillator / PLL output — to the clock definition point (the port or pin where create_clock is applied). It models the off-chip / pre-port path.

Network latency is the delay from the clock definition point through the on-chip clock network (clock tree) to the register clock pins.

  • Before CTS (ideal clocks), network latency is estimated via set_clock_latency and source latency via set_clock_latency -source.
  • After CTS (propagated clocks), network latency becomes the real, per-endpoint tree delay; source latency usually stays as a fixed model.
Key point: when source latency is applied equally to launch and capture clocks of the same clock, it cancels out in single-clock setup/hold checks and does not change slack. It only matters for cross-clock checks or when launch and capture clocks have different source latencies.

Q39What is clock skew? Define positive and negative skew precisely with respect to the launch and capture flops, and state the sign convention you are using.

Clock skew is the difference in clock arrival time between two related registers. Define it as T_skew = T_capture - T_launch (capture-clock arrival minus launch-clock arrival).

  • Positive skew: T_skew > 0 — the capture clock arrives later than the launch clock. The capture flop is given extra time, so positive skew helps setup and hurts hold.
  • Negative skew: T_skew < 0 — the capture clock arrives earlier than the launch clock. This hurts setup and helps hold.
In the equations:
Setup: T_cq + T_comb + T_setup ≤ T_clk + T_skew
Hold: T_cq + T_comb ≥ T_hold + T_skew
Always state the convention first — many interviewers use launch - capture, which flips the signs. The physics (later capture helps setup) never changes; only the label does.
Positive skew: capture clock arrives later than launch (T_skew = T_capture - T_launch > 0).

Q40What is useful (intentional) skew? Give the setup and hold inequalities for a launch-to-capture path and explain how a tool 'borrows' time across stages with useful skew.

Useful skew is intentionally engineering the clock arrival times (rather than minimizing skew to zero) to balance slack between adjacent pipeline stages. By delaying the capture clock of a critical stage, you lend it time stolen from the next stage.

For a path with positive skew T_skew = T_capture - T_launch:
Setup: T_cq + T_comb + T_setup ≤ T_clk + T_skew — extra T_skew relaxes the slow stage.
Hold: T_cq + T_comb ≥ T_hold + T_skew — but the same skew tightens hold.

The borrow: delaying flop B's clock helps the A→B path but the same delay is now part of B→C's launch, stealing from the B→C budget. So useful skew is zero-sum across the cycle — you move slack from a stage that has margin to one that does not.

  • Only works when the donor stage has positive setup slack to give.
  • Always recheck hold on the helped path and setup on the donor path.
  • It is a real, committed clock-tree delay (unlike time borrowing in latches, which is a same-cycle phase effect).
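
The zero-sum nature of the borrow can be checked numerically. The sketch below delays flop B's clock in a three-flop A→B→C pipeline and recomputes all four slacks using the inequalities above; the function names, the dictionary layout, and all delay values are assumptions made for illustration.

```python
def setup_slack(t_clk, t_cq, t_comb, t_setup, t_skew):
    """Setup: T_cq + T_comb + T_setup <= T_clk + T_skew."""
    return (t_clk + t_skew) - (t_cq + t_comb + t_setup)

def hold_slack(t_cq, t_comb, t_hold, t_skew):
    """Hold: T_cq + T_comb >= T_hold + T_skew."""
    return (t_cq + t_comb) - (t_hold + t_skew)

def slacks_with_b_delayed(delay_b, t_clk, ab, bc):
    """Delay flop B's clock by delay_b ps.

    For A->B, B is the capture flop, so skew = +delay_b.
    For B->C, B is the launch flop, so skew = -delay_b.
    """
    return {
        "ab_setup": setup_slack(t_clk, ab["t_cq"], ab["t_comb"], ab["t_setup"], delay_b),
        "ab_hold": hold_slack(ab["t_cq"], ab["t_comb"], ab["t_hold"], delay_b),
        "bc_setup": setup_slack(t_clk, bc["t_cq"], bc["t_comb"], bc["t_setup"], -delay_b),
        "bc_hold": hold_slack(bc["t_cq"], bc["t_comb"], bc["t_hold"], -delay_b),
    }

# Hypothetical stages (ps): A->B is critical, B->C has margin to donate.
AB = {"t_cq": 80, "t_comb": 700, "t_setup": 50, "t_hold": 40}
BC = {"t_cq": 80, "t_comb": 500, "t_setup": 50, "t_hold": 40}

before = slacks_with_b_delayed(0, 800, AB, BC)
after = slacks_with_b_delayed(40, 800, AB, BC)
print("before:", before)
print("after: ", after)
```

The sum of the two setup slacks is unchanged by the shift, while A→B hold slack drops by exactly the added skew.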

Q41What is clock jitter? Distinguish period jitter from cycle-to-cycle jitter, and explain how jitter is modeled in STA.

Jitter is the short-term, random variation of clock edges from their ideal positions, caused mainly by the PLL, supply noise, and the clock source. Unlike skew (spatial, deterministic, between two points), jitter is temporal and statistical at a single point.

  • Period jitter: deviation of any single period from the ideal period — relevant to single-cycle setup margin.
  • Cycle-to-cycle jitter: difference between two consecutive periods — relevant to back-to-back-edge effects.
In STA jitter is not simulated edge-by-edge; it is folded into the clock uncertainty applied to the capture edge via set_clock_uncertainty. For setup, jitter effectively shortens the available period (the capture edge may come early), so it is subtracted from the setup side. For hold, since it is a same-edge check on a single clock, ideal-clock jitter largely common-modes out and is typically excluded or set much smaller — that is why setup and hold uncertainty are specified separately.

Q42How does clock uncertainty enter the setup check versus the hold check? Why are the two values usually different and why does setup uncertainty shrink the period while hold uncertainty does not?

Clock uncertainty (set_clock_uncertainty) is a lumped margin covering jitter, estimated skew (pre-CTS), and guardband. STA applies it pessimistically: it always makes the relevant check harder.

Setup — the capture edge is assumed to arrive early by the uncertainty, shrinking the usable period:
T_cq + T_comb + T_setup ≤ T_clk - U_setup + T_skew

Hold — hold is a same-edge check (launch and capture on the same edge), so there is no T_clk to shrink. Instead the uncertainty widens the required separation; it is added to the hold requirement:
T_cq + T_comb ≥ T_hold + U_hold + T_skew

Why different values: setup uncertainty includes jitter (which accumulates over a full period and pushes edges in either direction), so it is larger. Hold happens on the same edge, so jitter is highly common-mode and cancels; hold uncertainty captures only residual skew/OCV, so it is smaller (often near zero pre-CTS). Using one value for both would either over-pessimize hold or under-protect setup.
Setup uncertainty pulls the effective capture edge earlier, shrinking the usable period.

Q43A flop has T_cq = 80 ps, combinational delay 600 ps, capture-flop setup 50 ps, hold 40 ps, clock period 800 ps, setup uncertainty 60 ps, hold uncertainty 20 ps, and clock skew T_skew = +30 ps (capture later). Compute setup and hold slack. Does the path pass?

Using T_skew = T_capture - T_launch = +30 ps.

Setup slack = required - arrival = (T_clk + T_skew - U_setup - T_setup) - (T_cq + T_comb)
= (800 + 30 - 60 - 50) - (80 + 600)
= 720 - 680 = +40 ps → setup PASSES with 40 ps margin.

Hold slack = arrival - required = (T_cq + T_comb) - (T_hold + U_hold + T_skew)
= (80 + 600) - (40 + 20 + 30)
= 680 - 90 = +590 ps → hold PASSES easily.

Insight: the positive skew added 30 ps to setup but cost 30 ps on hold. Here both pass, but if the hold path were a short bypass (small T_comb), that +30 ps skew is exactly what creates hold violations — which is why CTS-induced positive skew often shows up as new hold fails.
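
The same arithmetic as a small Python check, using the numbers from the question. The function names are hypothetical, and the last case (a 5 ps bypass path) is an added illustration of the short-path scenario mentioned in the insight.

```python
def setup_slack_ps(t_clk, t_cq, t_comb, t_setup, u_setup, t_skew):
    """Setup slack = required - arrival."""
    required = t_clk + t_skew - u_setup - t_setup
    arrival = t_cq + t_comb
    return required - arrival

def hold_slack_ps(t_cq, t_comb, t_hold, u_hold, t_skew):
    """Hold slack = arrival - required."""
    arrival = t_cq + t_comb
    required = t_hold + u_hold + t_skew
    return arrival - required

setup = setup_slack_ps(t_clk=800, t_cq=80, t_comb=600, t_setup=50, u_setup=60, t_skew=30)
hold = hold_slack_ps(t_cq=80, t_comb=600, t_hold=40, u_hold=20, t_skew=30)
print(f"setup slack = {setup} ps, hold slack = {hold} ps")

# Illustration only: same flops and skew, but a 5 ps bypass path.
short_hold = hold_slack_ps(t_cq=80, t_comb=5, t_hold=40, u_hold=20, t_skew=30)
print(f"hold slack on the short path = {short_hold} ps")
```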

Q44Explain a generated clock and specifically a divide-by-2 clock. How do you constrain it in SDC, and what happens to the timing relationship between the source and divided clocks?

A generated clock is a clock derived on-chip from an existing (master) clock — by a divider, multiplier, MUX, or gating cell — rather than defined at a port. You constrain it with create_generated_clock, referencing the master via -source so STA keeps their phase relationship.

Divide-by-2 example on a flop output that toggles on every rising edge of the master clock:
create_generated_clock -name clk_div2 -source [get_pins pll/CLK] -divide_by 2 [get_pins div_ff/Q]

  • The divided clock has 2× the period (half the frequency) of the master.
  • STA derives the launch/capture edges of clk_div2 from the master's edges, so the source → divider insertion delay is automatically inherited — you do not redefine latency from scratch.
  • Paths between the master clock domain and clk_div2 become multicycle-like related checks: because edges align only at multiples of the master period, the tool expands edges over the common period (LCM of the two periods) and picks the tightest setup and hold edge pair.
Forgetting -source (or using create_clock on the Q pin) breaks the phase relationship and the inherited insertion delay, giving wrong cross-domain slack.
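
To make the edge expansion concrete, here is a simplified Python sketch that enumerates rising edges over the common period and reports the tightest launch-to-capture separation used for setup. It assumes ideal, phase-aligned clocks, rising-edge flops, and zero latency, and it ignores the hold relation; real tools also account for waveform offsets and insertion delay. The function name is hypothetical.

```python
from math import gcd

def tightest_setup_relation_ps(launch_period, capture_period):
    """Smallest gap from any launch edge to the next capture edge after it.

    Periods are integers in ps. Edges are expanded over the common period
    (the LCM of the two periods), as a timer does for related clocks.
    """
    common = launch_period * capture_period // gcd(launch_period, capture_period)
    launch_edges = range(0, common, launch_period)
    # Extend capture edges one extra common period so every launch has a successor.
    capture_edges = range(0, 2 * common + 1, capture_period)
    gaps = []
    for launch in launch_edges:
        next_capture = min(c for c in capture_edges if c > launch)
        gaps.append(next_capture - launch)
    return min(gaps)

MASTER = 800       # ps
DIV2 = 2 * MASTER  # divide-by-2 generated clock

print("master -> clk_div2:", tightest_setup_relation_ps(MASTER, DIV2), "ps")
print("clk_div2 -> master:", tightest_setup_relation_ps(DIV2, MASTER), "ps")
```

In both directions the tightest relation under these assumptions is one master period, even though clk_div2 itself has twice that period.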

Q45What is a gated clock and why is it used? From an STA standpoint, what must be checked on the clock-gating cell itself?

A gated clock is a clock whose toggling is conditionally disabled by an enable signal, almost always via an integrated clock-gating cell (ICG) — typically a latch-based AND/OR gate that suppresses the clock when idle. The purpose is dynamic power reduction: shutting off the clock to idle registers eliminates their clock-tree and flop switching power, and it replaces enable-recirculation MUXes (which keep the clock free-running) with true clock disable.

STA checks on the ICG:
  • Setup/hold of the enable relative to the clock at the gating cell — the enable must be stable around the gating point so the clock is never chopped into a glitch or runt pulse. A latch-based ICG samples the enable on the clock's low phase so the gate switches only while the clock is low.
  • Clock-gating checks (set_clock_gating_check / inferred) enforce this enable-vs-clock relationship.
  • The gating cell is part of the clock network, so its insertion delay and OCV derate count toward downstream skew; CTS must balance through it.
Missing the enable timing is a classic source of functional glitches that pure functional sim can hide but STA must catch.

Q46After clock tree synthesis, you see setup slack improve on many paths but a wave of new hold violations on short paths. Explain the mechanism in terms of skew, and how it is fixed.

Mechanism: CTS replaces the ideal (zero-delay) clock with a real tree. The tree introduces real skew between launch and capture flops. Where the tree happens to deliver the capture clock later than the launch clock (positive skew), setup improves — but the same positive skew directly attacks hold:
hold slack = T_cq + T_comb - T_hold - U_hold - T_skew
On a short path (small T_comb, e.g. a flop-to-adjacent-flop or shift register), there is little data delay to absorb that skew, so the skew term drives slack negative.

Why short paths specifically: hold is a same-edge race between clock skew and data propagation. Long paths have plenty of T_comb margin; short paths do not.

Fix:
  • Hold buffers / delay cells inserted on the data path to raise T_comb — the standard fix, since it does not touch setup-critical timing.
  • Reduce the offending skew by rebalancing the clock tree at those leaves.
  • Apply useful-skew adjustments cautiously, re-checking both corners.
Always fix hold at the fast/hold corner and re-verify setup at the slow corner afterward.

Q47In OCV / AOCV analysis, derate is applied to launch and capture clock paths. For a setup check, which clock path is slowed and which is sped up, and why does this make the analysis pessimistic?

On-chip variation (OCV) models the fact that the launch and capture clock paths see different local process/voltage/temperature, so their delays differ even for the same nominal clock. STA applies derate in the pessimistic direction for each check.

Setup check (want capture as early, launch+data as late as possible):
  • Launch clock path: slowed (late derate, >1.0) — pushes the data launch later.
  • Capture clock path: sped up (early derate, <1.0) — pulls the capture edge earlier.
  • Data path: slowed (max delay).
Hold check (the mirror image):
  • Launch clock path: sped up; capture clock path: slowed; data path: min delay.
This pushes the effective skew in the harmful direction for each check (more negative for setup, more positive for hold), so slack is computed against the worst credible variation, which is why the analysis is pessimistic. Because launch and capture share common clock-tree segments up to the divergence point, that shared delay should not be derated both late and early at once; CRPR (clock reconvergence pessimism removal) credits it back. AOCV refines OCV by scaling derate with path depth and distance, removing some of that pessimism more accurately than flat OCV.
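
A minimal sketch of flat-OCV setup analysis with and without CRPR, assuming a clock tree split into one common segment and two branches. The function name, the ±5% derates, and all delays are hypothetical; the library setup time is left underated here for simplicity.

```python
def setup_slack_ocv(t_clk, common, launch_branch, capture_branch,
                    t_cq, t_comb, t_setup, late=1.05, early=0.95, crpr=True):
    """Setup slack (ps) with flat OCV derates. Returns (slack, crpr_credit).

    Launch clock and data are late-derated; the capture clock is early-derated.
    The common segment cannot be both late and early at once, so CRPR adds
    back common * (late - early) to the required time.
    """
    launch_clk = (common + launch_branch) * late
    data = (t_cq + t_comb) * late
    capture_clk = (common + capture_branch) * early
    credit = common * (late - early) if crpr else 0.0
    arrival = launch_clk + data
    required = t_clk + capture_clk + credit - t_setup
    return required - arrival, credit

ARGS = dict(t_clk=800, common=200, launch_branch=100, capture_branch=100,
            t_cq=80, t_comb=500, t_setup=50)

slack_crpr, credit = setup_slack_ocv(**ARGS, crpr=True)
slack_raw, _ = setup_slack_ocv(**ARGS, crpr=False)
print(f"with CRPR: {slack_crpr:.1f} ps, without: {slack_raw:.1f} ps, credit: {credit:.1f} ps")
```

Even with a perfectly balanced tree (equal branches), the derates alone create apparent skew; the CRPR credit removes only the part attributable to the shared segment.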

Q48Walk through what changes between an ideal clock and a propagated clock in STA, and which quantities (latency, skew, uncertainty) you'd expect to adjust at each stage of the flow.

Ideal clock (pre-CTS): the clock network is assumed to have zero delay; there is no physical tree yet. You model what the tree will do using estimates:
  • set_clock_latency — estimated network insertion delay; -source for off-chip latency.
  • set_clock_uncertainty — large, because it must cover estimated skew + jitter + margin (real skew is unknown).
Propagated clock (post-CTS): set_propagated_clock makes STA compute real per-pin clock arrival through the built tree.
  • Latency is now the actual tree delay, no longer an estimate.
  • Skew becomes real and measured (capture - launch per path).
  • Uncertainty is reduced — the skew portion is now real, so uncertainty should mostly carry just jitter + a small guardband; OCV/AOCV derate now models clock-path variation explicitly.
Net: pre-CTS you budget pessimistically with big uncertainty and estimated latency; post-CTS you replace estimates with measured tree behavior and shrink uncertainty so you don't double-count skew that is now physically accounted for.

5. Slack, Slew & Transition 12 questions

Q49Define slack. How is it computed for a setup (max) check versus a hold (min) check, and what does the sign mean?

Slack is the margin by which a timing check passes or fails. It is always the difference between the required time and the actual arrival time, but the operand order flips between setup and hold so that positive slack always means the check is met.

Setup (max-delay) check: Slack = Required - Arrival
where Required = T_capture_clk_edge - T_setup - T_uncertainty at the capture flop's D pin. Setup is checked against the next clock edge, so a slow (large) data arrival hurts it.

Hold (min-delay) check: Slack = Arrival - Required
where Required = T_capture_clk_edge + T_hold + T_uncertainty against the same (current) edge. A fast (small) data arrival hurts it.

Sign convention:
  • Positive slack = check met, margin to spare.
  • Zero = exactly on the edge (critical).
  • Negative slack = violation.
Note the operands swap so that in both cases positive means good.
Slack = Required - Arrival for setup; operands swap for hold so positive always means met.

Q50What are WNS and TNS, and why do we report both? How do they differ from a simple slack number?

WNS (Worst Negative Slack) is the single most negative slack across all endpoints in a corner/mode. It tells you the depth of the worst violation and is what bounds the achievable clock period.

TNS (Total Negative Slack) is the sum of all negative slacks (positive-slack endpoints contribute 0): TNS = Σ min(slack_i, 0). It tells you the breadth of the problem.

Why both:
  • WNS = -50ps with TNS = -50ps means one failing path: a targeted fix (resize, buffer, useful skew).
  • WNS = -50ps with TNS = -5000ps means at least 100 failing endpoints (5000 / 50), and usually far more: a systemic issue (clock period too tight, congestion, bad constraints).
A single endpoint slack is local; WNS/TNS aggregate the whole design and are the standard convergence metrics PD engineers track per ECO. By convention both are 0 when the design is clean (no negative endpoints).

Tools also report FEP (failing endpoint count) alongside, which further distinguishes depth from breadth.
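
A small sketch of how the three metrics fall out of a list of endpoint slacks, following the definitions above (clean designs report 0 for both WNS and TNS). The function name and the sample slack lists are hypothetical.

```python
def timing_summary(slacks_ps):
    """Return (WNS, TNS, FEP) for a list of endpoint slacks in ps."""
    negatives = [s for s in slacks_ps if s < 0]
    wns = min(negatives) if negatives else 0   # depth of the worst violation
    tns = sum(negatives)                       # equals sum of min(slack_i, 0)
    fep = len(negatives)                       # failing endpoint count
    return wns, tns, fep

one_bad_path = [-50, 10, 20, 35]
systemic = [-50] * 100 + [5, 12]
clean = [5, 0, 40]

print("one bad path:", timing_summary(one_bad_path))
print("systemic:    ", timing_summary(systemic))
print("clean:       ", timing_summary(clean))
```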

Q51Walk me through a setup report_timing path: what are the columns, and how does arrival accumulate into a final slack?

A report_timing path has a data (launch) path and a clock (capture) path, with arrival accumulating cell+net delays.

Data arrival builds up as:
  • Clock network delay to the launch flop CK
  • + T_cq (clk-to-Q of launch flop)
  • + each combinational cell delay and net delay along the path
  • = Data Arrival Time at the capture D pin
Data required builds up as:
  • Clock network delay to capture flop CK + T_clk_period (the next edge)
  • - library T_setup
  • - clock uncertainty (jitter + margin)
  • + CRPR credit for the common clock segment (for a setup check it always relaxes the required time)
  • = Data Required Time
Slack = Required - Arrival. The report shows incremental and cumulative columns so you can see exactly which stage dominates delay. The key insight: launch and capture share a common clock root, so only the divergent portion of skew is real.
Arrival accumulates Tcq + comb; required = period - setup - uncertainty; slack is the gap.
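
The incremental/cumulative structure of the report can be mimicked in a few lines of Python. This is only a sketch of the bookkeeping; the point names, cell names, and delay values are invented and do not reflect any particular tool's report format.

```python
from itertools import accumulate

def path_table(stages):
    """stages: list of (point, incr_ps). Returns rows of (point, incr, cumulative)."""
    totals = accumulate(incr for _, incr in stages)
    return [(point, incr, total) for (point, incr), total in zip(stages, totals)]

# Hypothetical data (launch) path.
arrival_rows = path_table([
    ("clock network to launch CK", 120),
    ("launch_ff CK->Q", 80),
    ("U1 (NAND2)", 150),
    ("net n1", 30),
    ("U2 (INV)", 200),
    ("net n2 to capture D", 40),
])

# Hypothetical required (capture) path for a setup check.
required_rows = path_table([
    ("clock period (next edge)", 800),
    ("clock network to capture CK", 130),
    ("library setup", -50),
    ("clock uncertainty", -60),
    ("CRPR credit", 10),
])

for point, incr, total in arrival_rows + required_rows:
    print(f"{point:<30}{incr:>8}{total:>8}")

arrival = arrival_rows[-1][2]
required = required_rows[-1][2]
slack = required - arrival
print(f"slack = {required} - {arrival} = {slack} ps")
```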

Q52Distinguish slew from transition. What does transition time physically measure, and what thresholds are used?

Slew and transition time are used interchangeably in STA: both mean the time a signal takes to switch between two voltage thresholds.

Physically it is the time for the node to charge/discharge through the driver's output resistance into the load capacitance, i.e. roughly t_trans ≈ k · R_drive · C_load. A weak driver or heavy load gives a slow (large) slew.

Thresholds: measured between library-defined points, commonly 10%-90% or, more often in modern libs, 20%-80% or 30%-70% of VDD. The library's slew_lower_threshold_pct_rise/_fall and slew_upper_threshold_pct_rise/_fall attributes define them, and the slew_derate_from_library attribute scales between the measured (e.g. 20%-80%) value and the full-swing equivalent the tool uses internally.

Key points:
  • Rise and fall slews can differ (asymmetric drive strength).
  • Input slew is an input to delay calculation; output slew is an output that propagates to the next stage.
  • set_max_transition is a design rule constraint (DRC), independent of slack but a precondition for valid delay calc.
Transition time = interval between library voltage thresholds (e.g. 20%-80%) during switching.
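
The constant k in t_trans ≈ k · R_drive · C_load depends on which thresholds are used. As a sketch, assuming an ideal single-pole RC step response (a strong simplification of a real gate), k can be derived directly; the function name is hypothetical.

```python
import math

def rc_transition_factor(lower, upper):
    """k such that t_trans = k * R * C between two thresholds (fractions of VDD).

    For v(t) = VDD * (1 - exp(-t / RC)), the time to reach fraction p is
    -RC * ln(1 - p), so the interval is RC * ln((1 - lower) / (1 - upper)).
    """
    return math.log((1 - lower) / (1 - upper))

for lo, hi in [(0.10, 0.90), (0.20, 0.80), (0.30, 0.70)]:
    print(f"{int(lo * 100)}%-{int(hi * 100)}%: k = {rc_transition_factor(lo, hi):.3f}")
```

The same physical edge therefore reports a noticeably smaller number at 20%-80% than at 10%-90%, which is why slews from libraries with different thresholds cannot be compared without scaling.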

Q53How does input slew affect a cell's delay? Why does a slower input slew not just add a fixed delay?

Input slew is one of the two axes of every NLDM delay table, so delay is a nonlinear function of it, not an additive offset.

Mechanism: a slower input transition means the input crosses the gate's switching threshold later and the transistor turns on more gradually, so both the cell delay and the output slew increase. Output slew then feeds the next stage's delay lookup, so a degraded slew propagates and compounds down the path.

Why not fixed:
  • Delay is measured from the input threshold crossing to the output threshold crossing; a slower input shifts the input crossing AND changes the device operating region, so the increment depends on the current slew/load operating point.
  • The relationship is captured as a 2D surface delay = f(input_slew, output_load) with interpolation between table points.
Practical consequence: a high-fanout or long net with no buffering degrades slew, which inflates downstream delays. Fixing slew (buffer insertion, upsizing) often recovers more slack than it appears to on the local stage alone. This is why set_max_transition exists as a guardrail.

Q54Explain the NLDM delay model. What are its inputs, what is interpolated, and where does it lose accuracy?

NLDM (Non-Linear Delay Model) stores characterized delay and output slew in 2D lookup tables indexed by input transition (slew) and output load capacitance:
cell_delay = f(input_slew, C_load)
output_slew = g(input_slew, C_load)

Operation: the timer looks up the cell's table at the actual input slew and load, interpolating (typically bilinear) between the nearest characterized grid points; outside the grid it extrapolates (lower confidence). The output slew becomes the next stage's input slew, chaining stage by stage.
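
The lookup described above can be sketched as a plain bilinear interpolation. The table values, axis points, and function names below are made up for illustration; real tables come from library characterization, and production timers may use other interpolation schemes.

```python
from bisect import bisect_right

def _bracket(axis, x):
    """Index i so that axis[i] and axis[i + 1] bracket x.

    Clamped to the first or last segment, so out-of-range values extrapolate.
    """
    i = bisect_right(axis, x) - 1
    return max(0, min(i, len(axis) - 2))

def nldm_lookup(slews, loads, table, slew, load):
    """Bilinear interpolation of table[slew_index][load_index]."""
    i = _bracket(slews, slew)
    j = _bracket(loads, load)
    ts = (slew - slews[i]) / (slews[i + 1] - slews[i])
    tl = (load - loads[j]) / (loads[j + 1] - loads[j])
    at_low_slew = table[i][j] * (1 - tl) + table[i][j + 1] * tl
    at_high_slew = table[i + 1][j] * (1 - tl) + table[i + 1][j + 1] * tl
    return at_low_slew * (1 - ts) + at_high_slew * ts

# Hypothetical 3x3 delay table: rows = input slew (ps), columns = load (fF).
SLEWS = [10, 50, 200]
LOADS = [1, 5, 20]
DELAY = [
    [20, 35, 90],
    [28, 45, 102],
    [55, 75, 140],
]

print(nldm_lookup(SLEWS, LOADS, DELAY, slew=30, load=3))
print(nldm_lookup(SLEWS, LOADS, DELAY, slew=50, load=5))
```

Calling the same routine on the output-slew table yields the input slew for the next stage, which is how the stage-by-stage chaining works.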

Where it loses accuracy:
  • It models the load as a single lumped C, so it cannot capture resistive shielding of long RC interconnect (the driver effectively sees less than the total wire-plus-pin capacitance).
  • It produces a delay/slew number, not a current waveform, so it is weak on RC interconnect delay and noise/coupling effects.
For accuracy, a receiver pin model + Elmore/AWE reduction approximates the net, but at advanced nodes NLDM is replaced by CCS (current-based) or ECSM for better interconnect and waveform fidelity.

Q55What is CCS, and why did the industry move from NLDM to CCS at advanced nodes?

CCS (Composite Current Source) models the cell's output as a time- and voltage-dependent current source rather than as a single delay/slew number. The library stores current waveforms as a function of input slew and output load, plus a receiver capacitance model (often split into C1 before the threshold and C2 after).

Why the move:
  • Interpolation accuracy: NLDM interpolates scalar delay; at sub-40nm the actual waveform is non-saturated and NLDM error grows. CCS drives the actual RC network with a real current waveform, giving accurate interconnect delay and slew.
  • Receiver model: NLDM treats the receiver as a fixed pin cap; CCS captures the nonlinear, voltage-dependent input cap (Miller effect), which matters for accurate driver loading.
  • Waveform propagation: CCS preserves the real voltage waveform shape, important for noise/SI analysis and for non-monotonic waveforms.
Cost: CCS libraries are several times larger and slower to read. ECSM (Cadence) is the voltage-based analog. Both are "current/waveform-based" successors to NLDM and standard at 16nm and below.

Q56How do load capacitance and fanout influence delay and slew? Walk through what happens when you double the fanout on a net.

Both delay and slew are monotonically increasing in output load to first order.

Doubling fanout (driving twice as many receiver pins) roughly doubles the pin-capacitance component of the load:
  • Driver delay increases - the driver must charge more C through its output resistance (delay ∝ R_drive · C_load to first order), so the NLDM/CCS lookup at higher C returns a larger delay.
  • Output slew degrades (gets slower) - same RC charging argument.
  • That degraded slew propagates to every receiver, increasing their delays too (the input-slew axis effect from the earlier question).
Total load = pin caps + wire cap. At advanced nodes wire cap and resistance often dominate, and long wires add RC delay independent of fanout count. Resistive shielding means the driver does not see the full far-end cap.

Fixes: insert buffers to split the fanout (each driver sees less load and slew recovers), upsize the driver (lower R), or use set_max_fanout / set_max_capacitance as DRC guardrails. This is the core of fanout/load optimization in synthesis and PD.
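
A first-order numeric sketch of the fanout-doubling walk-through, using delay ≈ 0.69 · R_drive · C_load with a lumped load. The 0.69 factor, the function name, and all R/C values are assumptions for illustration; the model ignores wire resistance, slew effects, and the delay of any inserted buffer.

```python
def stage_delay_ps(r_drive_kohm, c_wire_ff, c_pin_ff, fanout):
    """First-order driver delay in ps: 0.69 * R * (C_wire + fanout * C_pin).

    kOhm * fF = ps, so no unit conversion is needed.
    """
    c_load_ff = c_wire_ff + fanout * c_pin_ff
    return 0.69 * r_drive_kohm * c_load_ff

# Hypothetical driver and net.
R_DRIVE = 2.0   # kOhm
C_WIRE = 4.0    # fF
C_PIN = 1.5     # fF per receiver pin

d4 = stage_delay_ps(R_DRIVE, C_WIRE, C_PIN, fanout=4)
d8 = stage_delay_ps(R_DRIVE, C_WIRE, C_PIN, fanout=8)
print(f"fanout 4: {d4:.2f} ps, fanout 8: {d8:.2f} ps, ratio {d8 / d4:.2f}")
```

The delay grows by less than 2× because only the pin-capacitance part of the load doubles; the wire capacitance is unchanged in this simple model.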

Q57A setup path has a slack of -80ps. Walk me through how you would attack it. Now the same path also has a hold violation — does fixing one help or hurt the other?

Setup -80ps (path too slow), fix by reducing data arrival or relaxing required:
  • Upsize slow cells / use higher-drive or LVT (faster) cells on the critical stages.
  • Buffer / restructure high-fanout or long nets to fix slew (often the hidden cause).
  • Reduce logic depth (logic restructuring, better mapping).
  • Useful skew: delay the capture clock / advance the launch clock to borrow time.
  • Check constraints: false/multicycle paths, over-tight uncertainty.
Setup vs hold interaction: they pull in opposite directions.
  • Setup wants the data path faster; hold wants short paths slower.
  • Making the cell faster (upsizing/LVT) improves setup but worsens hold on that same path.
  • Adding hold-fix delay buffers worsens setup.
Useful skew is doubly dangerous: delaying the capture clock helps setup but directly eats hold margin (hold uses the same/current edge), and vice versa. The clean approach: fix setup first with cell/skew changes, then close hold with min-delay buffers on the short paths only, which adds delay without touching the setup-critical paths. Always re-check both checks at all corners after any ECO.

Q58How do clock uncertainty, jitter, and OCV/derate enter the slack equation — and crucially, do they apply to setup, hold, or both?

Clock uncertainty (set_clock_uncertainty) is a pessimism margin subtracted from the available time. For setup it tightens required time; for hold it also tightens (pushes required later relative to data). You can set separate setup and hold uncertainty via -setup/-hold.
  • Setup uncertainty includes clock jitter + setup margin (period is uncertain).
  • Hold uncertainty is usually smaller - jitter largely cancels because launch and capture see the same edge; mostly it covers skew margin.
OCV / derate models on-die variation by scaling delays:
  • Setup (max) check: derate the launch + data path slow (e.g. ×1.05) and the capture clock fast (e.g. ×0.95) - worst case for setup.
  • Hold (min) check: the opposite - launch/data fast, capture clock slow - worst case for hold.
CRPR (Clock Reconvergence Pessimism Removal) credits back derate applied to the common clock segment, since one physical path cannot be simultaneously fast and slow. Modern flows use AOCV/POCV (depth/distance- and statistically-based derates) instead of a flat factor. Net effect on slack: uncertainty and derate reduce margin on both checks, but with opposite directionality, which is why both must be analyzed.

Q59A path's setup slack is fine in isolation, but the report shows a large slew at the capture flop's D pin. Why might this still cause a problem, and is slew a part of slack?

Slew is not directly a term in the slack equation, but it affects slack indirectly and is also checked separately as a design rule (DRC).

Why a large D-pin slew is still a problem:
  • Setup time grows with input slew. Library T_setup is characterized as a function of data slew (and clock slew); a degraded D-pin slew increases the required setup time, eating slack even though the arrival looked fine.
  • Max-transition DRC violation. If it exceeds set_max_transition the design is not signoff-clean regardless of slack, and delay-calc accuracy degrades.
  • Hold risk: sloppy slews change hold time too and increase SI/noise susceptibility.
Is slew part of slack? Indirectly: slew is an input to delay and to setup/hold constraint values, both of which feed slack. But it is reported and constrained separately as a design rule. A common interview trap: assuming a clean arrival number means a clean check - if the slew is bad, the characterized setup/hold itself shifts and the slack you computed by hand will be wrong. Always fix transition violations before trusting timing.

Q60Senior-level: a register-to-register path passes setup by +5ps at the slow corner but the same path fails hold by -30ps at the fast corner. Explain how a single physical path produces opposite results across corners, and how this constrains your fix.

This is normal multi-corner behavior; setup and hold are checked at different corners because they have opposite worst cases.

Why opposite:
  • Setup worst at slow corner (slow process, low voltage, high temp historically): cells are slowest, so data arrives latest - hardest to meet the next edge. Slack_setup = Required - Arrival is tightest here.
  • Hold worst at fast corner (fast process, high voltage): cells are fastest, data races through and arrives too early, violating the same-edge hold. Slack_hold = Arrival - Required is tightest here.
Plus temperature inversion at low nodes can flip which temperature is worst, so signoff sweeps multiple PVT corners.

How it constrains the fix:
  • The hold fix must add delay at the fast corner without breaking the +5ps setup margin at the slow corner - a tight window.
  • Insert min-delay buffers sized so their slow-corner delay contribution stays under 5ps, while their fast-corner delay closes the 30ps gap. Because a buffer's slow-corner delay is larger than its fast-corner delay, you may not be able to add 30ps of fast-corner delay without blowing the 5ps setup budget.
  • If the window is infeasible, prefer useful skew or restructuring, and re-verify both corners (and CRPR) after the ECO.
Takeaway: never close hold at the fast corner without re-checking setup at the slow corner - they share the same cells, so every delay element trades one against the other.
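The window argument above can be checked with simple arithmetic. The sketch below is only illustrative: the per-buffer delays (15 ps at the fast corner, 28 ps at the slow corner) are assumed numbers, not library data, and the helper name `hold_fix_window` is hypothetical.

```python
import math

def hold_fix_window(hold_slack_fast_ps, setup_slack_slow_ps,
                    buf_delay_fast_ps, buf_delay_slow_ps):
    """Return (buffers_needed, setup_slack_after_ps, feasible).

    Buffers are sized by the fast-corner hold deficit, then the same
    buffers are charged against the slow-corner setup slack.
    """
    deficit = max(0.0, -hold_slack_fast_ps)
    n = math.ceil(deficit / buf_delay_fast_ps)
    setup_after = setup_slack_slow_ps - n * buf_delay_slow_ps
    return n, setup_after, setup_after >= 0.0

# Q60 numbers: hold -30 ps (fast corner), setup +5 ps (slow corner).
# Assumed buffer: 15 ps at the fast corner, 28 ps at the slow corner.
n, setup_after, feasible = hold_fix_window(-30.0, 5.0, 15.0, 28.0)
print(n, setup_after, feasible)
```

With these assumed delays two buffers close hold but cost 56 ps of slow-corner delay against a 5 ps budget, so the buffer-only fix is infeasible and skew or restructuring is needed.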

6. Timing Exceptions 12 questions

Q61What is a multicycle path (MCP), and why would you declare one?

A multicycle path is a path the STA tool is told to evaluate over more than one clock cycle instead of the default single cycle. By default STA assumes data launched at edge N must be captured at edge N+1 (one cycle). A multicycle path relaxes that: you tell the tool data is only required to be stable after, say, 2 or 3 cycles.

You declare one when the design genuinely allows extra cycles for the data to settle, for example:
  • A slow combinational block (large multiplier, divider) whose result is only sampled every Nth cycle by enable/clock-gating logic.
  • A path crossing between an integer-ratio clock and its divided version where the receiver only loads on certain edges.
The command sets the setup multiplier: set_multicycle_path 3 -setup -from FF1 -to FF2. Critically, the hold check must be fixed too (see the off-by-one question). The risk: if the hardware does not actually hold off the capture, the MCP is a lie and the chip fails silicon timing despite a clean STA report.

Q62Walk me through a setup-multicycle-path of 3 on a waveform. Where do the launch and capture edges land, and what is the resulting setup check?

With set_multicycle_path 3 -setup, the data is launched at edge 0 as usual, but the setup capture edge moves out from edge 1 (default) to edge 3. So the available time becomes three clock periods instead of one.

Setup requirement becomes:
T_cq + T_comb + T_setup ≤ 3·T_clk - T_setup_unc

Key points an interviewer wants:
  • The launch edge does not move — only the setup capture edge moves later by (N-1) cycles.
  • Setup uncertainty (jitter + margin) is still subtracted once, at the captured edge.
  • The relaxation is on the setup side only; hold is a separate, dangerous story.
In the diagram, default capture is edge 1; the MCP-3 setup capture is edge 3, giving a setup budget of three clock periods.
Setup MCP of 3: launch at edge 0, setup capture moves to edge 3 (three periods of setup budget).

Q63This is the classic one: when you set a multicycle of 3 for setup, what must you set for hold, and why? Show the off-by-one trap.

When you declare set_multicycle_path 3 -setup, the tool by default puts the hold check one edge before the (moved) setup capture edge — i.e. at edge 2. That is almost always wrong and creates an impossibly tight (and bogus) hold requirement that pulls the hold edge far from the launch edge.

You must pull the hold check back with:
set_multicycle_path 2 -hold -from FF1 -to FF2

The rule: for a setup multicycle of N, set hold multicycle of N-1, so the hold check returns to the same edge as the launch edge (edge 0), exactly as in a normal single-cycle path.
  • Setup MCP = N moves the setup capture edge forward by N-1.
  • Hold MCP = N-1 moves the hold capture edge back by N-1, restoring it to edge 0.
The off-by-one trap: if you forget the hold MCP, the hold check is referenced to edge 2 (hold launch at edge 0, capture at edge 2), demanding the data not change for ~2 cycles after launch — a massive, false hold violation that engineers then 'fix' by inserting huge delay buffers, breaking real timing. Always pair setup MCP N with hold MCP N-1 (for same-edge integer-ratio clocks).
Off-by-one trap: setup MCP=3 default-checks hold at edge 2; hold MCP=2 pulls it back to edge 0.
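The edge bookkeeping can be written down as a tiny helper. This is a sketch for same-clock flops with the launch at edge 0; the function name `mcp_edges` is hypothetical, and it assumes the usual SDC convention that the hold multiplier counts cycles moved back from the default hold edge.

```python
def mcp_edges(setup_mult=1, hold_mult=0):
    """Return (setup_capture_edge, hold_capture_edge) as edge indices.

    Launch is edge 0. The default hold edge sits one cycle before the
    setup capture edge; hold_mult moves it back by that many cycles.
    """
    setup_edge = setup_mult
    hold_edge = setup_edge - 1 - hold_mult
    return setup_edge, hold_edge

for n, m in [(1, 0), (3, 0), (3, 2)]:
    setup_edge, hold_edge = mcp_edges(n, m)
    print(f"setup MCP={n}, hold MCP={m}: setup edge {setup_edge}, hold edge {hold_edge}")
```

The middle case is the trap (hold checked at edge 2); the last case is the N / N-1 pairing that restores the hold check to edge 0.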

Q64Derive the general setup and hold equations for a multicycle path with setup multiplier N and hold multiplier M.

Let the launching edge be at time 0 and the clock period be T_clk.

Setup capture edge is at N·T_clk:
T_launch_clk + T_cq + T_comb ≤ N·T_clk + T_capture_clk - T_setup - T_setup_unc
where T_launch_clk / T_capture_clk are the clock network latencies (skew shows up as their difference).

Hold capture edge: in SDC the hold multiplier M counts cycles moved back from the default hold edge, which sits one cycle before the setup capture edge. The hold capture edge is therefore at (N - 1 - M)·T_clk:
T_launch_clk + T_cq + T_comb ≥ (N - 1 - M)·T_clk + T_capture_clk + T_hold + T_hold_unc

The crucial relationship: for two flops on the same clock, you want the hold check referenced to the launching edge itself, i.e. edge 0. With the default M = 0 the hold capture edge sits at (N-1)·T_clk, one cycle before the setup capture edge at N·T_clk. Choosing M = N - 1 moves it back by N-1 cycles to edge 0. This recovers the standard single-cycle hold check:
T_cq + T_comb ≥ T_hold + T_hold_unc + (T_capture_clk - T_launch_clk), where the last term is the clock skew.

Note the uncertainty signs: setup subtracts T_setup_unc (makes the requirement harder); hold adds T_hold_unc (also makes it harder). They are applied once, not multiplied by N.
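As a numeric sanity check of the two inequalities, here is a small sketch that evaluates setup and hold slack for given multipliers. It assumes the SDC convention that the hold capture edge is at (N - 1 - M)·T_clk; the function name and the example delays are hypothetical.

```python
def mcp_slacks(n, m, t_clk, t_cq, t_comb, t_setup, t_hold,
               t_launch_clk=0.0, t_capture_clk=0.0,
               setup_unc=0.0, hold_unc=0.0):
    """Return (setup_slack, hold_slack) for setup multiplier n, hold multiplier m."""
    arrival = t_launch_clk + t_cq + t_comb
    # Setup: capture edge at n*T_clk; uncertainty subtracted once.
    setup_required = n * t_clk + t_capture_clk - t_setup - setup_unc
    # Hold: capture edge at (n - 1 - m)*T_clk; uncertainty added once.
    hold_required = (n - 1 - m) * t_clk + t_capture_clk + t_hold + hold_unc
    return setup_required - arrival, arrival - hold_required

# Example values in ps (all assumed): 1 ns clock, 1.5 ns of logic.
args = dict(t_clk=1000, t_cq=100, t_comb=1500, t_setup=50, t_hold=30,
            setup_unc=50, hold_unc=20)
print(mcp_slacks(3, 0, **args))  # hold multiplier forgotten
print(mcp_slacks(3, 2, **args))  # hold multiplier N-1
```

With the hold multiplier left at 0 the hold slack is strongly negative even though the path is functionally fine; setting it to N-1 leaves the setup slack unchanged and restores an ordinary hold check.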

Q65What is a false path? Give two realistic examples and the command to declare one.

A false path is a topological path that the tool can trace but which can never be sensitized in real operation, so it should be excluded from timing analysis. The command:
set_false_path -from A -to B (also -through for a specific node).

Realistic examples:
  • Asynchronous clock-domain crossings: a path from a launch flop in clkA to a capture flop in clkB where a proper synchronizer (2-FF, async FIFO) handles metastability. The data transfer is by handshake, not by single-cycle timing, so STA on it is meaningless. Often handled with set_clock_groups -asynchronous rather than many false paths.
  • Static / configuration registers: mode or config bits that are written once at boot and held constant; the path from them through datapath muxes is never timing-critical in steady state.
  • Logically unsensitizable mux paths: e.g. a path through a 2:1 mux that requires the select to be simultaneously 0 and 1, which can never occur.
Risk: over-using false paths to silence violations masks real paths — if you false-path a CDC that lacks a synchronizer, you ship metastability. False paths also remove the path from both setup and hold analysis, so a wrongly declared one hides genuine hold issues too.

Q66Explain set_max_delay and set_min_delay. When would you use them instead of a multicycle or false path?

set_max_delay and set_min_delay override the timing requirement on a path with an absolute time, independent of clock edges:
  • set_max_delay 4.0 -from A -to B → the path's setup-style (max) requirement is 4 ns regardless of clock period.
  • set_min_delay 0.5 -from A -to B → the path's hold-style (min) requirement is 0.5 ns.
When to use them:
  • Point-to-point paths with no meaningful launch/capture clock — classic case is a pure combinational path between primary input and output (gated by virtual clocks), or async paths where you still want a bounded delay rather than a fully ignored false path.
  • To put a bounded constraint on an async CDC instead of a false path — you don't trust a single-cycle check, but you still want the wire/skew between launch and capture controlled (e.g. for a quasi-static or gray-coded bus): set_max_delay with a value smaller than one period limits skew so multiple bits arrive close together.
  • For glitch-sensitive or self-timed structures where the spec is an absolute delay, not a ratio of cycles.
vs. MCP / false path: use an MCP when the relationship is still clock-edge based but spans multiple cycles; use a false path when the path is never real; use max/min delay when you need an absolute, edge-independent bound. Note set_max_delay overrides the default setup check; pair it with set_min_delay if you also need to bound the min/hold side, since they are independent.

Q67What does set_disable_timing do, and how is it different from a false path?

set_disable_timing removes a specific timing arc (pin-to-pin path through a cell) from the timing graph entirely. Example: set_disable_timing -from A -to Z [get_cells mux1] disables the A→Z arc of that mux instance.

Key differences from a false path:
  • Granularity: set_false_path operates on a full start-to-end path (between startpoints/endpoints, optionally through nodes). set_disable_timing kills a single arc inside one cell/instance, affecting every path that traverses that arc.
  • Mechanism: disable_timing edits the timing graph (the arc no longer exists, so no delay propagates through it). A false path leaves the arc but tells the analyzer to ignore that particular path's slack.
  • Use cases for disable_timing: breaking a combinational loop so the tool can build a DAG; disabling a known-unused arc (e.g. a scan or test arc in functional mode); breaking the transparent arc of a latch; disabling a feedback path in an RS-latch-like structure.
Risk: disabling an arc is a blunt instrument — it removes that arc from all analysis (setup, hold, every mode/corner via SDC). Disabling the wrong arc can hide a real timing path you didn't intend to. Prefer the narrowest exception that achieves the goal.

Q68What is case analysis (set_case_analysis), and how does it interact with timing exceptions?

set_case_analysis forces a logic constant on a pin or port, e.g. set_case_analysis 0 test_mode. The tool then propagates that constant through the netlist and prunes the timing graph: any arc that becomes non-controlling (e.g. the other input of a now-constant AND/mux select) is automatically removed from analysis.

Why it matters:
  • It models the actual operating mode. Setting test_mode = 0 disables all scan/test paths so functional analysis isn't polluted by them — without writing dozens of false paths.
  • It selects mux configurations: forcing a select line constant tells STA which leg of a mux is active, so unsensitizable legs are dropped naturally.
  • It can disable a clock source (force a clock gate's enable, or force a clock mux select), which then prunes everything downstream of the dead clock.
Interaction with exceptions: case analysis is often the cleanest alternative to many false paths — it removes unsensitizable paths automatically and correctly, rather than you enumerating them by hand. It is applied during graph construction / constant propagation, before slack-level exceptions are resolved, so paths pruned by case analysis never even reach the false_path / MCP machinery.

Risk: a wrong constant models the wrong mode — e.g. forcing a config bit to a value the silicon won't actually use hides real paths. And you generally need a separate STA mode/run (or MMMC scenario) for each mutually exclusive configuration (functional vs. test vs. each clock-mux setting).

Q69A junior engineer is closing timing and decides to set_false_path every CDC violation he sees to clean up the report. What is wrong with this, and what should he do instead?

What's wrong:
  • A false path removes the path from both setup and hold analysis in all corners/modes. If any of those 'CDC' paths is actually a same-domain or synchronous path he misread, he has just hidden a real violation that will fail in silicon.
  • Even for a true async crossing, blanket false-pathing ignores the requirement that the data bus skew between bits be bounded so a multi-bit value doesn't tear. A 2-FF synchronizer protects a single bit; it does nothing for relative skew across a multi-bit bus.
  • It masks structural bugs — a crossing with no synchronizer looks 'clean' once false-pathed, then metastability bites at random.
What he should do:
  • Verify the crossing is genuinely async and properly synchronized (2-FF for single-bit control, async FIFO or gray-code for buses) — CDC structural checks, not just STA.
  • For truly async groups, prefer set_clock_groups -asynchronous (or -logically_exclusive for mux'd clocks) so the relationship is declared once at the clock level, not path by path.
  • For data buses where bit-to-bit alignment matters, use set_max_delay (and a min-delay/skew bound) so the path is still constrained, just not by a single-cycle edge check.
  • Document every exception with rationale — unreviewed exceptions are how chips die.

Q70Explain the priority/precedence order among timing exceptions when more than one applies to the same path.

When multiple exceptions overlap on the same path, tools (PrimeTime-style) resolve by a fixed precedence — highest priority wins:
  • 1. set_false_path — strongest; if a path is false, nothing else matters.
  • 2. set_max_delay / set_min_delay — absolute delay overrides.
  • 3. set_multicycle_path — edge-based multi-cycle relaxation.
  • 4. Default single-cycle check — if no exception applies.
Specificity also matters: within the same exception type, a more specific object set wins. Roughly: -from -through -to (most specific) > -from -to > -from or -to alone. A point-specific exception beats a clock-wide one.

Why an interviewer asks: it explains surprising behavior — e.g. you add an MCP but the path still shows single-cycle because a broader false_path (or a tighter set_max_delay) already governs it, or vice versa. It also warns against conflicting, redundant exceptions that make the SDC unmaintainable. Note set_disable_timing and set_case_analysis aren't in this list — they act on the graph itself (before slack-level exception resolution), so they effectively take precedence by removing the arc/path entirely.
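The type-level ordering listed above can be modelled as a simple lookup. This is a toy model of the stated precedence only, assuming just the three exception types in the list; it ignores specificity rules and is not how any particular tool is implemented. The name `winning_exception` is hypothetical.

```python
# Highest priority first, following the order given in the answer.
PRECEDENCE = ["set_false_path", "set_max_delay", "set_multicycle_path"]

def winning_exception(applied):
    """Return the exception type that governs a path's setup check.

    `applied` is any collection of exception type names matching the path.
    Returns None when no exception applies (default single-cycle check).
    """
    for kind in PRECEDENCE:
        if kind in applied:
            return kind
    return None

print(winning_exception({"set_multicycle_path", "set_false_path"}))
print(winning_exception({"set_multicycle_path", "set_max_delay"}))
print(winning_exception(set()))
```

The second call mirrors the surprise described above: the multicycle path is present but a set_max_delay on the same path governs the check.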

Q71Senior-level: what is over-constraining, how does over-constraining with exceptions specifically hurt you, and how do you audit an SDC for bad exceptions?

Over-constraining means applying constraints tighter or broader than reality — the tool then optimizes against fiction. With timing exceptions the danger runs in both directions:
  • Too-loose exceptions (the silent killer; strictly speaking this is under-constraining): a false path / MCP / disable that hides a real path. STA reports clean, silicon fails. This is worse than a missing constraint because the violation is invisible.
  • Too-tight / mis-applied: e.g. a set_max_delay far below the period forces the optimizer to over-buffer and upsize cells, wasting area and power and even hurting other paths' timing by congestion.
  • Broad -from/-to with wildcards can accidentally catch paths you never meant to except — an MCP on a clock pair may relax a sub-path that genuinely needs single-cycle.
How to audit:
  • Run report_exceptions / report_timing -exceptions and check for ignored, overridden, or redundant exceptions — these usually indicate copy-paste SDC rot.
  • Look for exceptions with no matching objects (typo'd cell names) — they silently do nothing, leaving paths unconstrained.
  • Cross-check every CDC false_path against the CDC structural report (does a synchronizer actually exist?).
  • Prefer clock-level declarations (set_clock_groups) over many point exceptions, and require a documented rationale per exception in review.
  • Sanity-check that each setup MCP has its matching hold MCP (N / N-1), and that max_delay paths also have a min_delay where hold matters.

Q72On a half-cycle (negedge-capture) path versus a multicycle path, how do the launch/capture edges and hold checks differ?

Half-cycle path: launch on a posedge flop, capture on a negedge flop (or vice versa) of the same clock. The setup capture edge is only half a period after launch, so the setup budget is tight:
T_cq + T_comb + T_setup ≤ 0.5·T_clk - T_setup_unc (for 50% duty).

Multicycle path: launch and capture on the same edge type, but capture is moved out N full cycles — setup budget is relaxed to N·T_clk.

What happens to hold, precisely: the hold check is always referenced to the capture edge one (active) clock event before the setup capture edge. For a posedge-launch / negedge-capture path, the setup capture is the negedge at 0.5·T_clk, so the related hold check uses the previous negedge, one full period earlier, at -0.5·T_clk (half a cycle before launch). Data launched at time 0 has roughly half a period of built-in margin against that edge, so hold on a posedge→negedge half-cycle path is usually not the bottleneck; setup is, because of the squeezed half-period.

Contrast with MCP: a properly-constrained MCP keeps the hold check at the original launch edge (via the N-1 hold multiplier), giving the ordinary single-cycle hold check, while relaxing setup to N·T_clk. Bottom line: an MCP relaxes setup and keeps hold normal; a half-cycle path tightens setup to half a period. The case where half-cycle hold genuinely bites is the negedge-launch / posedge-capture variant combined with large skew or a fast (min) corner — but the headline asymmetry is the tight 0.5·T_clk setup.
Half-cycle path: negedge capture gives only 0.5 T_clk for setup; setup is the tight check.
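The edge positions for the posedge-launch / negedge-capture case can be computed directly. A minimal sketch, assuming an ideal clock with the launch posedge at time 0; the function name is hypothetical.

```python
def half_cycle_edges(t_clk, duty=0.5):
    """Return (setup_capture_time, hold_capture_time) for posedge launch at 0
    and negedge capture on the same clock."""
    setup_capture = duty * t_clk            # first negedge after launch
    hold_capture = setup_capture - t_clk    # previous negedge, one period earlier
    return setup_capture, hold_capture

setup_t, hold_t = half_cycle_edges(1000)    # 1 ns clock, times in ps
print(setup_t, hold_t)
```

For a 1 ns clock the setup capture edge is 500 ps after launch while the hold capture edge is 500 ps before it, which is why setup is the tight check here.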

7. OCV, AOCV, POCV & Derating 11 questions

Q73What is On-Chip Variation (OCV) and why do we model it in STA?

OCV accounts for the fact that identical cells/nets on the same die do not behave identically. Process (random dopant, lithography), voltage (IR drop across the rail), and temperature (thermal gradients) vary spatially across the chip, so two gates with the same library model can have different real delays.

A single corner library captures global (die-to-die) variation but NOT this within-die spread. Without modeling it, STA would be optimistic. To stay safe we pessimistically assume the launch path is slow and the capture path is fast (for setup), so we apply different delays to different paths in the same analysis.

The classic mechanism is set_timing_derate: multiply launch-clock + data delays by a late factor (e.g. 1.05) and capture-clock delays by an early factor (e.g. 0.95) within the same setup check.
  • Margin without a full statistical model.
  • Captures within-die P/V/T spread that a single corner misses.

Q74Explain the set_timing_derate command. What do the -early and -late, -cell_delay/-net_delay, and -clock/-data options mean?

set_timing_derate scales delays to model OCV. Signs and targets:
  • -late — multiplier ≥ 1.0; makes the path slower (more delay).
  • -early — multiplier ≤ 1.0; makes the path faster (less delay).
  • -cell_delay / -net_delay — derate gate delays vs interconnect delays separately.
  • -clock / -data — derate clock-network arcs vs data-path arcs separately.
Example:
set_timing_derate -late 1.05
set_timing_derate -early 0.95

The tool picks early vs late per arc based on which is pessimistic for the check. It is NOT the user who decides which path gets early; the engine applies -late to the path that must be slow and -early to the path that must be fast for that specific setup/hold check.

Q75For a SETUP check, which paths get the late derate and which get the early derate? Show the timing inequality.

Setup wants the data to arrive before the capture clock edge. The pessimistic assumption is: data/launch is slow, capture clock is fast.
  • Launch clock path: late (slow), so data is launched late.
  • Data path (combinational): late (slow).
  • Capture clock path: early (fast), so the capture edge comes sooner.
Setup constraint (single-cycle):
T_launch·late + T_cq·late + T_data·late + T_setup ≤ T_period + T_capture·early

So the common clock path before the divergence point is derated late in launch and early in capture — which is exactly the pessimism that CRPR/CPPR removes (a common segment cannot be simultaneously slow and fast).
Setup derating: launch clock and data are slowed (late), capture clock is sped up (early); the shared clock segment double-counts pessimism, fixed by CRPR.

Q76For a HOLD check, which paths get early and which get late derate? Why is it the opposite of setup?

Hold wants the data to arrive after the hold window of the same capture edge — i.e. new data must not race through and corrupt the just-captured value. The pessimistic assumption flips: data/launch is fast, capture clock is slow.
  • Launch clock path: early (fast).
  • Data path: early (fast), so new data arrives as soon as possible.
  • Capture clock path: late (slow), so the hold window closes as late as possible.
Hold constraint (same edge):
T_launch·early + T_cq·early + T_data·early ≥ T_capture·late + T_hold

It is opposite to setup because setup is a max-delay (next-edge) check while hold is a min-delay (same-edge) check — the worst case for each is the mirror image of the other. The tool applies early/late per arc automatically depending on which check is being analyzed.
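The two derated inequalities can be compared side by side with a short sketch. It applies a flat late/early factor exactly as written in the setup and hold constraints above, without CRPR or uncertainty; the function names and the example delays are assumptions for illustration.

```python
def derated_setup_slack(t_launch, t_cq, t_data, t_capture, t_setup, t_period,
                        late=1.05, early=0.95):
    """Setup: launch clock and data derated late, capture clock derated early."""
    arrival = (t_launch + t_cq + t_data) * late
    required = t_period + t_capture * early - t_setup
    return required - arrival

def derated_hold_slack(t_launch, t_cq, t_data, t_capture, t_hold,
                       late=1.05, early=0.95):
    """Hold: launch clock and data derated early, capture clock derated late."""
    arrival = (t_launch + t_cq + t_data) * early
    required = t_capture * late + t_hold
    return arrival - required

# Assumed delays in ps: 200 ps clock latency on both sides, 1 ns period.
setup = derated_setup_slack(200, 100, 600, 200, 50, 1000)
hold = derated_hold_slack(200, 100, 600, 200, 30)
print(round(setup, 3), round(hold, 3))
```

Setting late and early to 1.0 gives the underated slacks, so the difference between the two runs is the margin consumed by the derate on each check.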

Q77What is the main weakness of flat OCV derating, and how does AOCV improve on it?

Flat OCV applies one constant derate (e.g. 5%) to every stage of every path. This is too pessimistic for long paths: random process variation partially averages out over many stages (per the central limit theorem — sigma of the sum grows as √N, so the percentage spread shrinks as 1/√N). A 30-gate path should not carry the same per-stage margin as a 3-gate path.

AOCV (Advanced OCV) makes the derate a function of:
  • Path depth — more logic stages → smaller per-stage derate (random variation cancels).
  • Path distance / location — physically spread cells see more systematic gradient → larger derate.
The tool looks up a 2-D derate table indexed by depth and distance, giving less pessimism on deep paths while keeping margin on shallow ones. Result: fewer false violations and a tighter, still-safe signoff than flat OCV.
AOCV reduces per-stage derate as path depth grows (~1/√N), unlike flat OCV's constant pessimism.
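The 1/√N trend follows from summing independent per-stage variation. The sketch below shows only that idealized trend for identical, independent stages; it is not an AOCV table, and the 5% per-stage figure is an assumed example.

```python
import math

def relative_spread(sigma_stage_pct, depth):
    """Relative (percent) sigma of a path of `depth` identical independent stages.

    Path mean grows as depth, path sigma as sqrt(depth), so the ratio
    shrinks as 1/sqrt(depth).
    """
    return sigma_stage_pct / math.sqrt(depth)

for depth in (1, 4, 16, 30):
    print(depth, round(relative_spread(5.0, depth), 2))
```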

Q78What exactly are 'depth' and 'distance' in an AOCV table, and which physical effect does each capture?

An AOCV derate is read from a table indexed by two variables:
  • Depth = number of logic stages (combinational + clock buffers) in the path segment. Higher depth captures random within-die variation, which averages toward the mean as stages add — so derate decreases with depth.
  • Distance = physical span / bounding-box of the path's cells on the die. Larger distance captures systematic spatial gradients (IR-drop, temperature, lithography). More spread → more gradient → derate increases with distance.
Key nuance: depth and distance are computed separately for the clock network and the data path, and there are separate tables for early and late. The clock common path typically has its own depth so CRPR is consistent. Effectively: derate = f(random↓(depth), systematic↑(distance)).

Q79What is POCV/SOCV and how is it fundamentally different from OCV/AOCV?

POCV (Parametric OCV) — also called SOCV (Statistical OCV) — treats each cell/arc delay as a random variable with a mean and a standard deviation (sigma), instead of a deterministic multiplier.

Difference from OCV/AOCV:
  • OCV/AOCV apply a fixed multiplier to delay → pessimism scales linearly with path delay.
  • POCV models delay as a distribution; the path delay is the statistical sum, so variances add as sigma_path = √(Σ sigma_i^2) and the mean adds linearly.
The path delay used in signoff is mean ± n·sigma_path (e.g. +3σ for late, -3σ for early). Because sigmas add in quadrature (RSS), long paths get much less pessimism than flat OCV — this is the proper statistical CLT behavior that AOCV only approximates with a table. POCV needs LVF/Liberty Variation Format data (per-arc sigma) in the .lib.

Q80In POCV, path-delay sigmas add in quadrature. A path has 4 stages, each with delay sigma = 5 ps. What is the 3-sigma late margin contribution, and why is this less pessimistic than flat OCV?

Sigmas combine by root-sum-square (RSS), not linear addition (assuming independent random variation):
sigma_path = √(5^2 + 5^2 + 5^2 + 5^2) = √100 = 10 ps

3-sigma late margin: 3 × 10 = 30 ps added to the mean.

Contrast with linear summing of per-stage sigma (the conservative way flat OCV-style pessimism scales): the path sigma would be taken as 4 × 5 = 20 ps, giving a 3-sigma margin of 3 × 20 = 60 ps.

So POCV gives 30 ps vs 60 ps — half the pessimism for a 4-stage path, and the gap widens with depth (RSS grows as √N, linear grows as N). This is why POCV recovers real timing margin on deep paths while still being statistically safe.
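The arithmetic above can be reproduced in a few lines. This sketch assumes independent per-stage variation, as stated in the answer; the helper name `rss` is hypothetical.

```python
import math

def rss(sigmas):
    """Root-sum-square of per-stage sigmas (independent variation assumed)."""
    return math.sqrt(sum(s * s for s in sigmas))

sigmas = [5.0] * 4                 # four stages, 5 ps sigma each
sigma_rss = rss(sigmas)            # statistical (POCV-style) path sigma
sigma_linear = sum(sigmas)         # linear stacking for comparison

print(3 * sigma_rss, 3 * sigma_linear)   # 30.0 60.0
```

Changing the list length shows the gap widening with depth, since the RSS value grows as √N and the linear sum as N.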

Q81Explain Graph-Based Analysis (GBA) versus Path-Based Analysis (PBA). Why is GBA pessimistic, and where does AOCV/POCV interact with this?

GBA computes the worst slew and worst arrival at each node and propagates that worst case forward through the timing graph. It is fast but pessimistic: the worst slew at a node may come from one fanin while the worst arrival comes from another — GBA combines them even though no single real path sees both.

PBA re-times each path individually, using the actual slew that propagates along that path. It removes the slew-merging pessimism, so PBA slack ≥ GBA slack (PBA is the more accurate, optimistic-direction recovery).
  • GBA — conservative, used for the first signoff pass.
  • PBA — run on failing/near-critical paths to recover false violations.
Interaction: AOCV/POCV depth and distance are path properties. In GBA they are estimated from worst-case graph depth; in PBA they are computed exactly per path, so PBA + AOCV/POCV gives the tightest, least-pessimistic but still-safe result. This is why interviewers stress: run PBA derate recovery before declaring a path a real violation.
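The slew-merging pessimism can be seen in a toy two-fanin example. The linear delay model (50 ps intrinsic plus 0.5 × input slew) and the arrival/slew numbers are assumptions chosen for illustration; real tools use library lookup tables.

```python
def gate_delay(slew_ps, intrinsic_ps=50.0, k=0.5):
    """Hypothetical linear delay model: delay grows with input slew."""
    return intrinsic_ps + k * slew_ps

def gba_arrival(fanins):
    """GBA-style: merge worst arrival and worst slew, even from different fanins."""
    worst_arrival = max(arrival for arrival, _ in fanins)
    worst_slew = max(slew for _, slew in fanins)
    return worst_arrival + gate_delay(worst_slew)

def pba_arrival(fanins):
    """PBA-style: each path keeps its own slew; take the worst real path."""
    return max(arrival + gate_delay(slew) for arrival, slew in fanins)

# (arrival_ps, slew_ps) at the two inputs of the next gate.
fanins = [(500.0, 20.0), (450.0, 80.0)]
print(gba_arrival(fanins), pba_arrival(fanins))   # 590.0 560.0
```

The 30 ps difference is pessimism that no single real path sees, which is what PBA recovers.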

Q82What is CRPR (Clock Reconvergence Pessimism Removal) and why does derating make it necessary?

When you derate, the common clock segment shared by launch and capture is forced to be slow (late) for launch and fast (early) for capture simultaneously — physically impossible, since it is the same buffers and wires. This double-counts pessimism on the shared portion.

CRPR (Synopsys term; Cadence calls it CPPR) computes the delay of the common path up to the divergence point and credits back the difference between its late and early derated values:
CRPR_credit = T_common·late - T_common·early

This credit is added to setup slack (and the equivalent credit is applied to hold). Without CRPR, derating would report large false violations on paths that share most of their clock tree (common in clock-gated and balanced trees). It is normally enabled in signoff runs and is essential once any derate or OCV/AOCV/POCV is applied.

Q83Senior-level: how do you decide between AOCV and POCV for a 7nm/5nm signoff, and what data must the library contain for each?

Decision driver: dominance of random vs systematic variation and available library data.
  • AOCV needs an AOCV derate table (depth × distance, early/late) generated from Monte-Carlo characterization. It is a table-lookup approximation — good when you lack per-arc sigma data or want faster runtime.
  • POCV/SOCV needs LVF (Liberty Variation Format) in the .lib: per-arc delay/constraint sigma (and optionally mean-shift, slew sigma). It models variation statistically per arc.
At 7nm/5nm, random variation dominates and paths are deep, so flat/AOCV over-margins badly — the industry standard is POCV/LVF for accuracy and area/power recovery. AOCV is a legacy/bridge methodology.

Senior points an interviewer wants to hear:
  • POCV handles moments (mean + sigma), so RSS gives proper CLT behavior; AOCV only approximates it.
  • Always pair with PBA for final critical-path recovery and confirm CRPR/CPPR is active.
  • Verify LVF is present for all corners; missing LVF silently falls back to deterministic delay (hidden optimism).

8. CRPR & Clock Reconvergence 12 questions

Q84What is Clock Reconvergence Pessimism (CRP), and why does it arise in timing analysis?

CRP is the artificial, non-physical pessimism introduced when the launch clock path and the capture clock path share a common segment, but the analysis applies different timing values to that same physical segment.

It arises from per-path, per-edge derating:
  • The launch clock path has the common segment derated one way (e.g. slowed).
  • The capture clock path has the same physical segment derated the other way (e.g. sped up).
Since both paths physically traverse the same gates and wires up to the divergence point, those shared cells cannot simultaneously be slow and fast. Counting both is double-counting variation on shared hardware. CRPR (CRP Removal) is the credit the tool adds back to cancel this impossible pessimism. It is purely an analysis artifact, not a real silicon effect.
Launch and capture clock paths share a common segment, then diverge at the common point.

Q85What is the 'common point' (common pin / divergence point) in CRPR analysis, and how does the tool find it?

The common point is the last node shared by the launch and capture clock paths before they diverge toward the launch flop and the capture flop respectively.

To find it, the tool:
  • Traces the clock network from the source to the launch flop's clock pin and to the capture flop's clock pin.
  • Walks both traces and identifies the furthest-downstream node common to both (the deepest shared buffer/wire node).
Everything from the clock source up to and including the common point is the common clock path; beyond it the paths split. CRPR credit is computed on this shared portion only. For two flops fed by the same clock tree, the common point is typically a buffer deep in the tree close to the leaves; the closer the common point is to the leaves, the larger the shared path and the larger the CRPR credit.
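The trace-and-compare step can be sketched as a longest-common-prefix walk. This assumes each clock path is available as an ordered list of node names from the clock source to the flop's clock pin; the node names and the function name are hypothetical.

```python
def common_point(launch_path, capture_path):
    """Return the last node shared by both clock paths, walking from the source.

    Returns None if the paths share no leading node.
    """
    last_shared = None
    for launch_node, capture_node in zip(launch_path, capture_path):
        if launch_node != capture_node:
            break
        last_shared = launch_node
    return last_shared

launch = ["clk", "buf1", "buf2", "buf3a", "ff1/CK"]
capture = ["clk", "buf1", "buf2", "buf3b", "ff2/CK"]
print(common_point(launch, capture))   # buf2
```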

Q86Show how CRP appears in a setup check. What is the worst-case skew expression and where does pessimism enter?

For a setup check the requirement is roughly T_launch + T_cq + T_data + T_setup ≤ T_period + T_capture, so skew T_skew = T_capture - T_launch helps when positive.

Under OCV for setup, the tool wants the worst (smallest) effective skew, so it makes the launch clock path late (max delay) and the capture clock path early (min delay).
  • On the common segment, launch is computed with max (slow) delay.
  • On the same common segment, capture is computed with min (fast) delay.
That difference (max - min) on shared hardware is fictitious pessimism. CRPR credit = (late common delay) - (early common delay), and it is added back to the slack to remove the part that cannot physically occur.
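A small worked example can make the numbers concrete. The sketch below assumes a flat ±5% OCV derate applied to the clock paths only, lumps T_cq and T_data into one underated `data` value, and works in picoseconds; all delay values are invented for illustration.

```python
def setup_slack_ps(period, common, launch_branch, capture_branch,
                   data, setup, late_pct=105, early_pct=95, crpr=True):
    """Setup slack in ps with flat OCV derate on the clock paths only."""
    # Setup: launch clock late (max), capture clock early (min).
    launch_late = (common + launch_branch) * late_pct / 100
    capture_early = (common + capture_branch) * early_pct / 100
    slack = (period + capture_early - setup) - (launch_late + data)
    # CRPR credit = late common delay - early common delay.
    credit = common * (late_pct - early_pct) / 100
    return slack + credit if crpr else slack

args = dict(period=1000, common=1000, launch_branch=200,
            capture_branch=200, data=850, setup=50)
raw = setup_slack_ps(**args, crpr=False)   # -20.0: fails by 20 ps
fixed = setup_slack_ps(**args, crpr=True)  # 80.0: 100 ps credit added back
print(raw, fixed)
```

The 100 ps difference is exactly the late-minus-early spread on the 1000 ps shared segment; the derate on the two 200 ps private branches is untouched.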

Q87Repeat the analysis for a hold check: which clock edge is made early/late on the common path, and what is the sign of the CRPR credit?

For a hold check the requirement is T_launch + T_cq + T_data ≥ T_capture + T_hold. Hold is worst when the data is launched early relative to the capture clock, so the tool makes the launch clock path early/fast (min) and the capture clock path late/slow (max), which is the opposite of setup.
  • On the common segment: launch uses min delay, capture uses max delay.
  • The artificial gap on shared hardware is again (max - min).
CRPR credit is always a positive number that improves slack (it removes pessimism for both setup and hold), but the direction of derate on the common path is reversed between the two checks. A common interview trap: candidates forget that for hold the launch clock is the fast one and capture is the slow one on the shared path.
Common-path derate direction flips between setup and hold; CRPR credit is positive in both.
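One way to convince yourself that the hold credit is legitimate is to compare the CRPR-adjusted slack with a "pinned" calculation in which the shared segment has a single delay and only the private branches are derated. The sketch below does that under assumed values: flat ±5% derate, delays in picoseconds, and an underated minimum data delay.

```python
def hold_slack_ps(common, launch_branch, capture_branch, data_min, hold,
                  late_pct=105, early_pct=95, crpr=True):
    # Hold: launch clock early (min), capture clock late (max).
    launch_early = (common + launch_branch) * early_pct / 100
    capture_late = (common + capture_branch) * late_pct / 100
    slack = (launch_early + data_min) - (capture_late + hold)
    credit = common * (late_pct - early_pct) / 100  # positive, as for setup
    return slack + credit if crpr else slack

def hold_slack_pinned_ps(common_actual, launch_branch, capture_branch,
                         data_min, hold, late_pct=105, early_pct=95):
    # Physically consistent case: the shared segment has ONE delay value;
    # only the private branches are derated in opposite directions.
    launch = common_actual + launch_branch * early_pct / 100
    capture = common_actual + capture_branch * late_pct / 100
    return (launch + data_min) - (capture + hold)

raw = hold_slack_ps(1000, 200, 200, data_min=100, hold=30, crpr=False)
fixed = hold_slack_ps(1000, 200, 200, data_min=100, hold=30, crpr=True)
pinned = hold_slack_pinned_ps(1050, 200, 200, data_min=100, hold=30)
print(raw, fixed, pinned)  # -50.0 50.0 50.0
```

The pinned result does not depend on which single value the shared segment takes, because that value appears on both sides of the check and cancels.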

Q88You run STA in Graph-Based Analysis (GBA) mode. Why does GBA double-count derate on the shared clock path, and how does CRPR fix it?

GBA computes a single worst min arrival and worst max arrival per node, independent of which downstream path uses it. It does not know that the launch and capture checks for a given flop pair share a common clock segment.
  • For the launch clock it uses the late (max-derated) arrival at the leaf.
  • For the capture clock it uses the early (min-derated) arrival at the leaf.
Both arrivals were built by propagating derate through the same shared buffers, so the shared buffers' variation is effectively counted twice (once slow, once fast). CRPR removes this by, at the common point, computing the (late - early) delta accumulated on the common path and crediting it back to the check. Without CRPR, GBA setup/hold slacks are pessimistically wrong; CRPR is mandatory whenever OCV/AOCV/POCV derate is enabled.

Q89How does Path-Based Analysis (PBA) relate to CRPR? Does PBA still need CRPR?

PBA re-times a specific launch/capture path pair end-to-end, so it can be smarter about the common segment, but it still requires CRPR.
  • PBA removes GBA worst-slew/worst-arrival pessimism on the data and clock by recomputing slews along the actual path — this is a different pessimism than CRP.
  • CRP is specifically about applying opposite derate to the same shared clock cells. Even a path-based engine, when it derates launch late and capture early, must still credit back the common-path delta.
So PBA and CRPR are orthogonal: PBA reduces slew/arrival pessimism; CRPR removes shared-clock-path derate double-counting. Production flows enable CRPR in both GBA and PBA. PBA typically recovers more total pessimism overall, but it does not make CRPR unnecessary.

Q90Why does CRPR matter much more when OCV / AOCV / POCV derates are aggressive, and almost not at all with a single flat corner?

CRP magnitude is driven by the difference between the late and early delay on the common path:
  • With a single PVT corner and no derate, launch and capture see the same delay on the common segment, so (late - early) = 0 and CRP is essentially zero.
  • With OCV (e.g. +5% late / -5% early), every shared buffer contributes a spread, and the spread accumulates with the depth of the common path.
  • With AOCV/POCV, derate depends on stage count / statistical sigma — deep shared clock trees produce large common-path spreads.
So the more aggressive the derate and the deeper the shared clock tree, the larger the fictitious pessimism CRPR must remove. In modern nodes with heavy OCV and long clock trees, CRPR credit can be a significant fraction of the clock period — ignoring it would make timing look falsely unclosable.
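The scaling with depth can be sketched numerically. The example assumes a flat percentage derate and identical 50 ps stages on the shared segment (both made-up values); stage-dependent AOCV tables and POCV sigmas are deliberately left out.

```python
def common_path_credit_ps(stage_delays_ps, late_pct, early_pct):
    """CRPR credit for a shared clock segment under flat late/early derate."""
    return sum(stage_delays_ps) * (late_pct - early_pct) / 100

no_derate = common_path_credit_ps([50] * 10, 100, 100)  # single flat corner
shallow = common_path_credit_ps([50] * 5, 105, 95)      # 5 shared stages
deep = common_path_credit_ps([50] * 20, 105, 95)        # 20 shared stages
print(no_derate, shallow, deep)  # 0.0 25.0 100.0
```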

Q91Draw the canonical common-clock-path picture and label where CRPR credit is computed.

The credit is computed at the common point: it equals the difference between the late-derated and early-derated arrival accumulated from the clock root to that node.
  • Common clock path: root to common point (shared by launch and capture).
  • Divergence point / common point: where the tree splits to the launch flop and capture flop.
  • CRPR = arrival_late(common) - arrival_early(common), added back to slack.
Below the common point, derate is real and is kept (launch and capture branches are genuinely different hardware).
Common clock path from root to common point; CRPR credit = late minus early arrival at that node.

Q92Two flops are clocked by completely independent clock trees (different PLLs, no shared buffer). How much CRPR credit applies, and why is this case dangerous?

CRPR credit is zero — there is no common point, so there is no shared hardware whose derate could be double-counted.

This is dangerous because:
  • The full late/early derate on both clock trees is legitimately applied with no offsetting credit, so pessimism is real and large.
  • Worse, for asynchronous or different-PLL clocks the two trees have uncorrelated jitter and skew, so analyzing them as a synchronous setup/hold check is itself questionable — such crossings usually need a synchronizer/CDC treatment, not a CRPR-credited timing check.
The interview point: CRPR is only meaningful when the launch and capture clocks physically share a network. No shared path means no credit, and possibly means the path should not be timed as a single synchronous check at all.

Q93In a generated/divided clock scenario, the launch flop uses the source clock and the capture flop uses a clock divided by a flop on the same tree. Where is the common point, and what subtlety affects CRPR?

The common point is in the source clock network, at the deepest buffer shared before the tree splits toward (a) the launch flop and (b) the divider flop's clock pin.
  • The shared segment is from the root to that split; CRPR credit covers only that shared source-clock portion.
  • The divider flop and the generated-clock buffers after it are NOT common — the generated clock is physically a separate, downstream branch even though it derives from the same source.
The subtlety: the tool must correctly map the generated clock back through its master pin to find the real common point on the source clock. If the generated clock is defined at the wrong pin or the master clock relationship is broken, the tool may compute a wrong (too small or too large) common path, giving incorrect CRPR credit. Always verify the common point and the generated-clock definition in such cases.

Q94During timing closure an engineer sees a setup path fail by a few ps, then it passes after enabling CRPR (or a tighter common-point search) with no netlist edit. Explain what happened and whether this is legitimate.

Enabling (or improving the accuracy of) CRPR added back legitimate credit that was previously being subtracted as fictitious pessimism:
  • Before: the shared clock buffers were derated late on launch and early on capture, double-counting variation that cannot physically happen.
  • After: the tool credits (late - early) on the common path back into the slack, so the path gains margin and now passes.
This is fully legitimate — CRPR removes non-physical pessimism, it does not optimistically hide a real violation. Two cautions for the interview:
  • The credit is only valid up to the true common point; an over-aggressive or buggy common-point search that claims too much shared path would be optimistic and unsafe.
  • CRPR does not relax jitter/uncertainty applied at the capture flop after the common point, nor inter-clock uncertainty between async clocks.
So a few-ps swing from enabling CRPR is expected and correct, provided the common point is accurate.

Q95How does clock uncertainty (jitter + skew margin) interact with CRPR? Does CRPR remove user-specified uncertainty?

No — CRPR and clock uncertainty are independent. CRPR only cancels derate double-counting on shared clock hardware; it does not touch the uncertainty number you set with set_clock_uncertainty.
  • Uncertainty models jitter and un-budgeted skew and is applied at the capture edge of the check (it reduces the time available for setup and increases the required hold time).
  • It is a user/PLL-driven margin, applied regardless of common path.
Best practice: because the common clock path sees correlated jitter, some flows reduce uncertainty for paths with a deep common tree (jitter on shared buffers is common-mode and partially cancels), but that is done via uncertainty modeling, not by CRPR. The clean mental model: CRPR removes fake pessimism from OCV derate; clock uncertainty adds real margin for jitter — and the two are applied separately.

9. Crosstalk & Signal Integrity 11 questions

Q96What is coupling capacitance, and why does it matter for timing in advanced nodes?

Coupling capacitance (Cc) is the parasitic capacitance between two adjacent signal nets running in parallel, separated by dielectric. When two nets are routed close together (especially on the same metal layer), the side-wall capacitance between them becomes a coupling path.

Why it matters:
  • As process nodes shrink, wires get taller and closer (high aspect ratio), so side-wall coupling dominates over area (ground) capacitance — at advanced nodes Cc can be 60-80% of a net's total capacitance.
  • The effective capacitance a victim net sees depends on whether its neighbor (aggressor) is switching and in which direction — this is the Miller effect, captured by a Miller Coupling Factor (MCF) ranging roughly 0 to 2.
  • This makes net delay state-dependent, which is exactly what SI-aware STA must model.
Coupling capacitance Cc forms between adjacent parallel nets.

Q97Define aggressor and victim in the context of crosstalk. Can a net be both?

Victim: the net whose timing or signal value we are currently analyzing — the net being disturbed.
Aggressor: a neighboring net, coupled to the victim through Cc, whose switching activity injects charge onto the victim and disturbs it.

Key points:
  • The roles are relative to the analysis, not fixed physical properties. When STA analyzes net X, its neighbors are aggressors; when it later analyzes a neighbor Y, then X becomes one of Y's aggressors. So yes, a net can be both aggressor and victim, depending on which net is under analysis.
  • A victim typically has multiple aggressors. SI tools group them and consider which aggressors can realistically switch together within overlapping timing windows.
  • An aggressor only affects the victim if it actually transitions — a quiet (steady) neighbor contributes only its grounded coupling cap, no delta.

Q98Explain crosstalk delta delay. Why does same-direction switching speed the victim up while opposite-direction slows it down?

Crosstalk delta delay is the change in a victim net's propagation delay caused by an aggressor switching during the victim's transition. It arises because the voltage across the coupling cap Cc determines the charge that must be moved.

The driver must charge/discharge the effective coupling capacitance Cc_eff = MCF × Cc:
  • Opposite direction (victim rising, aggressor falling): the voltage swing across Cc is doubled (~2×Vdd), so the driver must move twice the charge through Cc. MCF approaches 2, effective cap increases, and the victim slows down (positive delta delay) — this hurts setup.
  • Same direction (both rising together): both plates move together, so the voltage across Cc barely changes, the driver moves almost no charge through it (MCF approaches 0), the victim speeds up (negative delta delay) — this hurts hold.
For setup checks, STA uses the worst slow-down on the launch/data path; for hold, the worst speed-up. The exact MCF depends on relative arrival times and slews.
Opposite-direction aggressor delays the victim (setup risk); same-direction speeds it up (hold risk).

Q99What is a crosstalk glitch (noise bump), and how is it different from a delta-delay effect?

A crosstalk glitch is a spurious voltage bump injected onto a static (non-switching) victim net when an aggressor switches. The aggressor edge capacitively couples charge through Cc, momentarily pulling the quiet victim away from its held rail.

Difference from delta delay:
  • Delta delay affects a victim that is switching — it shifts the timing (earlier/later) of a real transition. It is a functional/timing effect on a moving signal.
  • Glitch (noise) affects a victim that is supposed to stay constant — it creates a momentary unwanted pulse. This is a noise/signal-integrity effect.
Glitch danger:
  • If the bump exceeds a threshold and reaches a combinational input, it can propagate as a wrong logic value (functional failure).
  • If it reaches a sequential element's data, clock, set/reset, it can cause incorrect capture or a false clock edge.
Glitches are checked against noise immunity curves / DC and AC noise margins (e.g., CCS-Noise or ECSM models), not against setup/hold slack directly.
An aggressor edge injects a glitch onto a quiet victim; danger if it crosses the noise threshold.

Q100What is SI-aware STA, and what extra inputs does it need beyond normal STA?

SI-aware STA is static timing analysis that accounts for crosstalk-induced delta delay and noise, producing slacks that reflect coupling effects rather than treating each net in isolation.

Extra inputs/requirements beyond normal STA:
  • Coupled parasitics — a coupled SPEF that preserves Cc as explicit aggressor-to-victim coupling caps, rather than grounding or lumping them into total net capacitance.
  • Noise-capable cell models: CCS-Noise (Synopsys) or ECSM-Noise (Cadence), to model how a driver holds a node and a receiver's noise immunity (DC/AC noise margins).
  • Accurate slews and arrival windows from a converged delay calculation, so the tool can determine aggressor/victim timing-window overlap.
  • SI analysis is enabled explicitly, e.g. set si_enable_analysis true and a coupled SPEF read in PrimeTime-SI.
Output: delays and slacks that include per-arc delta delay, plus noise (glitch) reports.

Q101Walk me through how PrimeTime-SI iterates to converge on crosstalk delays.

PrimeTime-SI converges through an iterative loop because delta delay depends on arrival times, but arrival times depend on delta delay — a chicken-and-egg problem.

The flow:
  • Iteration 0 (base): Compute delays with no crosstalk (coupling caps grounded). This gives initial arrival times and slews for every net.
  • Timing-window computation: From those arrivals, build each aggressor's and victim's switching windows.
  • Delta-delay calc: For each victim, find aggressors whose windows overlap the victim's, then compute delta delay using their relative timing, slews, and direction.
  • Re-time: Re-propagate arrivals with the new delta delays. This shifts the windows.
  • Repeat the window/delta-delay/re-time steps. With shifted windows, some aggressors may now fall out of (or into) overlap.
The loop continues until delta delays and arrivals converge (the change falls below a tolerance), which typically takes only a few iterations. Reselect options and incremental modes control how aggressively windows are re-pruned each pass; over-pessimism is reduced as the windows tighten.
PrimeTime-SI loops: base delay then windows then delta delay then re-time, repeating to convergence.
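The fixed-point nature of the loop can be sketched with a toy model. This is not PrimeTime-SI's algorithm: it assumes a single victim, aggressor windows that never move, and a fixed delta per overlapping aggressor that simply adds to the victim's latest arrival. All numbers are invented and in picoseconds.

```python
def overlaps(a, b):
    """True if closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]

def converge_victim_window(base_window, aggressors, tol=1e-9, max_iters=20):
    """aggressors: list of (window, delta_ps). Returns (window, iterations)."""
    early, late_base = base_window
    late = late_base  # iteration 0: no crosstalk
    for iteration in range(1, max_iters + 1):
        # Delta-delay calc: only aggressors overlapping the current window.
        delta = sum(d for w, d in aggressors if overlaps((early, late), w))
        new_late = late_base + delta  # re-time with the new delta
        if abs(new_late - late) <= tol:
            return (early, new_late), iteration
        late = new_late
    raise RuntimeError("did not converge")

aggressors = [((190, 195), 10), ((205, 215), 8), ((230, 240), 5)]
result = converge_victim_window((100, 200), aggressors)
print(result)  # ((100, 218), 3)
```

Pass 1 finds only the first aggressor, the wider window in pass 2 pulls in the second, and pass 3 confirms nothing else changed; the third aggressor never overlaps and is never counted.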

Q102What are timing windows in crosstalk analysis, and how do they reduce pessimism?

A timing window is the time interval [earliest arrival, latest arrival] during which a net can switch in a given clock cycle. SI-aware STA computes a window for every net.

How they reduce pessimism:
  • An aggressor can only change a victim's delay if it switches while the victim is switching, i.e., if their windows overlap.
  • If an aggressor's window does not overlap the victim's, that aggressor is excluded from the delta-delay/noise computation for that check — it physically cannot interfere at that moment.
  • Without windows, the tool would assume all aggressors switch at the worst possible instant simultaneously — hugely pessimistic and unrealistic.
Caveats:
  • The tool separately considers setup vs hold and the relevant min/max corners, and may use different window edges for each.
  • Mutually exclusive / false-path relationships and incremental window tightening across iterations further prune aggressors.
  • Overly tight windows risk optimism, so tools apply guard-banding and reselection thresholds.
Only aggressors whose switching windows overlap the victim's are counted; non-overlapping ones are excluded.

Q103List the main techniques to fix crosstalk violations, and explain the trade-off of each.

Increase spacing (NDR / wider pitch): Move the victim away from aggressors. Reduces Cc (it scales inversely with spacing). Trade-off: consumes routing resources / area; may cause congestion.

Shielding: Route a Vdd/Vss line between victim and aggressor. The shield's grounded cap replaces the coupling cap so neighbor switching no longer couples. Trade-off: expensive in tracks; reserved for critical/clock/analog-sensitive nets.

Buffer insertion / driver upsizing: A stronger driver lowers the victim's output resistance, so it holds its node firmly and resists injected charge (smaller delta and glitch); buffers also split long parallel runs. Trade-off: more area/power; a too-large buffer can itself become an aggressor.

Reduce parallel run length / layer or track hopping (jog): Less side-by-side overlap means less total Cc. Trade-off: extra vias add resistance and can hurt delay.

Slew / skew control: Slowing the aggressor's edge (weaker aggressor driver) lowers dV/dt and injected charge; or temporally separating windows. Trade-off: a slower aggressor may then fail its own timing.

In PD, the usual order is: spacing/jog (cheap) then buffer/upsize then shielding (for clocks and the worst victims).

Q104A victim net shows a positive (worsening) setup delta delay AND a negative (worsening) hold delta delay on different aggressors. How can the same net hurt both setup and hold?

This is normal and not a contradiction, because setup and hold are evaluated under different aggressor assumptions and on different (or oppositely-switching) aggressors.
  • For the setup check, STA wants the worst slow-down of the victim's data transition. It picks aggressors switching in the opposite direction within the overlapping window, giving positive delta delay so the data arrives later and setup margin shrinks: slack_setup = T_req - (T_arr + Δ_late).
  • For the hold check, STA wants the worst speed-up. It picks aggressors switching in the same direction, giving negative delta delay so the data arrives earlier and hold margin shrinks: slack_hold = (T_arr - Δ_early) - T_hold_req.
Because the two checks use opposite-sign Miller factors and may even involve different aggressors with non-overlapping windows, the same victim net can legitimately show a setup-degrading delta in one scenario and a hold-degrading delta in another. SI-aware STA runs min and max delay analysis separately for exactly this reason.

Q105Why is crosstalk especially dangerous on clock nets, and how do you handle it?

Clock nets are the most sensitive to crosstalk for several reasons:
  • A clock edge feeds thousands of flops; a delta delay on the clock shifts the capture or launch edge, directly skewing setup/hold across the whole fanout.
  • Crosstalk on a clock can create jitter (cycle-to-cycle edge movement) and, in the worst case, a glitch that looks like an extra clock edge — causing a false capture (functional failure, not just a timing miss).
  • Clock and data delta delays can compound: an aggressor that delays the capture clock while data is unaffected eats setup margin even if the data path itself is clean.
Handling:
  • Shield the clock with Vdd/Vss on both sides (double-sided shielding) — the standard practice for clock spines and trunks.
  • Route clocks on higher/dedicated layers with NDR (wider wire + extra spacing).
  • Keep clock buffers strong (low driver resistance) and balance the tree so windows don't align with noisy aggressors.
  • In STA, ensure the clock network's delta delays are included and check clock glitch/noise against immunity, not just data slack.

Q106What is the Miller Coupling Factor (MCF), and what range of values does it take? Why can it exceed 1?

The Miller Coupling Factor (MCF) is a multiplier applied to the physical coupling capacitance to get the effective capacitance the victim's driver sees, accounting for the relative switching of the two nets: Cc_eff = MCF × Cc.

Range and meaning (for a victim with a switching neighbor):
  • MCF ≈ 0 — aggressor switches in the same direction, same time: both plates of Cc move together, voltage across Cc is nearly constant, so almost no charge flows through it. Victim speeds up.
  • MCF ≈ 1 — aggressor is quiet/steady: Cc behaves like an ordinary grounded cap.
  • MCF ≈ 2 — aggressor switches in the opposite direction: voltage across Cc swings by ~2×Vdd, so the driver must move twice the charge. Victim slows down.
It exceeds 1 because the far plate of the coupling cap moves in the opposite direction, effectively doubling the voltage excursion across Cc — the classic Miller effect. Static/grounded-cap STA uses a single MCF (often a worst-case constant), whereas SI-aware STA derives an effective MCF per arc from actual slews and arrival overlap.
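To see how the three MCF cases translate into delay, the sketch below applies Cc_eff = MCF × Cc on top of a grounded capacitance and feeds the total into a single-pole RC estimate (ln 2 · R · C). The lumped RC model and the component values (1 kΩ driver, 10 fF ground cap, 10 fF coupling cap) are assumptions for illustration; SI tools derive the effective MCF from actual waveforms.

```python
import math

# Limiting MCF values for the three cases described above.
MCF = {"same_direction": 0.0, "quiet": 1.0, "opposite_direction": 2.0}

def effective_cap_ff(c_ground_ff, c_couple_ff, aggressor):
    """Total cap seen by the victim driver: Cg + MCF * Cc (in fF)."""
    return c_ground_ff + MCF[aggressor] * c_couple_ff

def lumped_rc_delay_ps(r_kohm, c_ff):
    # 50% delay of a single-pole RC: ln(2) * R * C; kOhm * fF = ps.
    return math.log(2) * r_kohm * c_ff

caps = {case: effective_cap_ff(10, 10, case) for case in MCF}
delays = {case: lumped_rc_delay_ps(1.0, c) for case, c in caps.items()}
for case in MCF:
    print(f"{case:>18}: C_eff = {caps[case]:.0f} fF, delay = {delays[case]:.1f} ps")
```

With equal ground and coupling caps, the opposite-direction case sees three times the capacitance of the same-direction case, which is why the same net can swing between a hold risk and a setup risk.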

10. SDC Constraints 11 questions

Q107What does create_clock define, and what are its essential arguments? How does it differ from create_generated_clock?

create_clock defines a primary (master) clock at a source point and tells the timing engine the waveform that all launch/capture edges are derived from. Essential arguments:
  • -period — the clock period (mandatory).
  • -waveform {rise fall} — edge times within one period; default is {0 period/2} (50% duty).
  • -name — clock name (so you can reference it later).
  • The source object — a port or internal pin where the clock originates.
create_generated_clock defines a clock derived from a master clock by on-chip logic (a divider, MUX, or clock gate). It does not introduce a new free-running source; instead it is phase/frequency-locked to its -source via -divide_by, -multiply_by, -edges, or -invert.
Key point: a generated clock inherits jitter/latency relationships from the master and keeps the two clocks in a known relationship so paths between them are treated as synchronous, whereas two independent create_clock sources are asynchronous unless you relate them.

Q108Explain set_input_delay and set_output_delay. What physically do they model, and why do you need both -max and -min?

These constraints budget the timing of the off-chip portion of an I/O path that STA cannot see.
  • set_input_delay models the delay from an external launching register's clock edge through external logic to the input port of your chip. It tells STA "data arrives this late relative to the reference clock edge," so the engine knows how much of the period is already consumed before the signal even enters your block.
  • set_output_delay models the delay outside your chip from the output port to the external capturing register (external comb delay + that register's setup time). It reserves that portion of the period.
You need both flavors because the two checks use different corners:
  • -max is used for the setup check (worst-case late data must still meet the receiver's setup).
  • -min is used for the hold check (best-case early data must not violate the receiver's hold).
Both are specified relative to a clock via -clock (often a virtual clock for true chip I/O). Setup-relevant: arrival = input_delay + internal_path must be ≤ T_clk - setup.
Input/output delays budget the off-chip path that STA cannot see.
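The setup-side budget on both kinds of port can be sketched as simple arithmetic. The example assumes ideal clocks (zero latency, zero uncertainty) and uses made-up values in picoseconds; `clk_to_port` is a hypothetical name for the internal launch-flop clock-to-Q plus combinational delay to the output port.

```python
def input_setup_slack_ps(period, input_delay_max, internal_path, setup):
    """Input port -> capture FF: what is left after the off-chip budget."""
    return period - input_delay_max - setup - internal_path

def output_setup_slack_ps(period, output_delay_max, clk_to_port):
    """Launch FF -> output port: the off-chip budget is reserved up front."""
    return period - output_delay_max - clk_to_port

in_slack = input_setup_slack_ps(period=10_000, input_delay_max=4_000,
                                internal_path=5_000, setup=400)
out_slack = output_setup_slack_ps(period=10_000, output_delay_max=3_000,
                                  clk_to_port=6_500)
print(in_slack, out_slack)  # 600 500
```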

Q109What is a virtual clock and when must you use one? Give a concrete I/O example.

A virtual clock is a clock created with create_clock -name VCLK -period P with no source object (no get_ports/get_pins). It exists only as a timing reference — it does not propagate through the netlist or drive any pin.
Why it's needed: for true chip-level I/O, the register that launches an input (or captures an output) lives on another chip on the board, clocked by a copy of the clock that never physically enters your block. You still need a reference edge to write set_input_delay/set_output_delay against. The virtual clock supplies that reference without contaminating internal clock-network timing.
Example:
  • create_clock -name vclk -period 10 (no source)
  • set_input_delay 4 -clock vclk [get_ports din]
  • set_output_delay 3 -clock vclk [get_ports dout]
A subtle benefit: because the virtual clock and the real on-chip clock can have different latencies, the virtual clock cleanly models the skew between the off-chip clock source and your internal clock tree, which a real internal clock pin could not.

Q110Differentiate set_clock_latency from set_clock_uncertainty. Which one models jitter, and how does each affect setup vs hold?

set_clock_latency models the nominal (deterministic) delay of the clock arriving at register clock pins. Two kinds:
  • Source latency: from the true clock origin to the clock definition point (e.g., off-chip + PLL).
  • Network latency: from the clock port through the clock tree to the register (an estimate used before CTS; after CTS you apply set_propagated_clock and the real tree delay replaces it).
set_clock_uncertainty models the non-deterministic margin you subtract from the available time: clock jitter + skew estimate (pre-CTS) + design margin. This is the constraint that captures jitter.
Effect on checks:
  • Setup: uncertainty is subtracted from the capture edge, tightening the requirement: T_clk - setup - uncertainty(setup).
  • Hold: uncertainty is added to the hold requirement, again making it harder: required hold increases by uncertainty(hold).
Use separate values via -setup and -hold. Hold uncertainty is typically smaller because hold is a same-edge check: launch and capture are triggered by the same source clock edge, so most source jitter is common to both and cancels.

Q111Walk through the setup and hold equations for a single-cycle reg-to-reg path, and show exactly where clock skew, uncertainty, and CPPR fit.

Let launch clock arrive at time T_launch and capture clock at T_capture = T_launch + T_clk + skew, where skew = capture_latency - launch_latency.
Setup (max-delay, captured one cycle later):
T_cq + T_comb_max ≤ T_clk + skew - T_setup - uncertainty_s
Positive skew (capture later than launch) helps setup; uncertainty always hurts.
Hold (min-delay, same launching edge):
T_cq + T_comb_min ≥ skew + T_hold + uncertainty_h
Here positive skew (capture later) hurts hold — note the sign flips versus setup. Hold is checked on the same edge, so it is period-independent.
CPPR (Clock Path Pessimism Removal, the same mechanism that Section 8 calls CRPR) credits back the pessimism from applying different min/max derates to the shared (common) portion of the launch and capture clock paths. The common segment physically sees one delay, so STA adds back the over-counted pessimism, improving both setup and hold slack on related clocks.
Setup uses the next capture edge (T_clk + skew); hold checks the same edge.
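The two inequalities can be turned into slack functions to show the sign flip on skew. This is a sketch with invented picosecond values; it leaves out CPPR credit and any derating, so it only covers the skew and uncertainty terms.

```python
def setup_slack_ps(t_clk, skew, t_cq, t_comb_max, t_setup, unc_setup):
    # Required: T_cq + T_comb_max <= T_clk + skew - T_setup - uncertainty_s
    return (t_clk + skew - t_setup - unc_setup) - (t_cq + t_comb_max)

def hold_slack_ps(skew, t_cq, t_comb_min, t_hold, unc_hold):
    # Required: T_cq + T_comb_min >= skew + T_hold + uncertainty_h
    return (t_cq + t_comb_min) - (skew + t_hold + unc_hold)

results = {}
for skew in (0, 50):
    s = setup_slack_ps(1000, skew, 100, 800, 60, 50)
    h = hold_slack_ps(skew, 100, 40, 30, 20)
    results[skew] = (s, h)
    print(f"skew={skew:>3} ps  setup slack={s:>4} ps  hold slack={h:>3} ps")
```

Adding 50 ps of positive skew moves the setup slack from -10 ps to +40 ps and the hold slack from +90 ps to +40 ps: the same 50 ps is gained on one check and lost on the other.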

Q112Contrast set_false_path with set_clock_groups -asynchronous. When would using a false_path instead of clock groups be a mistake?

set_false_path disables timing on a specific path or set of endpoints you enumerate (via -from/-through/-to). It is surgical and one-directional.
set_clock_groups -asynchronous declares that two (or more) clock domains are mutually asynchronous, so STA automatically false-paths every path between them, in both directions, including paths created later by clones or generated clocks of those masters.
The mistake: using individual set_false_path -from clkA -to clkB to handle a clock-domain crossing.
  • It only kills one direction: you must also write the B-to-A statement, and people forget, so CDC paths in that direction are still (wrongly) timed and over-constrained.
  • It does not automatically cover new generated/derived clocks of A or B, so the exception silently goes stale as the design evolves.
For asynchronous domains, set_clock_groups -asynchronous is the correct, maintainable choice. Reserve set_false_path for genuinely untimed structural paths (e.g., a static config register, a mode pin, or a specific reconvergent path). Note: -logically_exclusive/-physically_exclusive groups are for clocks that never coexist (e.g., mux-selected), which is different from asynchronous.

Q113Explain set_load versus set_driving_cell (and set_drive). On which ports do they belong, and how does each affect timing?

Both model the missing electrical environment at chip boundaries so transitions and delays are realistic.
  • set_load applies a capacitance to a port (or net). On output ports it models the external load the chip must drive — larger load → slower output transition → larger driver delay, which tightens the output path and stresses set_max_transition at the pad. It belongs on outputs (and bidir).
  • set_driving_cell models the external driver feeding an input port by naming a real library cell (e.g., -lib_cell BUFX4 -pin Z). STA then computes a realistic input transition from that cell's drive strength into your input net's load. Without it, the input is treated as an ideal source (zero drive resistance, zero transition time), so the first-stage delay is unrealistically optimistic.
  • set_drive is the older, simpler form that just specifies an input drive resistance directly; set_driving_cell is preferred because it derives a load-dependent slew from actual library data.
Rule of thumb: driving_cell on inputs, load on outputs. Both directly change slews, and slew feeds delay and the max_transition/cap checks.

Q114What do set_max_transition and set_max_capacitance constrain, and how do they differ from a setup/hold timing check? Why are they design-rule checks?

They are Design Rule Constraints (DRCs), not path-delay (data-arrival) checks. They bound the electrical quality of every net regardless of whether the path's slack is positive.
  • set_max_transition caps the slew (rise/fall time) allowed on a pin/net. Slow transitions degrade delay accuracy, increase short-circuit (crowbar) power, and hurt downstream noise immunity.
  • set_max_capacitance caps the total load capacitance a driver may see, ensuring no cell is driving beyond what its library characterization supports.
Key differences from setup/hold:
  • DRCs are evaluated per-pin/per-net independently, not along a launch-to-capture path; there is no clock or period involved.
  • They are checked and fixed first (often by buffering/upsizing) because a max_transition violation makes the path's delay numbers themselves untrustworthy.
  • Library cells also carry their own intrinsic max_transition/max_capacitance; the tool uses the tightest of library limit and your SDC limit.
Priority order in fixing: DRC (max_tran/cap) → then setup → then hold (hold last so you do not re-break setup).

Q115 You have a SerDes refclk that is divided by 2 on-chip to clock a register bank. Write the constraints and explain why create_generated_clock (not a second create_clock) is correct.

Constraints:
  • create_clock -name refclk -period 2 [get_ports REFCLK]
  • create_generated_clock -name clk_div2 -source [get_ports REFCLK] -divide_by 2 [get_pins div_reg/Q]
Why generated, not a new create_clock:
  • The divided clock is phase- and frequency-locked to refclk. create_generated_clock tells STA the exact relationship (period 4, edges derived from refclk), so paths between the refclk domain and the div2 domain are correctly treated as synchronous and timed.
  • A second create_clock would define an independent source. STA would then have to assume an arbitrary phase relationship (or you'd over-constrain), and the launch/capture edge alignment would be wrong — leading to false violations or, worse, missed real paths.
  • The generated clock inherits source latency and jitter from refclk through the common clock path, enabling CPPR credit on the shared segment.
Important detail: -source should reference a port or pin on the master clock's network (here the REFCLK port; the divider's clock pin is an equally valid choice), and the generated-clock object is placed on the divider's output (Q), the point where the new waveform physically appears.
A divide-by-2 generated clock stays phase-locked to its master refclk.
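
To make the edge relationship concrete, the following Python sketch derives the divided waveform from the master's rising edges. It is an idealized model (a toggle flop with zero clock-to-Q delay), and the function names are made up for illustration.

```python
def master_rising_edges(period, count):
    """Rising-edge times of an ideal master clock starting at t = 0."""
    return [i * period for i in range(count)]

def divide_by_2_edges(period, count):
    """Toggle the output on every master rising edge; returns (time, new_level) pairs."""
    level = 0
    edges = []
    for t in master_rising_edges(period, count):
        level ^= 1
        edges.append((t, level))
    return edges

edges = divide_by_2_edges(period=2, count=4)          # refclk period 2, as in the constraints above
div_rises = [t for t, level in edges if level == 1]   # rising edges of clk_div2
div_period = div_rises[1] - div_rises[0]
print(edges, div_period)
```

Every edge of the divided clock coincides with a master rising edge, which is the fixed relationship that create_generated_clock communicates to the timer and that an independent create_clock would lose.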

Q116 An input port has set_input_delay -max 6 -clock vclk with vclk period 10. Internal path from port to capture FF is 3 ns, FF setup 0.4 ns. Is setup met? Then explain how -min drives the hold check.

Setup check at the input: the available time inside the chip is the period minus what's consumed off-chip and the setup requirement.
required = T_clk - input_delay_max - T_setup = 10 - 6 - 0.4 = 3.6 ns
Internal path = 3.0 ns ≤ 3.6 ns, so slack = +0.6 ns → setup is MET.
Hold side: hold uses set_input_delay -min (the earliest the data can arrive). The check ensures that early data does not change at the capture FF until its hold window after the same clock edge has passed:
internal_min_delay + input_delay_min ≥ T_hold (+ uncertainty_h)
So a large -min input delay actually helps hold (data arrives comfortably late relative to the launching edge it's referenced to), while a small/zero -min is the stress condition for hold. This is exactly why you must specify both -max (worst case for setup) and -min (best case for hold) — using one value for both either over-constrains setup or hides a hold problem.
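
The arithmetic above can be reproduced with a small Python sketch. Values are in picoseconds to avoid floating-point noise, the capture clock is assumed ideal (no latency), and the hold-side numbers (150 ps minimum path, 200 ps hold) are hypothetical since the question does not give them.

```python
def input_setup_slack_ps(period, input_delay_max, path_max, t_setup, uncertainty=0):
    """Setup slack at the capture flop for an input-port path."""
    required = period - uncertainty - input_delay_max - t_setup
    return required - path_max

def input_hold_slack_ps(input_delay_min, path_min, t_hold, uncertainty=0):
    """Hold slack, following internal_min_delay + input_delay_min >= T_hold + uncertainty."""
    return (input_delay_min + path_min) - (t_hold + uncertainty)

# Numbers from the question: period 10 ns, -max 6 ns, path 3 ns, setup 0.4 ns.
setup_slack = input_setup_slack_ps(10_000, 6_000, 3_000, 400)

# Hypothetical hold case: the same short path with a zero vs. a large -min input delay.
hold_slack_zero_min = input_hold_slack_ps(input_delay_min=0, path_min=150, t_hold=200)
hold_slack_large_min = input_hold_slack_ps(input_delay_min=1_000, path_min=150, t_hold=200)
print(setup_slack, hold_slack_zero_min, hold_slack_large_min)
```

With these assumed numbers the zero -min case fails hold while the large -min case passes, which is the stress behaviour described above.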

Q117 Senior-level: after adding set_clock_groups -asynchronous, a constraint engineer finds a CDC path still over-constrained and full of violations. List the top causes and how SDC ordering/precedence resolves exception conflicts.

If async grouping didn't kill the paths, work through these:
  • Generated clocks not covered: the group lists masters but a divided/gated clock in that domain was defined after the group and not added. Re-issue or include derived clocks; generated clocks are not auto-inherited into an existing group in all flows.
  • A conflicting exception elsewhere: SDC timing-exception priority is set_false_path > set_max_delay/set_min_delay > set_multicycle_path, and a clock group behaves like a false path between the groups. A stray set_max_delay -from clkA -to clkB therefore does not normally re-time a path the group really cuts, so if that path is still timed, suspect that the group does not cover the clocks actually launching and capturing it. Conversely, an overly broad set_max_delay on paths outside the group may be what is over-constraining.
  • Same-clock conflict: grouping only cuts inter-group paths; if both registers are reached by the same clock (e.g., a shared generated clock), the path stays timed.
  • Wrong object: the group was applied to a port while the real timed clock is a generated clock on an internal pin.
Precedence rule of thumb: within one exception type the more specific exception wins (-from -through -to > -from -to > -from alone); across types, false_path overrides max/min delay, which overrides multicycle. Debug with report_timing -from -to on the offending path and report_clock_groups / get_timing_exceptions (or your tool's equivalent clock-group and exception reports) to see what's actually applied.
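
The rule of thumb can be sketched as a small resolver in Python. This is a deliberate simplification for reasoning about conflicts, not how any particular timer implements precedence; the dictionary layout and the clock and pin names are hypothetical.

```python
# Type priority as stated above: false path beats max/min delay, which beats multicycle.
TYPE_PRIORITY = {"false_path": 3, "max_delay": 2, "min_delay": 2, "multicycle_path": 1}

def specificity(exc):
    """Count how many of -from / -through / -to the exception specifies."""
    return sum(1 for key in ("from", "through", "to") if exc.get(key))

def winning_exception(exceptions):
    """Pick the winner: type priority first, then specificity within the same type."""
    return max(exceptions, key=lambda e: (TYPE_PRIORITY[e["type"]], specificity(e)))

candidates = [
    {"type": "multicycle_path", "from": "clkA", "through": "u_sync/D", "to": "clkB"},
    {"type": "max_delay", "from": "clkA", "to": "clkB"},
    {"type": "false_path", "from": "clkA"},
]
winner = winning_exception(candidates)

same_type = [
    {"type": "multicycle_path", "from": "clkA"},
    {"type": "multicycle_path", "from": "clkA", "to": "clkB"},
]
more_specific = winning_exception(same_type)
print(winner["type"], specificity(more_specific))
```

Note that the broad false path wins over the more specific multicycle path, because specificity only breaks ties within one exception type.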

11. Advanced & Signoff 11 questions

Q118 What is MMMC (Multi-Mode Multi-Corner) analysis, and why is a single corner/mode no longer sufficient for signoff?

MMMC is the methodology of timing a design across the full cross-product of operating modes and PVT corners in a single analysis run.

Mode = a functional configuration that changes the constraint set: different clock frequencies, clock relationships, false/multicycle paths, or test vs functional (e.g. func, scan_shift, scan_capture, mbist, sleep). Each mode has its own SDC.

Corner = a fixed PVT + RC extraction condition: a library (SS/TT/FF), a voltage, a temperature, and a parasitic extraction corner (Cmax/Cmin/RCmax/RCmin).

A single corner is insufficient because:
  • Setup is worst at the slow corner (SS, low V, often high T) while hold is worst at the fast corner (FF, high V) — they need different libraries.
  • Different modes activate different paths and exceptions, so a path closed in functional mode may violate in scan-shift.
  • Temperature inversion means the true worst case isn't always the obvious extreme.
An MMMC scenario = one (mode, corner, check-type) tuple. Tools analyze all enabled scenarios concurrently so an ECO fixing one scenario doesn't silently break another.
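
A quick Python sketch shows how fast the scenario count grows. The mode and corner names below are placeholders, and a real flow would enable only a subset of the cross-product (for example, hold only at fast corners).

```python
from itertools import product

modes = ["func", "scan_shift", "scan_capture"]          # each mode has its own SDC
corners = ["ss_lowv_hot_rcmax", "ff_highv_cold_rcmin"]  # library + V + T + RC extraction
checks = ["setup", "hold"]

# Full cross-product of (mode, corner, check-type) tuples.
scenarios = list(product(modes, corners, checks))

for mode, corner, check in scenarios:
    print(f"{mode:13s} {corner:22s} {check}")
print(len(scenarios), "scenarios")
```

Even this small assumed set yields a dozen scenarios, which is why concurrent multi-scenario analysis is preferred over running them one at a time.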

Q119 Map the standard library corners to the checks they bound. Which corner is worst for setup and which for hold, and why?

Setup (max-delay) is limited by the slowest data path, so it is analyzed at the slow corner: SS process, low voltage, and (classically) high temperature. Large delays make it hard to meet T_data_arrival ≤ T_clk - T_setup - uncertainty.

Hold (min-delay) is limited by the fastest data path, so it is analyzed at the fast corner: FF process, high voltage, and (classically) low temperature. Small delays let new data race through and change at the endpoint before the capture flop has finished holding the previous value: T_data_arrival ≥ T_hold + uncertainty at the capture clock.

Other notes:
  • OCV/derate is applied on top of these corners; you don't rely on the corner alone.
  • Parasitics pair with the libs: Cmax/RCmax (high cap, high R) hurts setup; Cmin/RCmin hurts hold.
  • At advanced nodes the high-T = slow assumption breaks — see temperature inversion.
Slow SS corner bounds setup; fast FF corner bounds hold.

Q120 What is temperature inversion, and how does it change the choice of worst-case timing corner?

Temperature inversion (a.k.a. inverted temperature dependence) is the effect at low-voltage advanced nodes where a gate becomes slower as temperature decreases, the opposite of the classic behavior.

Two competing effects set transistor drive current:
  • Carrier mobility falls as T rises → tends to make gates slower hot (classic).
  • Threshold voltage Vt falls as T rises → higher overdrive (Vgs−Vt) → tends to make gates faster hot.
At high supply voltage the mobility term dominates → hot = slow. At low voltage (near-threshold, where Vgs−Vt is small) the Vt term dominates → cold = slow.

Consequence: you can no longer assume hot is the setup-worst corner. The worst setup corner may be SS-low-V-low-T, and worst hold may shift to high-T. Foundries therefore characterize libraries at both temperature extremes (e.g. −40 °C and 125 °C) and MMMC must include them, because the true worst case is non-monotonic in T.
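
The two competing effects can be illustrated with a toy alpha-power-law model in Python. All coefficients here (mobility exponent, Vt temperature slope, alpha, nominal Vt) are illustrative assumptions chosen to be textbook-plausible, not data for any real process, and the result is only a relative delay figure.

```python
def relative_delay(vdd, temp_k, vt0=0.45, t0=300.0, k_vt=1e-3, mu_exp=-1.5, alpha=1.3):
    """Toy gate delay ~ Vdd / I_on, with I_on ~ mobility(T) * (Vdd - Vt(T))**alpha."""
    mobility = (temp_k / t0) ** mu_exp      # mobility falls as T rises
    vt = vt0 - k_vt * (temp_k - t0)         # threshold voltage falls as T rises
    overdrive = vdd - vt
    if overdrive <= 0:
        raise ValueError("toy model only covers Vdd above threshold")
    return vdd / (mobility * overdrive ** alpha)

COLD_K, HOT_K = 233.15, 398.15  # -40 C and 125 C

for vdd in (1.0, 0.6):
    cold, hot = relative_delay(vdd, COLD_K), relative_delay(vdd, HOT_K)
    slower = "hot" if hot > cold else "cold"
    print(f"Vdd={vdd:.1f} V: cold={cold:.2f} hot={hot:.2f} -> {slower} is slower")
```

In this model the hot corner is slower at 1.0 V, while the cold corner becomes slower at 0.6 V, matching the qualitative behaviour described above.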

Q121 Explain latch-based (level-sensitive) timing and time borrowing. What is the setup/hold equation for a latch, and why does borrowing not relax the cycle-level constraint?

A latch is transparent while its clock is at the active level and opaque otherwise. Unlike a flop (data captured only at the edge), data can pass through during the entire active window. If data arrives after the opening edge but before the closing edge, it still propagates — the latch borrows time from the next cycle's path.

  • Setup is checked against the closing edge, not the opening edge. Effective requirement: data must arrive before T_close - T_setup.
  • Max borrow ≈ the active-level duration minus setup: borrow_max = T_active - T_setup (an active-low or active-high latch can borrow up to roughly half the period for a 50% duty clock).
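  • Hold is checked at the closing edge, the edge that actually latches the data: new data must not reach D until T_hold after the latch holding the previous value has closed, i.e. T_arrival ≥ T_close(prev) + T_hold.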
Why borrowing doesn't relax the overall constraint: time borrowed by stage N is given back — it eats into the budget of stage N+1. Borrowing only redistributes slack across pipeline stages to smooth uneven logic depth; the sum of delays across the loop of latches must still fit the total available time. If a path borrows, the downstream path has less time, and the analyzer propagates the borrowed amount forward. Excess borrow that can't be repaid shows up as a violation at the next latch.
Data arriving in the transparent window borrows from the next cycle.

Q122 What are recovery and removal checks on an asynchronous reset, and how do they relate to setup and hold?

Recovery/removal are timing checks on the deassertion (release) of an asynchronous control pin (async reset/preset) relative to the active clock edge. They guarantee the flop doesn't see reset changing too close to the clock, which would cause metastability on the captured value.

  • Recovery time = minimum time the reset must be deasserted before the active clock edge. It is the setup-analog for reset release: T_reset_release ≤ T_clk_edge - T_recovery. A recovery failure has a max-delay flavor (release arrives too late).
  • Removal time = minimum time the reset must remain asserted after the active clock edge before it deasserts. It is the hold-analog: T_reset_release ≥ T_clk_edge + T_removal. A removal failure has a min-delay flavor (release arrives too early/fast).
Key points: these checks apply only to the release edge (the async assertion itself is, by design, unclocked). They are most relevant in MMMC at the corners that stress the reset-path delay vs clock-path delay — recovery at the slow corner, removal at the fast corner — exactly mirroring setup/hold.
Recovery = release before the clock edge (setup-like); removal = stay asserted after (hold-like).
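
The two inequalities translate directly into slack formulas. The Python sketch below uses hypothetical numbers in picoseconds (a 1000 ps clock, release launched at t = 0) and ignores clock latency and uncertainty for brevity.

```python
def recovery_slack_ps(clk_edge, t_recovery, release_arrival_max):
    """Setup-like: the latest release must land before clk_edge - T_recovery."""
    return (clk_edge - t_recovery) - release_arrival_max

def removal_slack_ps(clk_edge, t_removal, release_arrival_min):
    """Hold-like: the earliest release must land after clk_edge + T_removal."""
    return release_arrival_min - (clk_edge + t_removal)

# Hypothetical reset release launched at t = 0:
# recovery is checked against the next clock edge (1000 ps), removal against the edge at 0 ps.
rec = recovery_slack_ps(clk_edge=1000, t_recovery=80, release_arrival_max=700)
rem = removal_slack_ps(clk_edge=0, t_removal=40, release_arrival_min=150)
print(rec, rem)
```

Recovery uses the maximum (slow-corner) arrival of the release and removal uses the minimum (fast-corner) arrival, the same pairing as setup and hold.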

Q123 What is a minimum pulse width check, and what are the clock-gating setup/hold checks? When do they appear?

Minimum pulse width (MPW) verifies that a clock's high phase and low phase each remain wide enough for the cell to operate. Every sequential and clock cell has a library-specified min_pulse_width for high and low; if a glitch or a too-narrow gated pulse violates it, the flop may not capture/reset reliably. STA checks high_pulse ≥ MPW_high and low_pulse ≥ MPW_low at the relevant corner (worst at the corner where the cell needs the widest pulse). It also covers minimum period on the clock source.

Clock-gating checks verify the enable signal at an AND/OR clock gate changes only while the gate output is in its inactive level, so no partial/glitch clock pulse is created:
  • For an AND gate (clock passes when enable=1): the enable must be stable before the clock rising edge (gating setup) and held until after the clock falls (gating hold). This prevents chopping a clock pulse.
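  • For an OR gate (clock passes when enable=0) the polarity is mirrored: the enable may change only while the clock is high, so gating setup is checked against the clock falling edge and gating hold against the rising edge.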
These appear automatically wherever the tool infers a clock gate (latch-based ICG cells or combinational gating). MPW checks appear on all clock pins and are essential after clock gating, DCDL/divider logic, or any place a pulse can be narrowed.
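
As a simplified illustration of both checks, the Python sketch below models one clock cycle that rises at t = 0 and falls at high_width. The function names and timing values are hypothetical, and only the AND-gate case is shown.

```python
def and_gate_enable_ok(enable_change, high_width, period, t_setup, t_hold):
    """For AND gating, the enable may only change in the clock-low window:
    after the falling edge plus hold, and before the next rising edge minus setup."""
    earliest = high_width + t_hold
    latest = period - t_setup
    return earliest <= enable_change <= latest

def mpw_ok(high_pulse, low_pulse, mpw_high, mpw_low):
    """Both clock phases must be at least as wide as the library minimum."""
    return high_pulse >= mpw_high and low_pulse >= mpw_low

# 1000 ps clock, 50% duty, gating setup 50 ps, gating hold 30 ps (all assumed values).
safe = and_gate_enable_ok(enable_change=700, high_width=500, period=1000, t_setup=50, t_hold=30)
chops_pulse = and_gate_enable_ok(enable_change=300, high_width=500, period=1000, t_setup=50, t_hold=30)
narrow = mpw_ok(high_pulse=120, low_pulse=880, mpw_high=150, mpw_low=150)
print(safe, chops_pulse, narrow)
```

An enable change at 300 ps lands while the clock is high and would chop the pulse, which is exactly the kind of narrowed pulse the MPW check then has to catch.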

Q124 What is a data-to-data check (set_data_check), and give a realistic scenario where you'd use one.

A data-to-data check (set_data_check) enforces a setup and/or hold timing relationship between two data signals that is not referenced to a clock edge. It models a required arrival ordering between two pins, treating one as the 'constrained' pin and the other as the 'related/reference' pin.

set_data_check -from related_pin -to constrained_pin -setup <value> means the constrained signal must arrive at least value before the related signal; the -hold form requires it to arrive at least value after the related signal.

Realistic uses:
  • Custom/async memory or register-file interfaces where an address must be stable a fixed time before a write-enable / write-strobe (an internal setup requirement with no clock at that node).
  • Two-phase or self-timed handshakes where one control must lead another.
  • Mux select vs mux data requirements, or asynchronous FIFO control where pointers must settle before a strobe.
It is the right tool whenever the relationship is signal-to-signal rather than launch-clock-to-capture-clock; using a regular setup/hold check would be wrong because there is no clock to reference.
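
The signal-to-signal relationship can be written as two slack formulas. This Python sketch assumes the usual conservative pairing (latest constrained arrival against earliest related arrival for setup, and the reverse for hold); the address and write-strobe numbers are invented for illustration.

```python
def data_check_setup_slack_ps(constrained_arrival_max, related_arrival_min, setup):
    """Constrained signal must arrive at least `setup` before the related signal."""
    return (related_arrival_min - setup) - constrained_arrival_max

def data_check_hold_slack_ps(constrained_arrival_min, related_arrival_max, hold):
    """Constrained signal must arrive at least `hold` after the related signal."""
    return constrained_arrival_min - (related_arrival_max + hold)

# Hypothetical memory interface: address (constrained) vs. write strobe (related).
addr_setup_slack = data_check_setup_slack_ps(
    constrained_arrival_max=1200, related_arrival_min=1500, setup=200)
addr_hold_slack = data_check_hold_slack_ps(
    constrained_arrival_min=1800, related_arrival_max=1600, hold=100)
print(addr_setup_slack, addr_hold_slack)
```

No clock period appears in either formula; only the two arrival times and the required margin matter.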

Q125 Why is PrimeTime (path-based, signoff) STA used for signoff instead of relying on the in-tool STA inside synthesis or P&R? What does PrimeTime add?

In-tool STA (the timer inside DC/Genus or Innovus/ICC2) is a fast, approximate engine optimized for driving optimization loops, not for golden signoff. PrimeTime (PT) is the foundry-correlated golden signoff timer. Differences that matter:
  • Delay accuracy: PT uses signoff delay calculation (e.g. CCS/ECSM current-source models, SI-aware) correlated to SPICE; P&R timers use simplified/abstracted models for speed.
  • Signal integrity: PT-SI does rigorous crosstalk delay and noise (glitch) analysis with iterative aggressor/victim alignment; in-tool SI is lighter.
  • Parasitics: PT reads the signoff-extracted SPEF (e.g. StarRC) per corner; the P&R tool may use its own estimate.
  • POCV/AOCV/derating and statistical signoff (parametric on-chip variation) are applied consistently in PT.
  • Path-based analysis (PBA) vs graph-based (GBA): GBA is pessimistic (worst slew/derate at every pin); PT can re-time the actual failing path with realistic per-path slew propagation, recovering false pessimism and often closing 'violations' that don't really exist.
So flow is: optimize in P&R, then sign off in PT (and fix the residual via timing ECO). Tapeout decisions are made on PT numbers because they are what correlate to silicon.

Q126 Walk through how a timing ECO works in the signoff flow. What is the difference between a setup ECO and a hold ECO, and why are hold fixes done last?

A timing ECO (Engineering Change Order) is a small, targeted netlist patch applied after place-and-route to close residual timing found in signoff (PrimeTime), without re-running full synthesis/placement.

Typical loop:
  • PT-ECO analyzes violating paths and emits fix guidance (fix_eco_timing / fix_eco_drv), producing an ECO change list.
  • The list is implemented in the P&R tool (ECO place + ECO route), legalized, re-extracted (SPEF), and re-signed-off in PT.
  • Iterate until clean across all MMMC scenarios.
Setup ECO (path too slow): upsize cells for better drive, restructure logic, swap to lower-Vt cells, add buffers to fix transition, or balance the clock. These changes alter delays of real logic and can move many paths.

Hold ECO (path too fast): insert delay/buffer cells or use higher-Vt cells to add delay on the short path.

Why hold fixes go last: setup fixes change cell sizes, buffering, and clock structure, which in turn change path delays and can create new hold violations. If you fixed hold first, subsequent setup work would invalidate it. Also hold is corner-specific (fast corner) and must be verified across all scenarios, so it is converged after the setup-driven structure has stabilized — ideally without harming setup, since added delay buffers eat setup slack.

Q127 A latch-based path borrows 200ps at latch L1. The clock period is 1ns, L1 is active-high with 50% duty, T_setup = 50ps. Is 200ps of borrow legal, and what happens to the path launched from L1?

Is the borrow legal? The maximum a 50%-duty active-high latch can borrow is the transparent (active) window minus setup:
borrow_max = T_active - T_setup = (0.5 × 1ns) - 50ps = 500ps - 50ps = 450ps.
Since 200ps ≤ 450ps, the borrow is legal — L1 stays transparent long enough to pass the late-arriving data.

Effect on the downstream path (L1 → next latch L2): the 200ps borrowed by the incoming path is charged to L1's launch. The data effectively launches from L1 200ps later than the nominal opening edge. So the path from L1 to L2 must now complete in T_available - 200ps. Borrowing is a zero-sum redistribution: it relaxes the tight upstream stage by tightening the next stage.

Interview punchline: borrowing is fine and even useful for absorbing logic-depth imbalance, but you must confirm the downstream stage still meets timing after the borrow is propagated. If L1→L2 can't absorb the 200ps, you get a setup violation at L2 even though L1 itself reported borrow within limits.
Borrow legal (200ps < 450ps) but the L1-to-L2 stage loses 200ps of budget.
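
The same numbers can be checked with a few lines of Python. The nominal L1-to-L2 budget of 500 ps is a hypothetical value, since the question does not state T_available; everything else comes from the question.

```python
def max_borrow_ps(period_ps, duty_percent, t_setup_ps):
    """borrow_max = T_active - T_setup, with T_active derived from the duty cycle."""
    active_ps = period_ps * duty_percent // 100
    return active_ps - t_setup_ps

borrow_ps = 200
limit_ps = max_borrow_ps(period_ps=1000, duty_percent=50, t_setup_ps=50)
borrow_is_legal = borrow_ps <= limit_ps

# Borrowed time is charged to the next stage (nominal budget is an assumed figure).
nominal_l1_to_l2_budget_ps = 500
remaining_l1_to_l2_budget_ps = nominal_l1_to_l2_budget_ps - borrow_ps
print(limit_ps, borrow_is_legal, remaining_l1_to_l2_budget_ps)
```

If the L1-to-L2 logic needs more than the remaining budget, the violation appears at L2 even though the borrow at L1 was within its limit.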

Q128 In MMMC, you fixed a setup violation in the functional mode at the SS corner and the path is now clean there, but tapeout is blocked. Name the most common reasons a per-scenario fix doesn't translate to a clean signoff.

A clean fix in one scenario is necessary but not sufficient. Common reasons signoff is still blocked:
  • Hold regressed at the fast corner. The setup fix (upsizing, lower-Vt, buffering, clock rebalancing) sped up or reshaped paths and created FF-corner hold violations — these must be checked across all scenarios, not just the one you fixed.
  • Another mode broke. The same physical path is timed differently in scan_shift/capture or a low-power mode; an exception or different clock there now fails.
  • Other corners. Temperature inversion or a different RC corner (Cmin vs Cmax) makes a different corner the true worst case for that path.
  • DRV / SI not clean. The ECO improved arrival time but left a max_transition/max_cap or a crosstalk-induced delta-delay/noise violation.
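  • GBA vs PBA mismatch: the path was judged clean under one analysis mode but signoff reports the other. A fix accepted on PBA slack can still be flagged in a GBA report, and a GBA-driven fix may not address the actual worst path seen under PBA.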
  • min_pulse_width / clock-gating checks or recovery/removal on resets still open after the clock structure changed.
The discipline: in MMMC you must converge all enabled (mode × corner × check) scenarios concurrently, and re-run the full signoff (delays, SI, DRV, async checks) after every ECO — never sign off on a single scenario.
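
The "all scenarios or nothing" rule is easy to express in Python. The scenario keys and slack values below are made-up examples of a post-ECO summary, not output from any tool.

```python
def failing_scenarios(worst_slack_ps):
    """Return the (mode, corner, check) keys whose worst slack is negative."""
    return sorted(key for key, slack in worst_slack_ps.items() if slack < 0)

def signoff_clean(worst_slack_ps):
    """Signoff is clean only if no enabled scenario has negative slack."""
    return not failing_scenarios(worst_slack_ps)

# Hypothetical worst slack per scenario after a setup ECO in func/ss.
results = {
    ("func", "ss", "setup"): 12,        # the scenario that was fixed
    ("func", "ff", "hold"): -8,         # hold regressed at the fast corner
    ("scan_shift", "ff", "hold"): 3,
}
print(signoff_clean(results), failing_scenarios(results))
```

A single negative entry blocks signoff regardless of how clean the originally targeted scenario looks.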
