PCFG Attack: How It Works in Detail

Time2Crack project reference document
Audience: developers, security researchers, advanced users

Contents

  • Overview
  • Historical and academic background
  • Foundations: structural grammar of passwords
  • Learning the PCFG model
  • Ordered generation of candidates
  • Why PCFG breaks secrets with high apparent entropy
  • Implementation in Time2Crack: addPCFGAttacks()
  • Function pcfgKeyspace() and assumptions
  • Numerical handling: soft cap (replaces the hard cap)
  • High fidelity calibration
  • Benchmarks and orders of magnitude
  • Practical examples
  • Limits of the PCFG Attack
  • Effective defences
  • Bibliographic references

    1. Overview

    PCFG (Probabilistic Context-Free Grammar) models a password's structure as a sequence of typed segments (letters, digits, symbols) and generates the most likely candidates first.

    2. Historical and academic background

    The work of Weir et al. (2009) established PCFG as a major probabilistic cracking method, often superior to purely dictionary-based approaches for an equal guess budget on corpora of human-chosen passwords.

    3. Foundations: structural grammar of passwords

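    Example: Password123 has the structure L8D3, that is, a run of eight letters followed by a run of three digits.

    The model learns that some patterns (L8D2, L6D4) are common and that others are rare.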
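
    The segmentation behind patterns such as L8D2 can be sketched as follows. This is a minimal illustration, not Time2Crack code: the function names charClass, segmentPassword and skeletonOf are hypothetical, and anything that is neither a letter nor a digit is assumed to count as a symbol.

```javascript
// Classify one character as L (letter), D (digit) or S (symbol/other).
// Assumption: everything that is not a letter or an ASCII digit is a symbol.
function charClass(ch) {
  if (/\p{L}/u.test(ch)) return "L";
  if (/[0-9]/.test(ch)) return "D";
  return "S";
}

// Split a password into maximal runs of the same character class.
function segmentPassword(pw) {
  const segments = [];
  for (const ch of pw) {
    const cls = charClass(ch);
    const last = segments[segments.length - 1];
    if (last && last.cls === cls) {
      last.text += ch;
    } else {
      segments.push({ cls, text: ch });
    }
  }
  return segments;
}

// Build the skeleton string: class letter followed by run length.
function skeletonOf(pw) {
  return segmentPassword(pw)
    .map((s) => s.cls + [...s.text].length)
    .join("");
}

console.log(skeletonOf("Password123"));  // L8D3
console.log(skeletonOf("xQ7$vP2!mL9@")); // L2D1S1L2D1S1L2D1S1
```

    A short skeleton with long runs is typical of human-chosen passwords, while a long skeleton made of very short runs is typical of random strings.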

    4. Learning the PCFG model

  • Segment passwords into classes.
  • Estimate the probability of skeletons.
  • Estimate the probability of tokens in each slot.
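
    The three steps above can be sketched as a frequency count over a training corpus. This is a simplified illustration under stated assumptions: trainPcfg, segments and normalize are hypothetical names, the four-password corpus is invented for the example, and refinements such as separate handling of capitalization or smoothing are left out.

```javascript
// Step 1: split a password into runs of letters (L), digits (D) or symbols (S).
function segments(pw) {
  const runs = pw.match(/[A-Za-z]+|[0-9]+|[^A-Za-z0-9]+/g) || [];
  return runs.map((text) => {
    const cls = /[A-Za-z]/.test(text[0]) ? "L" : /[0-9]/.test(text[0]) ? "D" : "S";
    return { slot: cls + text.length, text };
  });
}

// Turn a Map of counts into a Map of relative frequencies.
function normalize(counts) {
  let total = 0;
  for (const n of counts.values()) total += n;
  const probs = new Map();
  for (const [key, n] of counts) probs.set(key, n / total);
  return probs;
}

// Steps 2 and 3: estimate skeleton probabilities and per-slot token probabilities.
function trainPcfg(corpus) {
  const skeletonCounts = new Map();
  const tokenCounts = new Map(); // slot -> Map(token -> count)
  for (const pw of corpus) {
    const segs = segments(pw);
    const skeleton = segs.map((s) => s.slot).join("");
    skeletonCounts.set(skeleton, (skeletonCounts.get(skeleton) || 0) + 1);
    for (const { slot, text } of segs) {
      if (!tokenCounts.has(slot)) tokenCounts.set(slot, new Map());
      const perSlot = tokenCounts.get(slot);
      perSlot.set(text, (perSlot.get(text) || 0) + 1);
    }
  }
  const tokens = new Map();
  for (const [slot, counts] of tokenCounts) tokens.set(slot, normalize(counts));
  return { skeletons: normalize(skeletonCounts), tokens };
}

// Invented toy corpus, for illustration only.
const model = trainPcfg(["password12", "password99", "monkey12", "dragon2024"]);
console.log(model.skeletons.get("L8D2"));      // 0.5 (2 of 4 passwords)
console.log(model.tokens.get("D2").get("12")); // 2 of the 3 two-digit tokens
```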

    5. Ordered generation of candidates

    PCFG produces the highest-probability candidates first. This prioritization is the main source of its effectiveness.
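
    The idea of probability-ordered output can be illustrated on a toy grammar. The sketch below is an assumption-laden simplification: the grammar and its probabilities are invented, and it enumerates every candidate and then sorts, which only works for tiny grammars. Practical PCFG crackers are generally described as using a priority queue so that candidates are produced lazily; that technique is not shown here.

```javascript
// Hypothetical toy grammar: each skeleton is a list of slots with a probability,
// and each slot has a probability distribution over tokens.
const grammar = {
  skeletons: [
    { slots: ["L8", "D2"], p: 0.6 },
    { slots: ["L6", "D4"], p: 0.4 },
  ],
  tokens: {
    L8: { password: 0.7, sunshine: 0.3 },
    D2: { "12": 0.8, "99": 0.2 },
    L6: { monkey: 0.5, dragon: 0.5 },
    D4: { "2024": 0.9, "1234": 0.1 },
  },
};

// Expand one skeleton into every (candidate, probability) pair.
// Candidate probability = P(skeleton) * product of P(token | slot).
function expand(skeleton, tokens) {
  let partial = [{ text: "", p: skeleton.p }];
  for (const slot of skeleton.slots) {
    const next = [];
    for (const prefix of partial) {
      for (const [token, p] of Object.entries(tokens[slot])) {
        next.push({ text: prefix.text + token, p: prefix.p * p });
      }
    }
    partial = next;
  }
  return partial;
}

// All candidates of the grammar, most probable first.
function orderedCandidates(g) {
  return g.skeletons
    .flatMap((s) => expand(s, g.tokens))
    .sort((a, b) => b.p - a.p);
}

const candidates = orderedCandidates(grammar);
console.log(candidates[0].text); // password12 (0.6 * 0.7 * 0.8 = 0.336)
```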

    6. Why PCFG breaks secrets with high apparent entropy

    A string may look strong in terms of raw entropy while being structurally very predictable (word + digits + symbol). PCFG exploits this predictability.
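
    The gap between apparent and structural strength can be made concrete with a small calculation. This is only a sketch: naiveEntropyBits and structuralEntropyBits are hypothetical helpers, the 32-symbol alphabet and the 100,000-word list size are assumptions, and the example password is invented.

```javascript
// Naive estimate: every character is assumed to be drawn uniformly
// from the union of the character classes present in the password.
function naiveEntropyBits(pw) {
  let charset = 0;
  if (/[a-z]/.test(pw)) charset += 26;
  if (/[A-Z]/.test(pw)) charset += 26;
  if (/[0-9]/.test(pw)) charset += 10;
  if (/[^A-Za-z0-9]/.test(pw)) charset += 32; // assumed symbol alphabet size
  if (charset === 0) return 0;
  return pw.length * Math.log2(charset);
}

// Structural estimate: one choice per segment instead of one per character.
// wordListSize is an assumed dictionary size, not a measured value.
function structuralEntropyBits({ wordListSize, digitLen, symbolCount }) {
  const guesses = wordListSize * 10 ** digitLen * 32 ** symbolCount;
  return Math.log2(guesses);
}

// "Sunshine2024!" = word + 4 digits + 1 symbol.
const naive = naiveEntropyBits("Sunshine2024!");
const structural = structuralEntropyBits({ wordListSize: 1e5, digitLen: 4, symbolCount: 1 });
console.log(naive.toFixed(1), structural.toFixed(1)); // roughly 85 bits versus roughly 35 bits
```

    Under these assumptions the structural view removes about 50 bits from the apparent strength, which is the kind of gap a PCFG attack exploits.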

    7. Implementation in Time2Crack: addPCFGAttacks()

    Time2Crack computes a guess budget via pcfgKeyspace(pw), then converts it to a time with budgetTime(...).

    Category: cat: "pcfg", note: nPCFGDetected.

    8. Function pcfgKeyspace() and assumptions

    The internal model approximates the lexical, numerical and symbolic dimensions of the password, then caps the keyspace so that estimates stay realistic in interactive use.

    Main components:

  • letter component (wordGuesses),
  • digit component (10^digitLen),
  • symbol component (32^symbolCount),
  • structural variation factor.

    The PCFG budget is then converted to time via budgetTime(pcfgGuesses, rate).
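
    How these components might combine can be sketched as a simple product. The code below is not the app.js implementation: pcfgKeyspaceSketch and budgetTimeSketch are hypothetical stand-ins, and the word-list size and structural factor are assumed constants chosen only to make the arithmetic visible.

```javascript
// Assumed constants for illustration only; not the values used in app.js.
const ASSUMED_WORD_LIST_SIZE = 1e5;  // guesses needed to cover common words
const ASSUMED_STRUCTURE_FACTOR = 10; // alternative skeletons tried around the target

// Hypothetical stand-in for pcfgKeyspace(): product of the four components.
function pcfgKeyspaceSketch(pw) {
  const letterLen = (pw.match(/[A-Za-z]/g) || []).length;
  const digitLen = (pw.match(/[0-9]/g) || []).length;
  const symbolCount = pw.length - letterLen - digitLen;

  // Letter component: a dictionary rank, never more than brute force over letters.
  const wordGuesses =
    letterLen === 0 ? 1 : Math.min(ASSUMED_WORD_LIST_SIZE, 26 ** letterLen);
  const digitGuesses = 10 ** digitLen;     // digit component
  const symbolGuesses = 32 ** symbolCount; // symbol component

  return wordGuesses * digitGuesses * symbolGuesses * ASSUMED_STRUCTURE_FACTOR;
}

// Hypothetical stand-in for budgetTime(): seconds at `rate` guesses per second.
function budgetTimeSketch(guesses, rate) {
  return guesses / rate;
}

const guesses = pcfgKeyspaceSketch("Password123");
console.log(guesses);                        // 1e5 * 1e3 * 1 * 10 = 1000000000
console.log(budgetTimeSketch(guesses, 1e9)); // 1 second at an assumed 1e9 guesses/s
```

    A deliberate limitation of this sketch is that it treats every letter run as a dictionary word, so it underestimates the cost of attacking random letter strings.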

    9. Numerical handling: soft cap (replaces the hard cap)

    Time2Crack now uses a soft cap to avoid the artificial plateaus caused by a single hard ceiling.

    9.1 Former approach (hard cap)

    A single Math.min(..., cap) collapses every value above the cap into the same constant. This erases the ordering between "difficult" and "very difficult" cases.

    9.2 New approach (soft cap)

    The model applies continuous compression:

  • linear below a knee point (knee),
  • progressive compression above it,
  • an asymptote at a numerical maximum (max).

    Formula used:

    soft = knee + (max - knee) * (1 - exp(-(raw - knee)/(max - knee))) for raw > knee.

    Otherwise soft = raw.
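
    The formula translates directly into code. The sketch below is an illustration rather than the app.js source: hardCap and softCap are hypothetical names, and the knee and max values are passed as arguments, here set to 1e14 and 1e18.

```javascript
// Hard cap: everything above `cap` collapses to the same value.
function hardCap(raw, cap) {
  return Math.min(raw, cap);
}

// Soft cap: identity up to `knee`, then exponential approach to `max`.
function softCap(raw, knee, max) {
  if (raw <= knee) return raw;
  const span = max - knee;
  return knee + span * (1 - Math.exp(-(raw - knee) / span));
}

const knee = 1e14;
const max = 1e18;
for (const raw of [1e12, 1e16, 1e17, 1e18, 1e20]) {
  console.log(
    raw.toExponential(0),
    hardCap(raw, max).toExponential(2),
    softCap(raw, knee, max).toExponential(2)
  );
}
```

    At raw = knee the exponential term has slope 1, so the compressed branch joins the linear branch without a kink. Values between the knee and a few multiples of max stay distinguishable, whereas values far beyond max still saturate near max.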

    9.3 Current parameters (app.js)

  • PCFGSOFTCAPKNEE = 1e14
  • PCFGMAXGUESSES = 1e18

    Effect: numerical stability is preserved without an abrupt break or a total loss of differentiation.

    10. High fidelity calibration

    With PCFG v2, structural signals are already integrated into the rank estimate (pcfgKeyspace). The HF adjustment is therefore neutral, to avoid double counting.

    11. Benchmarks and orders of magnitude

    On fast hashes, common structures fall very quickly.

    On slow KDFs, PCFG ordering remains advantageous, but the cost per attempt remains the decisive factor.
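
    The effect of the cost per attempt can be shown with a quick calculation. The rates and the guess budget below are assumptions picked for illustration, not benchmark results, and secondsFor and humanize are hypothetical helpers.

```javascript
// Assumed attack rates in guesses per second; real values depend on
// hardware and hash settings.
const ASSUMED_RATES = {
  fastHash: 1e10, // assumption: a fast unsalted hash on GPU hardware
  slowKdf: 1e4,   // assumption: a deliberately slow KDF
};

// Seconds needed to exhaust a guess budget at a given rate.
function secondsFor(guesses, rate) {
  return guesses / rate;
}

// Render a duration with a coarse human-readable unit.
function humanize(seconds) {
  const units = [
    ["years", 31557600],
    ["days", 86400],
    ["hours", 3600],
    ["minutes", 60],
  ];
  for (const [name, size] of units) {
    if (seconds >= size) return (seconds / size).toFixed(1) + " " + name;
  }
  return seconds.toFixed(1) + " seconds";
}

const pcfgGuesses = 1e9; // assumed PCFG budget for a common structure
for (const [name, rate] of Object.entries(ASSUMED_RATES)) {
  console.log(name, humanize(secondsFor(pcfgGuesses, rate)));
}
// fastHash 0.1 seconds
// slowKdf 1.2 days
```

    The same PCFG budget goes from a fraction of a second to more than a day once each attempt is a million times more expensive.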

    12. Practical examples

  • Password123: ideal PCFG target.
  • xQ7$vP2!mL9@: low PCFG compatibility.

    13. Limits of the PCFG Attack

  • dependence on the training corpus,
  • poor performance on truly random passwords,
  • complexity of multi-domain linguistic calibration,
  • v1 of the model remains an approximation of PCFG (not a grammar trained online).

    14. Effective defences

  • Avoid the usual structures (Word+Digits+Symbol).
  • Use randomly generated secrets.
  • Use a strong KDF together with MFA.

    15. Bibliographic references

  • Weir et al. (2009). IEEE S&P.
  • Ma et al. (2014). IEEE S&P.
  • Wheeler, D. (2016). USENIX Security.