Markov attack (probabilistic) — Complete operational reference

Time2Crack project reference document
Audience: developers, security researchers, advanced users

Contents

  • Overview
  • Historical and academic background
  • Mathematical foundations: Markov chains
  • Architecture of a Markov cracking tool
  • Phase 1 — Training on a corpus of real passwords
  • Phase 2 — Ordered generation of candidates
  • Reducing the Search Space: The Key to Effectiveness
  • Notable variations and implementations
  • Behaviour against different types of passwords
  • Integration into Time2Crack: Modeling and Parameters
  • Comparison with other probabilistic attacks
  • Limits of the Markov attack
  • Effective defences
  • References

    1. Overview

    The Markov attack is a probabilistic, statistical password-cracking method that generates candidates based on the observed character-transition frequencies in real human passwords. Instead of testing all possible combinations in lexicographic order — as brute force does — it orders candidates from most probable to least probable, concentrating computational effort where human passwords are actually located.

    In simple terms: an attacker with enough GPU power and a well-trained Markov model will test "letmein", "p@ssword" and "sun2024" before testing "aaaaaa", "aaaaab" or "xqzfkw" — because humans create passwords that look like words, not white noise. Measured efficiency: the research of Dürmuth et al. (2015) on the RockYou corpus (14 million passwords) shows that an order-5 Markov model reduces the effective search space to 0.6% of the raw space for passwords with a typical human structure, a 99.4% reduction.

    2. Historical and academic background

    2.1 Origins

    The application of Markov chains to password cracking predates its formal academic treatment. Offensive-security practitioners have used similar models since the 2000s, but rigorous formalization dates mainly from the 2010s.

    Chronology of milestones:

  • 2005, Narayanan & Shmatikov: first formal application of n-grams to passwords
  • 2009, Weir et al. (IEEE S&P): PCFG, a direct competitor betting on structural probabilities
  • 2013, Dürmuth: first OMEN analyses across several corpora
  • 2014, Ma et al. (IEEE S&P): comparative study of probabilistic models on 70M passwords
  • 2015, Dürmuth, Angelstorf, Horsch, Nürnberger, Rüegge: OMEN (ESSoS 2015), the major academic reference
  • 2016, Wheeler (zxcvbn): confirms the effectiveness of n-grams in estimating actual entropy
  • 2018, OMEN+: improved version with smoothing and interpolation
  • 2019–2022: hybrid Markov + neural-network work (PassGAN, FuzzyPSM)

    2.2 Reference corpora

    Markov model training relies on password databases from known data leaks:

  • LinkedIn (2012): 6.5 million unsalted SHA-1 hashes (of which 4.4M cracked)
  • Adobe (2013): 153 million accounts (3DES-encrypted passwords, recovered by frequency analysis)
  • Collection #1–5 (2019): 2.7 billion username/password pairs
  • Have I Been Pwned (2024): ~14 billion aggregated unique credentials

    These corpora reveal the actual statistical distribution of human passwords, which differs radically from a uniform distribution.


    3. Mathematical Foundations: Markov Chains

    3.1 Formal definition

    A Markov chain of order k over an alphabet Σ models the probability of each character as depending only on the k previous characters:

    P(cₙ | c₁c₂...cₙ₋₁) = P(cₙ | cₙ₋ₖ...cₙ₋₁)

    For passwords, Σ usually comprises the 95 printable ASCII characters (26 lowercase + 26 uppercase + 10 digits + 33 symbols).

    Order 1 (bigram):
    P("password") = P("p") × P("a"|"p") × P("s"|"a") × P("s"|"s") × ...
    Order 5 (optimal according to Dürmuth 2015):
    P("password") = P("p") × P("a"|"p") × P("s"|"pa") × P("s"|"pas")
                    × P("w"|"pass") × P("o"|"passw") × P("r"|"asswo")
                    × P("d"|"sswor")

    Order 5 captures contexts like "after "passw", the most likely next character is "o"" — information inaccessible to an order-1 model.

    3.2 The transition matrix

    For order 1 over 95 characters, the transition matrix has 95 × 95 = 9,025 entries. For order 5, it grows to 95⁵ × 95 = 95⁶ ≈ 7.4 × 10¹¹ theoretical entries — but the majority of order-5 n-grams never appear in real data. Implementations therefore use sparse data structures (tries, hash maps) to store only the observed n-grams.
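
    These orders of magnitude are easy to verify; below is a minimal sketch in Python (the nested-dict layout and the counts are illustrative, not a specific tool's format):

    # Dense vs sparse sizing, using the figures from this section
    dense_order1 = 95 ** 2         # 9,025 entries: a dense matrix is fine
    dense_order5 = 95 ** 5 * 95    # ~7.4e11 theoretical entries: dense storage is infeasible
    print(f"{dense_order1:,}  {dense_order5:.1e}")   # 9,025  7.4e+11

    # In practice only observed n-grams are stored, e.g. in a nested dict (trie-like):
    sparse_counts = {"passw": {"o": 12843, "0": 512}}   # context -> next-char counts (invented)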

    3.3 Log probability and sorting

    To avoid floating-point underflow on passwords of 8+ characters (probabilities on the order of 10⁻²⁰), tools work with log probabilities:

    log P("password") = log P("p") + log P("a"|"p") + log P("s"|"a") + ...

    This value is the Markov score. Candidates are generated in descending order of score, i.e. from most likely to least likely, as in the sketch below.
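
    A minimal scoring sketch in Python, assuming an order-1 model stored as nested dicts (the probabilities below are invented for the example):

    import math

    # Illustrative order-1 (bigram) model: context character -> next-char probabilities
    model = {
        "^": {"p": 0.05, "a": 0.04},
        "p": {"a": 0.30, "@": 0.02},
        "a": {"s": 0.25},
        "s": {"s": 0.20, "w": 0.10},
    }

    def markov_log_score(password, model, floor=1e-9):
        """Sum of log-probabilities; higher (less negative) = more likely."""
        score = 0.0
        context = "^"  # start-of-password marker
        for ch in password:
            p = model.get(context, {}).get(ch, floor)  # floor for unseen transitions
            score += math.log(p)
            context = ch
        return score

    print(markov_log_score("pass", model))   # ~-7.2: plausible human prefix
    print(markov_log_score("xqzf", model))   # ~-82.9: every transition unseen

    An improbable string accumulates large negative terms and sinks to the bottom of the generation order.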

    3.4 Why Order 5?

    Empirical research (Dürmuth 2015, confirmed by Ma et al. 2014) shows that:

  • Order 1: captures individual character frequencies (e, a, o common in lowercase), reduction ~50%
  • Order 2: captures common bigrams (pa, ss, wo, rd), reduction ~75%
  • Order 3: captures trigrams (pas, ass, ssw), reduction ~90%
  • Order 4: diminishing returns begin, reduction ~95%
  • Order 5: empirical optimum on RockYou, reduction ~99.4%
  • Order 6+: overfitting, the model memorizes training passwords without generalizing

    4. Architecture of a Markov cracking tool

    4.1 Main components

    ┌─────────────────────────────────────────────────────────────┐
    │                     MARKOV CRACKING TOOL                    │
    ├─────────────────┬───────────────────┬───────────────────────┤
    │  Phase          │  Component        │  Role                 │
    ├─────────────────┼───────────────────┼───────────────────────┤
    │  Training       │ Corpus Parser     │ Reads the passwords   │
    │                 │ N-gram Counter    │ Counts transitions    │
    │                 │ Probability Table │ Normalizes to probas  │
    │                 │ Model Serializer  │ Saves the model       │
    ├─────────────────┼───────────────────┼───────────────────────┤
    │  Generation     │ Candidate Gen     │ Enumerates by score   │
    │                 │ Length Controller │ Filters by length     │
    │                 │ Threshold Filter  │ Prunes the improbable │
    ├─────────────────┼───────────────────┼───────────────────────┤
    │  Attack         │ Hash Engine       │ Compares to targets   │
    │                 │ GPU Interface     │ Hashcat backend       │
    └─────────────────┴───────────────────┴───────────────────────┘

    4.2 Integration with Hashcat

    The most widespread practical implementation goes through Hashcat's stdin mode (-a 0 fed by an external generator) or through Python tools such as statsgen (from the PACK toolkit). Note that maskgen produces masks, which belong to a mask attack (-a 3) rather than a wordlist pipe:

    # Practical example with PACK (exact flags may vary between PACK versions)
    python statsgen.py rockyou.txt --output markovmodel.stats
    python maskgen.py markovmodel.stats --optindex --output masks.hcmask
    hashcat -a 3 -m 1000 targethashes.txt masks.hcmask

    Dedicated tools like OMEN (Ordered Markov ENumerator) directly generate sorted candidate lists.


    5. Phase 1 — Training on a corpus of real passwords

    5.1 Extraction of n-grams

    For each password in the corpus, all n-grams are extracted (with a start marker ^ and an end marker $):

    Example with "hello" (order 2):

    ^h, he, el, ll, lo, o$

    Example with "P@ss1!" (order 2):

    ^P, P@, @s, ss, s1, 1!, !$

    5.2 Counting and standardization

    For each context (n-1 previous characters), the occurrences of each following character are counted and normalized:

    Contexte "pa" → {s: 847, r: 12, t: 8, l: 3, ...}
    → P(s|pa) = 847/870 = 0.973
    → P(r|pa) = 12/870 = 0.014

    These distributions reveal human language habits: after "pa", people overwhelmingly write "s" (password, pass, pablo...). A minimal training sketch follows below.
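
    A minimal training sketch in Python (order-2 contexts; the corpus, the data layout and the padding with ORDER start markers are illustrative choices):

    from collections import defaultdict, Counter

    # Extract n-grams with ^/$ markers, count transitions per context,
    # then normalize the counts into probabilities.
    ORDER = 2

    def train(corpus):
        counts = defaultdict(Counter)            # sparse: only observed contexts
        for pw in corpus:
            # pad with ORDER start markers so every position has a full context
            chars = ["^"] * ORDER + list(pw) + ["$"]
            for i in range(ORDER, len(chars)):
                context = "".join(chars[i - ORDER:i])
                counts[context][chars[i]] += 1
        model = {}
        for context, nexts in counts.items():    # normalize each context
            total = sum(nexts.values())
            model[context] = {c: n / total for c, n in nexts.items()}
        return model

    model = train(["password", "passion", "pass1234"])
    print(model["pa"])   # {'s': 1.0}: after "pa", "s" is the only observed follow-up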

    5.3 Smoothing

    The problem of unobserved n-grams: if "xq" never appears in the corpus, then P(c|"xq") = 0 for every c — which blocks generation. The most common smoothing techniques:

  • Add-one (Laplace): add 1 to all counts
  • Good-Turing: redistribute the probability mass of n-grams seen once to those never seen
  • Kneser-Ney: state-of-the-art method, used in OMEN+

    Smoothing also lets the model generate candidates outside the training corpus — crucial for covering slightly novel passwords, as in the sketch below.
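
    A sketch of add-one smoothing over the 95-character printable-ASCII alphabet, reusing the counts from the example in 5.2:

    # Laplace (add-one) smoothing: unseen transitions get a small non-zero
    # probability instead of zero, so generation is never blocked.
    ALPHABET = [chr(i) for i in range(32, 127)]   # the 95 printable ASCII characters

    def smooth_add_one(raw_counts):
        """raw_counts: {context: {next_char: count}} -> smoothed probabilities."""
        smoothed = {}
        for context, nexts in raw_counts.items():
            total = sum(nexts.values()) + len(ALPHABET)   # +1 per alphabet symbol
            smoothed[context] = {c: (nexts.get(c, 0) + 1) / total for c in ALPHABET}
        return smoothed

    counts = {"pa": {"s": 847, "r": 12, "t": 8, "l": 3}}
    probs = smooth_add_one(counts)
    print(round(probs["pa"]["s"], 4))   # 0.8788 instead of the unsmoothed 0.973
    print(round(probs["pa"]["x"], 6))   # 0.001036: small but non-zero for unseen "x"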

    5.4 What the model learns

    After training on RockYou, an order-5 Markov model internalizes patterns such as:

  • Numeric suffixes: digits appear massively at the end of passwords (→ "123", "2024", "1!")
  • Common beginnings: "pa", "my", "lo", "ch", "123", "abc" are frequent starts
  • Initial capitalization: an uppercase letter in first position is far more likely than mid-string
  • Symbols rare in the middle: "@", "#", "!" appear mainly as letter substitutions or as suffixes
  • Human repetitions: "aa", "ll", "ee" are more frequent than doubled rare consonants

    6. Phase 2 — Ordered generation of candidates

    6.1 The problem of enumeration by score

    Generating candidates in descending order of Markov probability is a non-trivial problem. One cannot simply compute the probabilities of all possible candidates (94⁸ ≈ 6 × 10¹⁵ for 8 characters) and then sort them.

    The solution: best-first search with a priority queue. The algorithm is similar to Dijkstra's:

  • Initialize the queue with the most likely starts (^a, ^p, ^1...)
  • At each step, pop the most likely partial candidate
  • Extend it with each possible character and compute the extended candidate's score
  • If the target length is reached, emit the candidate
  • Otherwise, push it back onto the queue

    This approach yields an approximately decreasing order without materializing the entire space, as in the sketch below.
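
    A minimal sketch in Python, assuming a toy order-1 model in nested dicts (the probabilities are invented); it combines the best-first search above with the cut-off threshold of section 6.2:

    import heapq
    import itertools
    import math

    # Toy order-1 model: context character -> {next character: probability}.
    # "^" is the start-of-password marker.
    model = {
        "^": {"p": 0.55, "a": 0.40, "x": 0.05},
        "p": {"a": 0.9, "s": 0.1},
        "a": {"s": 0.7, "p": 0.3},
        "s": {"s": 0.6, "a": 0.4},
        "x": {"x": 1.0},
    }

    def enumerate_candidates(model, length, threshold=-12.0):
        # Heap entries are (negated log-prob, candidate): heapq pops the best first.
        heap = [(0.0, "")]
        while heap:
            neg_logp, cand = heapq.heappop(heap)
            if len(cand) == length:
                yield cand, -neg_logp        # emitted in decreasing probability
                continue
            context = cand[-1] if cand else "^"
            for ch, p in model.get(context, {}).items():
                score = -neg_logp + math.log(p)
                if score >= threshold:       # cut-off threshold (section 6.2)
                    heapq.heappush(heap, (-score, cand + ch))

    for cand, logp in itertools.islice(enumerate_candidates(model, 4), 5):
        print(f"{cand}  log P = {logp:.2f}")   # "pass" comes out first

    Because extending a candidate can only lower its log-probability, any full-length candidate popped from the heap is guaranteed to outrank everything still queued, so the output order is exact for a fixed length.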

    6.2 Cut-off threshold

    To avoid generating billions of improbable candidates, a minimum score is set below which a partial candidate is abandoned (pruning):

    If log P(partial candidate) < THRESHOLD → abandon this branch

    The threshold is a critical parameter: too high and you miss borderline passwords; too low and you generate too many candidates, slowing down the attack.

    6.3 Length management

    Markov models are trained, and generally operated, per fixed length. A typical attack:

  • Length 6: generate N candidates in descending order of probability
  • Length 7: same
  • Length 8: same (the most common length in the corpora)
  • Lengths 9-12: decreasing coverage

    Length distribution in RockYou: 8 chars = 24%, 7 chars = 21%, 9 chars = 12%, 6 chars = 10%...


    7. Reducing the Search Space: The Key to Effectiveness

    7.1 Empirical quantification (Dürmuth 2015)

    For an 8-character password over the full alphabet (94 characters):

  • Brute force: search space 94⁸ ≈ 6 × 10¹⁵, factor 1× (reference)
  • Markov order 1: ~3 × 10¹⁵, ~2× reduction
  • Markov order 3: ~6 × 10¹³, ~100× reduction
  • Markov order 5: ~3.6 × 10¹⁰, ~166,000× reduction
  • Markov order 5 (typical human): ~4 × 10⁷, ~150 million× reduction

    The difference between "reduced space" and "typical human space" is explained by the fact that real human passwords concentrate in the top k% most likely candidates according to the model.

    7.2 Why Humans Are Predictable

    The predictability factors exploited by Markov:

  • Linguistic constraints: passwords derive from existing words. "letmein", "sunshine", "iloveyou" follow English phonotactic rules — their letter transitions are highly probable.
  • Cognitive constraints: humans do not memorize truly random sequences. A random password such as "xqzfw9@" is rarely chosen spontaneously.
  • Cultural patterns: these create high-density clusters in the password space.
  • Predictable substitutions: so common that they are part of the model itself.
  • Position of digits and symbols: digits at the end of a password (passwd123) are 10× more frequent than digits at the beginning; suffix symbols (passwd!) are 5× more frequent than symbols in central position.

    7.3 The case of keyboard patterns

    Keyboard sequences (qwerty, asdf, zxcv, 1234, azerty...) are particularly vulnerable.

  • Character transitions follow the constraints of keyboard topology
  • From each key, only 4-8 immediate neighbours are relevant
  • A Markov model trained on corpora containing these patterns achieves a ~99.7% reduction (vs 99.4% for general human words)

    In Time2Crack (current version), this distinction is no longer a fixed tier. The Markov v2 model incorporates kbPat as a signal in a rank estimator (markovExpectedGuesses), then interpolates continuously between "human-like" and "random-like" regimes.


    8. Notable Variations and Implementations

    8.1 OMEN (Ordered Markov ENumerator)

    Developed by: Dürmuth, Angelstorf, Horsch, Nürnberger, Rüegge (ESSoS 2015)
    Availability: open source (GitHub)
    Main innovation: enumeration ordered by score levels (discretization of probabilities into steps)

    OMEN discretizes probabilities into integer levels to speed up generation:

  • Each n-gram receives an integer score (0 = very likely, 10 = very unlikely)
  • The sum of a password's scores is its overall score
  • Passwords are generated in ascending order of overall score

    This approach elegantly solves the ordered-enumeration problem without an expensive priority queue; a sketch of the idea follows below.
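
    A sketch of the level idea (the constants and the toy model are illustrative, not OMEN's actual values):

    import math

    # Map each transition probability to a small integer level, then rank a
    # password by the sum of its levels: 0 = very likely, MAX_LEVEL = very unlikely.
    MAX_LEVEL = 10

    def level(p, scale=2.0):
        return min(MAX_LEVEL, max(0, round(-scale * math.log10(p))))

    def password_level(password, model):
        total, context = 0, "^"
        for ch in password:
            total += level(model.get(context, {}).get(ch, 1e-6))
            context = ch
        return total   # candidates are then generated in increasing total level

    model = {"^": {"p": 0.05}, "p": {"a": 0.3}, "a": {"s": 0.25}, "s": {"s": 0.2}}
    print(password_level("pass", model))   # 6: a low total level is generated early

    Enumerating all candidates whose levels sum to 0, then 1, then 2, and so on replaces heap bookkeeping with simple integer arithmetic.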

    8.2 OMEN+

    Extension of OMEN with:

  • Kneser-Ney smoothing for better generalization
  • Multi-order interpolation: combines predictions of orders 3, 4 and 5
  • Threshold adaptation based on the empirical corpus distribution

    Reported performance: +15 to +20% passwords cracked vs the original OMEN at equal budget.

    8.3 Statsgen / PACK (Password Analysis and Cracking Kit)

    Developed by: Peter Kacherginsky (iphelix)
    Approach: lighter than OMEN; generates Hashcat-compatible masks and statistics
    Typical use: analysing a set of cracked passwords to tune future attacks

    8.4 John the Ripper — Markov Mode

    John the Ripper integrates a native Markov mode (developed by Samuele Giovanni Tonon):

    john --markov --min-length=6 --max-length=8 hashes.txt

    The --markov-threshold parameter (0-400) controls the cut-off threshold; 150 is the recommended standard setting.

    JtR's Markov mode uses pre-computed statistics stored in .mkv files (Markov stats files) generated by calcstat.

    8.5 Markov Hybrid Models

  • Markov + rules: generate Markov candidates, then apply Hashcat transformation rules
  • Markov + dictionary: use Markov output as the base wordlist for hybrid attacks
  • Markov + PCFG: combine Markov's probabilistic ordering with PCFG's structural constraints

    These hybridizations are used in professional cracking configurations to maximize CPU/GPU budget coverage.

    8.6 Modern Language Models (LLM/LSTM) — successors to Markov

    LSTM and Transformer networks can be seen as generalizations of Markov chains:

  • Markov of order k = fixed context window of k tokens
  • LSTM = variable, unbounded context window (long-term memory)
  • Transformer = global attention over the entire sequence

    PassGAN (2017), FLA (2019), PassBERT (2022) outperform classic Markov models by 20-30% in crack rate on modern corpora.


    9. Behaviour against different types of passwords

    9.1 Passwords from natural words

    Examples: sunshine, letmein, dragon, iloveyou
    Vulnerability: extreme
    Mechanism: the model assigns very high probability to common English sequences; these words appear in the top 0.001% of generated candidates.
    Time: milliseconds on MD5/SHA-1

    9.2 Words with leetspeak substitutions

    Examples: p@ssw0rd, L3tm3In, $unsh1ne
    Vulnerability: high
    Mechanism: the model has learned that "@" often follows "p" (because "p@ss..." is very common in corpora), that "0" often follows "w" (w0rd...), etc. The most common substitutions are baked into the transition probabilities.
    Time: seconds to minutes

    9.3 Words with numeric suffixes

    Examples: password123, sunshine2024, dragon1!
    Vulnerability: high
    Mechanism: numeric suffixes are so common that the model assigns high probability to letter→digit transitions at the end of the sequence. "123", "1", "2024", "12" are the most likely suffixes.
    Time: seconds to a few minutes

    9.4 Given names and proper names

    Examples: Thomas92, AnneM@rie, NICOLAS
    Vulnerability: moderate to high
    Mechanism: common first names (in English) are well represented in the reference corpora. Names less common in English corpora (French, Polish, etc.) benefit less from the model.
    Time: minutes to hours depending on the name's popularity

    9.5 Passwords with apparently random but mnemonic structure

    Examples: Tr0ub4dor&3 (from the XKCD comic), Il0v3myC@t!
    Vulnerability: moderate
    Mechanism: these passwords have a human structure (base word + substitutions + suffix), but the specific base and transformations are not in the Markov top-k. The model eventually reaches them, but they fall in the 10-30% band of the ordered space.
    Time: hours to days depending on length and hash algorithm

    9.6 Keyboard Patterns

    Examples: qwerty, 1234567, azerty, qweasdzxc
    Vulnerability: extreme (worse than natural words)
    Mechanism: keyboard transitions are so over-represented in the corpora that the model ranks them among the first thousands of candidates — sometimes the first tens.
    Time: microseconds to milliseconds

    9.7 Truly Random Passwords

    Examples: xQz7@mK9, 4#pL$2nR, kRf9!Ws3
    Vulnerability: low to none
    Mechanism: the character transitions of these passwords have very low probabilities in a model trained on human passwords; they land in the last percentiles of the generated space.
    Behaviour: the attack degenerates — the real "cost" of cracking becomes the raw entropy.
    Time: comparable to or worse than brute force

    9.8 Passphrases

    Examples: horse-battery-staple-correct, monchieneestlouisXIV
    Vulnerability: variable
    Mechanism (more complex):

  • A Markov model can capture inter-word transitions in a short passphrase (2 words) if those combinations are common in the corpus
  • For long passphrases (4+ words), the Markov-ordered candidate space only reaches them after exhausting all typical short passwords
  • A combinator attack is generally more effective than Markov against passphrases

    10. Integration into Time2Crack: Modelling and Parameters

    10.1 General calculation logic (v2)

    Time2Crack now models the Markov attack with an expected rank (rank-based) rather than with fixed reduction factors.

    Calculation pipeline:

  • Unicode tokenization into L/D/S segments,
  • estimation of a human rank via skeleton + shape + token ranks,
  • modulation by structural signals (kbPat, seq, dt, rep, hybridVuln),
  • continuous interpolation towards a "random-like" regime according to charset and length,
  • conversion to time via budgetTime(estimated rank, hash speed).

    10.2 Unicode tokenization and per-language calibrated profile

    The Markov v2 model uses Unicode classes (\p{L}, \p{N}, otherwise symbol) and a calibrated per-language profile (loaded from data/markov-calibration.json, with an internal fallback).

    The profile contains:

  • skeletonRanks (L8D2, L8D2S1, ...),
  • shapeRanks (LD, LDS, ...),
  • tokenRanks by class/length,
  • language multipliers (shapeMultipliers),
  • signal multipliers (signalMultipliers).
    10.3 Continuous interpolation (no more hard thresholds)

    The old model used fixed tiers (0.003 / 0.006 / 0.25 / 0.9). The v2 model replaces this logic with a continuous interpolation between:

  • humanRank (the probable structural zone),
  • randomRank (a zone close to brute force, bounded).

    The random weight is a continuous function of length, charset and the strength of the human signals, which avoids artificial discontinuities; a toy illustration follows below.
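
    A purely hypothetical illustration of such an interpolation (none of the constants or names below come from app.js; they only show how a smooth weight removes tier discontinuities):

    import math

    def expected_rank(human_rank, random_rank, length, charset_size, human_signal):
        # Weight grows smoothly with length and charset, shrinks with human signals
        w = 1 / (1 + math.exp(-(0.4 * (length - 10) + 0.05 * charset_size - 3 * human_signal)))
        # Geometric interpolation between the two regimes (no hard thresholds)
        return math.exp((1 - w) * math.log(human_rank) + w * math.log(random_rank))

    print(f"{expected_rank(1e7, 1e15, 8, 26, 1.0):.2g}")   # ~4e7: dominated by the human rank
    print(f"{expected_rank(1e7, 1e15, 14, 94, 0.0):.2g}")  # ~1e15: dominated by the random rank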

    10.4 Confidence and applicability score

    Time2Crack's scoring model (function annotateScenarioApplicability()) assigns Markov:

    Base confidence: 0.75 (line 4800). Interpretation: the Markov attack is considered moderately reliable — more reliable than the neural estimate (0.7) or PRINCE (0.7), less than the dictionary (0.9) or rainbow tables (0.95).

    Applicability score (lines 4873-4874):
    case "markov":
      return context.looksHuman ? 0.76 : 0.34;
  • looksHuman = true → score 0.76: the Markov attack is highly applicable; the model will exploit human regularities
  • looksHuman = false → score 0.34: applicable but with no major advantage over brute force

    Evidence score (lines 4829-4830):
    case "markov":
      return Number(!!context.looksHuman) + Number(!!context.kbPat)
           + Number(!!context.seq)        + Number(!!context.rep);
    Each "human" signal in the password increments Markov's evidence score.

    10.5 HF adjustment

    With Markov v2, structural signals are already folded into the estimated rank. The HF adjustment is therefore neutral for Markov, to avoid double counting.

    10.6 Priority in the tiebreak

    When several attacks yield similar times (< 1 ms apart), Time2Crack picks the most relevant one according to a semantic priority:

    const tieBreakPriority = {
      // ...
      markov: 8,    // Statistical n-gram — broad coverage
      // ...
    };

    Markov is a useful complementary attack that exploits statistical regularities. It takes precedence over brute force for passwords with human structure, but yields to dictionary and hybrid attacks, which are more direct.

    10.7 Ultra-weak case

    For ultra-weak passwords (detected by isWeakPassword()), Markov uses a time based on a minimum rank (weakGuessTime) rather than a fixed constant:

    if (weak) {
      rows.push({ ..., sec: weakGuessTime(a.rate), note: t("nWeakPassword"), cat: "markov" });
    }

    Reasoning: an ultra-weak password (password, 123456, qwerty...) sits in the absolute top 100 of the Markov space — it will be tested first, so the precision of the computation is irrelevant.


    11. Comparison with other probabilistic attacks

    11.1 Markov vs PCFG

  • Unit of analysis: Markov = individual character transitions; PCFG = segment structures (L/D/S)
  • Information captured: Markov = local regularities (n-grams); PCFG = global regularities (grammar)
  • Strong on: Markov = natural words, continuous patterns; PCFG = word + digit composites, structured passwords
  • Limit: Markov does not capture global structure; PCFG does not capture local transitions
  • Complementarity: Markov is ideal for pure words; PCFG is ideal for Password123
  • Optimal example: Markov = "sunshine"; PCFG = "Sunshine#93"
  • Order in Time2Crack: Markov tiebreak priority 8, PCFG 6; PCFG wins when a structure is detected

    11.2 Markov vs. Rule-Based Attack

  • Approach: Markov = probabilistic ab initio generation; rule-based = transformation of an existing dictionary
  • Coverage: Markov = all probable passwords; rule-based = transformations of dictionary words
  • Strong on: Markov = rare or out-of-dictionary words; rule-based = classic transformations of known words
  • Limit: Markov is less effective than dictionary + rules for common mutations; rule-based depends entirely on dictionary quality

    11.3 Markov vs Neural Networks (PassGAN, LSTM)

  • Complexity: Markov = O(k) per prediction (table lookup); neural = O(n² × d) (Transformer)
  • Data required: Markov = 1-10M examples suffice; neural = 10M-1B for good results
  • Interpretability: Markov = total (probability tables); neural = black box
  • Generalization: Markov = limited to the local context k; neural = potentially global
  • Crack rate: Markov = the reference; neural = +20-30% on modern passwords
  • Generation speed: Markov = very fast; neural = slower

    11.4 Global positioning

    In a professional cracking workflow, the typical order is:

  • Dictionary (credential lists) — known exact passwords
  • Credential stuffing — reused username/password pairs
  • Password spraying — top-20 most common
  • Hybrid — dictionary + Hashcat rules
  • PCFG — if an L+D structure is detected
  • Mask — if a positional pattern is detected
  • Markov — general statistical coverage
  • Neural — if budget is available
  • Brute force — last resort

    Markov is often the last sweep before brute force: it efficiently covers human passwords that escaped the more targeted attacks.


    12. Limits of the Markov Attack

    12.1 Dependence on the training corpus

    The model is only as good as its corpus. A corpus dominated by English passwords (RockYou, LinkedIn) poorly captures the regularities of:

  • passwords in morphologically rich languages (Polish, Czech, Finnish)
  • passwords rooted in under-represented cultures (transliterated Mandarin, romanized Arabic)
  • passwords from specialized domains absent from leaks (medical slang, rare technical jargon)

    Partial solution: train language- and domain-specific models (an approach that requires diverse corpora).

    12.2 No long-range dependencies

    An order-5 Markov model never sees more than 5 characters back. For a 12-character password, it cannot correlate the 12th character with the 1st. A passphrase like "stargate2004!" has a global structure ("word + year + !") that Markov perceives locally but not globally — PCFG does better here.

    12.3 Insensitivity to compositional structures

    "HorseStapleBattery" (initial majusculum + 3 concatenated words) is difficult for Markov because inter-word transitions (e→S, e→B) are unexpected. The combinator attack or PCFG is better suited.

    12.4 Ineffectiveness against random passwords

    A password generated by /dev/urandom with 12+ alphanumeric characters has no structure Markov can exploit. The Markov attack time converges to that of brute force.

    12.5 Sensitivity to updated password policy

    If an organization has imposed strict rules (length ≥ 16, mandatory 1 capital + 1 digit + 1 symbol), the resulting passwords may have a different distribution of the training corpus — reducing the effectiveness of the model.


    13. Effective defences

    13.1 What completely neutralizes Markov

    Random password generators: any password manager (Bitwarden, 1Password, KeePass) with a cryptographically secure generator produces sequences with no structure Markov can exploit. A password of 16 random characters is out of reach regardless of the attack. Sufficient length with randomness: even with a reduced charset (a-z only), 16 random characters represent 26¹⁶ ≈ 4.4 × 10²² combinations. A generation sketch follows below.
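
    A generation sketch using Python's standard secrets module:

    import secrets
    import string

    # A cryptographically secure generator produces passwords with no
    # character-transition structure for a Markov model to exploit.
    def random_password(length=16, alphabet=string.ascii_letters + string.digits):
        return "".join(secrets.choice(alphabet) for _ in range(length))

    print(random_password())   # e.g. 'kR9fWs3xQz7mKp2n' (output varies): no learnable pattern
    # 62 symbols over 16 positions: 62**16 ~ 4.8e28 equiprobable candidates;
    # Markov ordering offers no shortcut over brute force here.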

    13.2 What greatly reduces Markov's effectiveness

    Long passphrases of rare words: 4+ uncommon, separated words fall outside the top Markov candidates, especially if the words come from different languages or specialized vocabularies. Avoiding common leetspeak substitutions: substitutions such as a→@, e→3, etc. are built into the model; only non-standard transformations retain some resistance. Avoiding keyboard patterns and natural words: those are Markov's points of maximum vulnerability.

    13.3 What does not defend against Markov

    Complexity requirements (uppercase + digit + symbol): when these elements are bolted onto a natural word (P@ssw0rd!), the model captures the patterns — a false sense of security. Basic leetspeak substitutions: fully captured by the model (transitions "@"→"s", "3"→"y"...). Numeric suffixes/prefixes: "word + 123" and "word + 2024" patterns are among the first tested.

    13.4 Final recommendation

    The most effective defence against Markov (and against any probabilistic attack) is to eliminate all predictable human structure, which amounts to delegating password creation to a secure random generator.


    14. References

    Academics (peer-reviewed)

    Dürmuth, M., Angelstorf, F., Horsch, J., Nürnberger, S., & Rüegge, A. (2015). OMEN: Faster Password Guessing Using an Ordered Markov Enumerator. Engineering Secure Software and Systems (ESSoS 2015), LNCS 8978, pp. 119–132. → Main reference for the Markov probabilistic calibration used in Time2Crack (RockYou 14M, order 5)

    Ma, J., Yang, W., Luo, M., & Li, N. (2014). A Study of Probabilistic Password Models. IEEE Symposium on Security and Privacy (S&P 2014). → Empirical comparison of Markov, PCFG and n-gram models on 70M passwords

    Weir, M., Aggarwal, S., de Medeiros, B., & Glodek, B. (2009). Password Cracking Using Probabilistic Context-Free Grammars. IEEE Symposium on Security and Privacy (S&P 2009). → Founding work on structural probabilistic methods (Markov's competitor)

    Wheeler, D.L. (2016). zxcvbn: Low-Budget Password Strength Estimation. 25th USENIX Security Symposium. → Confirms the effectiveness of n-grams for realistic password-strength estimation

    Pasquini, D., Cianfriglia, M., Ateniese, G., & Bernaschi, M. (2021). Reducing Bias in Modeling Real-World Password Strength via Deep Learning and Dynamic Dictionaries. 30th USENIX Security Symposium. → Quantitative comparison of Markov vs deep learning on modern corpora

    Hitaj, B., Gasti, P., Ateniese, G., & Perez-Cruz, F. (2019). PassGAN: A Deep Learning Approach for Password Guessing. ACNS 2019. → Neural successor to Markov, comparison point for Time2Crack's descNeural

    Open source implementations

    OMEN: https://github.com/RUB-SysSec/OMEN
    John the Ripper (Markov mode): https://www.openwall.com/john/doc/MARKOV.shtml
    PACK (statsgen/maskgen): https://github.com/iphelix/pack
    Hashcat: https://hashcat.net/hashcat/

    Sources of corpus

    RockYou (2009): 14.3M passwords from the RockYou breach
    Have I Been Pwned: https://haveibeenpwned.com — ~14 billion aggregated credentials (Troy Hunt)
    SecLists: https://github.com/danielmiessler/SecLists — reference dictionaries and wordlists

    Web sources cited in the Time2Crack application

    Springer (chapter on Markov/probabilistic cracking): https://link.springer.com/chapter/10.1007/978-3-319-24174-67 → Source linked in descMarkov (app.js) for the effective reduction of the search space via statistical prioritization.

    Document generated for the Time2Crack project — last update: 2026-04-01. Based on the implementation in app.js (Markov v2, rank-based) and the academic publications cited above.