Extending

Beyond widgets, transforms, and renderer subclasses, the parser exposes deeper hooks for cases where those aren't enough.

Custom rules

A rule observes the tree being built and reports diagnostics. Subclass markast.rules.Rule and override the methods you care about:

from markast import Parser
from markast.rules import Diagnostic, Rule, Severity


class HeadingMustBeShort(Rule):
    """Flag headings whose plain text exceeds 60 characters."""

    name = "short-heading"

    def check_heading_children(self, children, level):
        text = "".join(c.get("value", "") for c in children
                       if c.get("type") == "text")
        if len(text) > 60:
            return [Diagnostic(
                code="X100",
                message=f"Heading too long ({len(text)} chars).",
                context=text[:40] + "…",
                severity=Severity.WARNING,
            )]
        return None


parser = Parser(rules=[HeadingMustBeShort()])
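
The check itself is an ordinary function of its arguments, so its logic can be exercised without building a parser. The sketch below re-implements the heading check as standalone functions so that it runs without markast installed. The names heading_text and too_long are hypothetical, and the child-node shape (dicts with "type" and "value" keys) is inferred from the example above.

```python
# Standalone sketch: mirrors the text extraction in HeadingMustBeShort.
# heading_text and too_long are illustrative names, not part of markast.

MAX_HEADING_CHARS = 60


def heading_text(children):
    """Concatenate the values of direct text children, ignoring other nodes."""
    return "".join(
        c.get("value", "") for c in children if c.get("type") == "text"
    )


def too_long(children, limit=MAX_HEADING_CHARS):
    """Return True when the heading's plain text exceeds `limit` characters."""
    return len(heading_text(children)) > limit


short = [{"type": "text", "value": "Install"}]
mixed = [
    {"type": "text", "value": "Using "},
    {"type": "inline_code", "value": "pip"},  # hypothetical node type; skipped
    {"type": "text", "value": " safely"},
]
long_heading = [{"type": "text", "value": "x" * 61}]

print(heading_text(mixed))     # Using  safely
print(too_long(short))         # False
print(too_long(long_heading))  # True
```

Only direct text children are counted here. Text nested inside other inline nodes is skipped, so a rule that needs the full visible text would have to walk the children recursively.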

Diagnostic codes

Code  Trigger
W001  Image inside a heading
W002  Block element where inline is required
W003  Unknown widget name
W004  Invalid widget prop value (wrong type / not in choices)
W005  Required widget prop missing
W006  Image inside a table cell
W007  Raw HTML block found (informational)
W008  Footnote reference without a matching definition
W009  Widget nesting deeper than the configured limit

To replace the built-in rules entirely (e.g. for a strict mode), pass only your own rules and leave BuiltinRules out of the list. To extend the built-ins instead, include BuiltinRules alongside yours:

from markast import Parser
from markast.rules.builtin import BuiltinRules

parser = Parser(rules=[BuiltinRules(), HeadingMustBeShort()])

Tweaking the parser config

from markast import Parser, ParserConfig

cfg = ParserConfig(
    features=("tables", "strikethrough", "footnotes"),  # no autolinks/tasklists
    diagnose_html_blocks=False,
    max_widget_depth=8,
)

parser = Parser(cfg)

Use cfg.evolve(...) to derive a new config:

strict = cfg.evolve(max_widget_depth=4)
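
How evolve is implemented isn't shown in this section. As a rough mental model, and only assuming ParserConfig behaves like an immutable value object, the same derive-a-copy pattern can be sketched with a frozen dataclass and dataclasses.replace. SketchConfig is a hypothetical stand-in: its field names are copied from the example above, and its defaults are invented for illustration.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class SketchConfig:
    """Hypothetical stand-in for ParserConfig; not the real class."""

    features: tuple = ("tables", "strikethrough", "footnotes")
    diagnose_html_blocks: bool = True  # default chosen for the sketch only
    max_widget_depth: int = 8          # default chosen for the sketch only

    def evolve(self, **changes):
        """Return a modified copy; the original instance is left untouched."""
        return replace(self, **changes)


cfg = SketchConfig(diagnose_html_blocks=False)
strict = cfg.evolve(max_widget_depth=4)

print(cfg.max_widget_depth)         # 8
print(strict.max_widget_depth)      # 4
print(strict.diagnose_html_blocks)  # False
```

Under this model the base config can be shared freely, because deriving a stricter variant never changes it.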

Replacing the tokenizer

Rare, but possible. Parser lazily constructs a Tokenizer; if you need a different one (e.g. to inject extra markdown-it-py plugins), assign it before the first parse:

from markast import Parser
from markast.parser.tokenizer import Tokenizer
from mdit_py_plugins.deflist import deflist_plugin


class MyTokenizer(Tokenizer):
    def _build_markdown_it(self):
        md = super()._build_markdown_it()
        md.use(deflist_plugin)
        return md


parser = Parser()
parser._tokenizer = MyTokenizer(parser.config, parser.registry)

Pattern: per-tenant parser

A multi-tenant service may need different widget sets per tenant. Build a parser cache keyed on tenant id:

from functools import lru_cache
from markast import Parser
from markast.widgets import default_registry


@lru_cache(maxsize=64)
def parser_for(tenant_id: str) -> Parser:
    # Clone so tenant widgets never leak into the shared default registry.
    registry = default_registry.clone()
    # load_tenant_widgets is your application's own lookup, not part of markast.
    for cls in load_tenant_widgets(tenant_id):
        registry.register(cls)
    return Parser(registry=registry, transforms=["normalize", "slugify"])


def render(tenant_id, markdown):
    return parser_for(tenant_id).parse(markdown)

Each parser is built from its own clone of the default registry, so registry mutations on one don't affect the others.
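
One consequence of lru_cache is worth keeping in mind: a cached parser keeps the widget set it was built with until its entry is evicted or the cache is cleared. The sketch below shows that behaviour with stand-in classes so that it runs without markast. FakeRegistry, FakeParser, TENANT_WIDGETS, and this load_tenant_widgets are all hypothetical.

```python
from functools import lru_cache

# Hypothetical per-tenant widget names; a real service would load classes.
TENANT_WIDGETS = {"acme": ["callout"], "globex": ["callout", "tabs"]}


class FakeRegistry:
    """Stand-in for a widget registry with clone() and register()."""

    def __init__(self, widgets=()):
        self.widgets = list(widgets)

    def clone(self):
        return FakeRegistry(self.widgets)

    def register(self, widget):
        self.widgets.append(widget)


class FakeParser:
    """Stand-in for Parser; only stores the registry it was given."""

    def __init__(self, registry):
        self.registry = registry


default_registry = FakeRegistry(["note"])


def load_tenant_widgets(tenant_id):
    return TENANT_WIDGETS.get(tenant_id, [])


@lru_cache(maxsize=64)
def parser_for(tenant_id: str) -> FakeParser:
    registry = default_registry.clone()  # copy, so the default stays pristine
    for widget in load_tenant_widgets(tenant_id):
        registry.register(widget)
    return FakeParser(registry)


acme = parser_for("acme")
assert parser_for("acme") is acme            # second call is a cache hit
assert default_registry.widgets == ["note"]  # default registry untouched

TENANT_WIDGETS["acme"].append("tabs")        # tenant config changes...
assert parser_for("acme").registry.widgets == ["note", "callout"]  # ...stale

parser_for.cache_clear()                     # drop all cached parsers
assert parser_for("acme").registry.widgets == ["note", "callout", "tabs"]
```

cache_clear() drops every tenant's parser. If that is too coarse, a plain dict keyed on tenant id makes per-tenant invalidation straightforward.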

Pattern: strict authoring CI

Combine custom rules with doc.has_errors to fail a CI build on bad content:

from markast import Parser
from markast.rules import Diagnostic, Rule, Severity


class NoRawHtml(Rule):
    name = "no-raw-html"

    def check_html_block(self, value):
        return [Diagnostic(
            code="C001",
            message="Raw HTML is not allowed in this corpus.",
            severity=Severity.ERROR,
        )]


parser = Parser(rules=[NoRawHtml()])

# Read with an explicit encoding and close the file promptly.
with open("article.md", encoding="utf-8") as f:
    doc = parser.parse(f.read())
if doc.has_errors:
    for w in doc.warnings:
        print(f"::error::[{w['code']}] {w['message']}")
    raise SystemExit(1)
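
In CI the same check usually runs over a whole directory rather than one file. A minimal sketch follows, assuming the parse callable returns an object with has_errors and a warnings list of dicts carrying "code" and "message", as in the snippet above. check_tree, the *.md glob, and the injected parse argument are illustrative choices, and the fake parser exists only so the example runs without markast.

```python
import tempfile
from pathlib import Path
from types import SimpleNamespace


def check_tree(root, parse):
    """Parse every *.md file under `root`; return one error line per diagnostic."""
    lines = []
    for path in sorted(Path(root).rglob("*.md")):
        doc = parse(path.read_text(encoding="utf-8"))
        if not doc.has_errors:
            continue
        rel = path.relative_to(root)
        for w in doc.warnings:
            lines.append(f"::error file={rel}::[{w['code']}] {w['message']}")
    return lines


def fake_parse(text):
    """Stand-in for parser.parse: flags any document containing '<div'."""
    found = []
    if "<div" in text:
        found.append(
            {"code": "C001", "message": "Raw HTML is not allowed in this corpus."}
        )
    return SimpleNamespace(has_errors=bool(found), warnings=found)


with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "ok.md").write_text("# Fine\n", encoding="utf-8")
    Path(tmp, "bad.md").write_text("<div>nope</div>\n", encoding="utf-8")
    report = check_tree(tmp, fake_parse)

for line in report:
    print(line)
# In CI, finish with: raise SystemExit(1 if report else 0)
```

Passing parser.parse in place of fake_parse would wire this to a real parser, provided the returned document matches the assumed shape.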

Where the source lives

markast/
├── ast/         types, factories, walker, schema export
├── parser/      tokenizer + builder (block / inline / widget / props)
├── render/      markdown + html
├── widgets/     base / registry / built-ins
├── rules/       diagnostic system + built-in rules
├── transforms/  normalize / slugify / toc / linkify / typography
├── config.py
├── document.py
├── parser_api.py
└── cli.py

Every file starts with a module docstring. If you're stuck, that's the fastest way in.