12. Language evaluation criteria

The wardline classification framework is language-agnostic; language-specific enforcement regimes (§15.4) implement its requirements using language-native mechanisms. Not all languages support the three enforcement layers equally. The following rubric assesses how well a given language ecosystem supports wardline enforcement across those layers and the conformance profiles defined in §15.3.

Criterion	What to Assess
Annotation expressiveness	Can the language express all 17 annotation groups at function, class, and field level without runtime overhead?
Parse tree access	Does the language provide AST or equivalent for static analysis? Is the parse tree stable across versions?
Type system metadata	Can type annotations carry tier/trust metadata? Does the type checker propagate this metadata through assignments, calls, and returns?
Structural typing	Can the type system distinguish raw, guarded, and assured data with identical field structures?
Runtime object model	Can the language prevent invalid access patterns structurally (raising on access, not defaulting)?
Class hierarchy enforcement	Can base classes constrain what subclasses may do — preventing unannotated method addition?
Serialisation boundary control	Can the language detect or prevent tier violations at serialisation/deserialisation boundaries?
Error model	How does the language represent failure (`throw`, `Result`, error returns, sentinels), and how does that shape detection of WL-003, WL-004, and WL-005 equivalents?
Concurrency model	What concurrency primitives exist (threads, async tasks, channels, actors, shared-memory locks), and how do they affect Group 13 detection and ordering guarantees?
Tooling ecosystem	Does the language have mature static analysis infrastructure (custom lint rules, AST analysis frameworks)?
Existing tool coverage	Can existing tools in this ecosystem implement wardline conformance profiles (§15.3) without requiring a bespoke product? Which profiles are achievable through plugins or extensions to existing tools, and which require new tooling?

Language evaluation criteria for wardline binding suitability
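
Two of the criteria above, runtime object model and class hierarchy enforcement, can be illustrated in Python. The following is a minimal sketch under stated assumptions, not a wardline implementation: `WardlineBase`, `tier`, `StrictRecord`, and the `__wardline_tier__` attribute are hypothetical names introduced here for illustration only.

```python
def tier(level):
    """Hypothetical annotation: records a tier number on the function."""
    def mark(func):
        func.__wardline_tier__ = level
        return func
    return mark


class WardlineBase:
    """Hypothetical base class that rejects unannotated public methods."""

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Inspect only what the subclass itself defines.
        for name, member in vars(cls).items():
            if name.startswith("_") or not callable(member):
                continue
            if not hasattr(member, "__wardline_tier__"):
                raise TypeError(
                    f"{cls.__name__}.{name} has no wardline annotation"
                )


class StrictRecord:
    """Raises on access to an absent field instead of supplying a default."""

    def __init__(self, **fields):
        self._fields = dict(fields)

    def __getattr__(self, name):
        # __getattr__ runs only when normal attribute lookup fails.
        try:
            return self.__dict__["_fields"][name]
        except KeyError:
            raise AttributeError(
                f"field {name!r} is absent; no default is supplied"
            ) from None
```

Under these assumptions, defining a subclass of `WardlineBase` with an undecorated public method fails at class-creation time with `TypeError`, and reading a missing field from a `StrictRecord` raises `AttributeError` rather than returning a placeholder value. A language that cannot express either behaviour would score lower on the corresponding criterion.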

Languages with stronger type systems (Rust, Haskell) may provide better type-system-layer coverage while requiring less runtime enforcement. Languages with rich object models but weaker types (Python, Ruby) may rely more heavily on runtime structural enforcement. Languages with minimal runtime introspection (C, Go) place greater burden on static analysis. The criteria identify where each language's enforcement regime will be strong and where it will have structural gaps requiring compensating controls. The "existing tool coverage" criterion is particularly important for adoption: a language where an existing type checker can implement Wardline-Type and an existing linter can implement Wardline-Core has a lower adoption barrier than a language requiring entirely new tooling for every profile.

Worked assessment sketch (Python). A Python binding typically scores high on annotation expressiveness (decorators, type annotations, schema metadata), high on parse-tree access (`ast` and mature linting infrastructure), medium on type-system metadata (`Annotated[T, ...]` and plugin-capable type checkers), low-to-medium on structural typing for tier distinction (the language does not natively make tier mismatches unrepresentable), high on runtime object model (descriptors, metaclasses, `__getattribute__`, `__init_subclass__`), high on error-model expressiveness (exceptions provide rich detection surfaces for WL-003 to WL-005), and medium on concurrency analysis (threads, `asyncio`, processes, and callback-heavy frameworks complicate Group 13 reasoning). In profile terms, Python can usually achieve Wardline-Core with existing static-analysis tooling plus custom rules, can partially achieve Wardline-Type through checker plugins, and often relies on runtime structural mechanisms to close gaps that stronger static type systems would handle earlier.
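
The Python assessment above could be recorded as data so that weak criteria are easy to query. This is a sketch only: the `Score` scale, the `PYTHON_ASSESSMENT` mapping, the `gaps` helper, and the choice of "below medium" as the threshold are assumptions made for illustration, and only the criteria scored in the paragraph above are included.

```python
from enum import IntEnum


class Score(IntEnum):
    """Hypothetical ordinal scale matching the wording of the sketch."""
    LOW = 1
    LOW_TO_MEDIUM = 2
    MEDIUM = 3
    HIGH = 4


# Scores transcribed from the worked assessment sketch for Python.
PYTHON_ASSESSMENT = {
    "Annotation expressiveness": Score.HIGH,
    "Parse tree access": Score.HIGH,
    "Type system metadata": Score.MEDIUM,
    "Structural typing": Score.LOW_TO_MEDIUM,
    "Runtime object model": Score.HIGH,
    "Error model": Score.HIGH,
    "Concurrency model": Score.MEDIUM,
}


def gaps(assessment, threshold=Score.MEDIUM):
    """Return criteria scoring below the threshold, sorted by name."""
    return sorted(
        name for name, score in assessment.items() if score < threshold
    )


print(gaps(PYTHON_ASSESSMENT))  # ['Structural typing']
```

With this threshold, structural typing is the only criterion flagged, which matches the observation that Python leans on runtime structural mechanisms where a stronger static type system would act earlier.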

Advisory-to-structural spectrum. Language bindings exist on a spectrum from advisory to structural. At the advisory end (e.g., Python decorators), annotations are metadata that enforcement tools read but the language itself does not enforce — a decorator marks a function as @integral_read, but nothing in the language prevents the function from violating Tier 1 constraints. At the structural end (e.g., Rust phantom types encoding tier as a zero-sized type parameter), annotations are type constraints that make non-compliant code unrepresentable — tier mismatches are compile errors, not lint findings. Stronger bindings reduce generation risk: an agent coding against a structural binding receives tighter feedback and produces fewer violations, because the language itself rejects non-compliant code before any wardline tool runs. However, stronger bindings do not reduce governance risk: the type definitions that encode tier semantics still need human ratification, periodic review, and change authority (§10.3.1). A Rust binding where tier assignments are wrong at the type level produces code that is structurally compliant with the wrong policy — the same manifest poisoning risk (§10.3.2) as any other binding, expressed through the type system rather than through decorator metadata. The evaluation criteria in this section help identify where each language sits on this spectrum and where governance controls must compensate for the binding's enforcement limitations.
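
The advisory end of the spectrum can be sketched in a few lines of Python. The decorator below is a hypothetical stand-in for `@integral_read`: the `__wardline__` attribute, its contents, and the `annotated_functions` helper are assumptions introduced for illustration, not the binding's actual API.

```python
def integral_read(func):
    """Hypothetical advisory annotation: records metadata, enforces nothing."""
    func.__wardline__ = {"annotation": "integral_read", "tier": 1}
    return func


@integral_read
def load_balance(record):
    # The interpreter runs this body whatever it does. The silent default
    # here is the kind of pattern an external rule might flag, but the
    # decorator itself does not stop it.
    return record.get("balance", 0)


def annotated_functions(namespace):
    """Collect advisory metadata the way an external tool might read it."""
    return {
        name: obj.__wardline__
        for name, obj in namespace.items()
        if callable(obj) and hasattr(obj, "__wardline__")
    }
```

Calling `load_balance({})` succeeds and returns the default, which shows that the annotation alone constrains nothing; enforcement exists only if a separate tool reads the metadata and inspects the function body.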

Some pattern rules may be structurally inapplicable in certain languages. In statically typed languages, WL-002 (existence-checking as a structural gate) and WL-006 (runtime type-checking) may be partially or wholly addressed by the type system itself — the patterns they detect cannot occur or are caught at compile time. The evaluation criteria are the mechanism for identifying these per-language structural gaps: where a language's type system already prevents a class of violation, the corresponding pattern rule is marked N/A in that binding's severity matrix, with documented rationale.
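
As a rough illustration of a pattern rule and of an N/A entry, the sketch below uses Python's `ast` module to locate `isinstance()` calls, a simplified stand-in for a WL-006-style runtime type-checking detector, alongside a toy severity matrix. The matrix layout, the severity strings, the `example-static-binding` name, and the rationale text are all hypothetical; the real rule definitions and matrix format are specified elsewhere.

```python
import ast

# Hypothetical per-binding severity matrix. "N/A" entries carry a rationale.
SEVERITY_MATRIX = {
    "python": {
        "WL-006": {"severity": "error", "rationale": None},
    },
    "example-static-binding": {
        "WL-006": {
            "severity": "N/A",
            "rationale": "type system rejects the pattern at compile time",
        },
    },
}


def find_runtime_type_checks(source):
    """Return sorted line numbers of isinstance() calls in the source."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if (
            isinstance(node, ast.Call)
            and isinstance(node.func, ast.Name)
            and node.func.id == "isinstance"
        ):
            hits.append(node.lineno)
    return sorted(hits)


def rule_applies(binding, rule):
    """A rule applies unless the binding's matrix marks it N/A."""
    return SEVERITY_MATRIX[binding][rule]["severity"] != "N/A"
```

In this sketch a scanner would run `find_runtime_type_checks` only for bindings where `rule_applies` is true, and the rationale string documents why the rule was skipped elsewhere.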

Language evaluation is a living binding artefact, not a one-time classification. Bindings SHOULD version their evaluation alongside the language and tooling versions they target, because parse-tree stability, type-system metadata propagation, concurrency primitives, and available enforcement tooling can change materially across runtime and compiler releases.
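
One way a binding might version its evaluation is to store the language and tooling versions it was assessed against and compare them with the current environment. The `EvaluationRecord` class, the `stale_reasons` helper, and the tool names in the usage note are hypothetical, and the sketch only reports tools present in the current environment.

```python
from dataclasses import dataclass, field


@dataclass
class EvaluationRecord:
    """Hypothetical record of what an evaluation was performed against."""
    language: str
    language_version: tuple
    tooling: dict = field(default_factory=dict)  # tool name -> version tuple


def stale_reasons(record, language_version, tooling):
    """List reasons the recorded evaluation may no longer hold."""
    reasons = []
    if language_version != record.language_version:
        reasons.append(
            f"language version changed: "
            f"{record.language_version} -> {language_version}"
        )
    for tool, version in sorted(tooling.items()):
        if record.tooling.get(tool) != version:
            reasons.append(f"tooling changed: {tool}")
    return reasons
```

For example, a record created for version `(3, 12)` with `{"linter": (1, 0)}` yields an empty list while the environment is unchanged, and a non-empty list once either the language version or a tool version differs, signalling that the evaluation should be revisited.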