
adrienverge/yamllint

Python · 3,347 · 307 · 151 issues · 100 contributors · GPL-3.0

Summary

yamllint is a Python command-line linter for YAML files that checks both syntax validity and style/cosmetic issues (indentation, line length, trailing spaces, key duplicates, quoting style, etc.). It operates by tokenizing YAML via PyYAML and running configurable rules against the token stream, supporting inline disable comments, .gitignore-style file ignore patterns, and extensible configuration via inheritance.
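To make the "rules plus inline disable comments" idea concrete, here is a minimal sketch in standard-library Python. It is not yamllint's code: it works on raw lines rather than PyYAML tokens, the names `Problem`, `run`, `trailing_spaces`, and `make_line_length` are invented for illustration, and it only handles a `# yamllint disable-line` comment placed at the end of the affected line.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Problem:
    """One reported issue (hypothetical type, not yamllint's own class)."""
    line: int
    column: int
    rule: str
    desc: str


def trailing_spaces(line_no, text):
    """Yield (column, description) if the line ends in spaces or tabs."""
    stripped = text.rstrip(" \t")
    if stripped != text:
        yield len(stripped) + 1, "trailing spaces"


def make_line_length(max_len):
    """Build a rule that flags lines longer than max_len characters."""
    def line_length(line_no, text):
        if len(text) > max_len:
            yield max_len + 1, f"line too long ({len(text)} > {max_len} characters)"
    return line_length


# Matches an end-of-line directive such as:
#   # yamllint disable-line rule:line-length
DISABLE_LINE = re.compile(
    r"#\s*yamllint disable-line(?P<rules>(?:\s+rule:[\w-]+)*)\s*$"
)


def disabled_rules(text):
    """Return the rule ids disabled on this line, or None meaning 'all rules'."""
    match = DISABLE_LINE.search(text)
    if match is None:
        return set()
    ids = re.findall(r"rule:([\w-]+)", match.group("rules"))
    return set(ids) if ids else None


def run(source, rules):
    """Apply every rule to every line, skipping rules disabled inline."""
    problems = []
    for line_no, text in enumerate(source.split("\n"), start=1):
        off = disabled_rules(text)
        for rule_id, rule in rules.items():
            if off is None or rule_id in off:
                continue
            for column, desc in rule(line_no, text):
                problems.append(Problem(line_no, column, rule_id, desc))
    return sorted(problems, key=lambda p: (p.line, p.column))


if __name__ == "__main__":
    rules = {"trailing-spaces": trailing_spaces, "line-length": make_line_length(20)}
    for problem in run("key: value   \nok: 1\n", rules):
        print(problem)
```

The real tool differs in important ways (token-level rules, block-level `disable`/`enable` comments, configuration inheritance), so treat this only as a mental model of the control flow.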

Great for

People interested in static analysis tooling, especially building or extending rule-based linters that traverse a token stream over structured text formats.

Easy wins

  • Add a new lint rule (e.g., for YAML 1.2 boolean normalization or tag formatting). The rule interface is a single check() generator function with a well-defined CONF/DEFAULT/TYPE/ID pattern, visible in yamllint/rules/indentation.py.
  • Pick an open issue from the backlog of 151. The test patterns in tests/rules/ are extremely consistent (RuleTestCase.check() with inline problem annotations), so adding a test case for a reported edge case is straightforward.
  • Add a CONTRIBUTING.md. The repo lacks one entirely, but CI, test structure, and code style (ruff + flake8) are already configured in pyproject.toml.
  • Fix edge cases in the indentation rule. It has documented pyyaml quirks (fake B_SEQ tokens, missing BlockSequenceStartToken), and test comments like 'missing BSeqStart here' mark the known rough spots.
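The CONF/DEFAULT/TYPE/ID pattern mentioned in the first item can be sketched as a module-level layout like the one below. This is an illustration under stated assumptions, not code from the repository: the `todo-comments` rule is hypothetical, `LintProblem` here is a local stand-in rather than yamllint's own class, and the `line.line_no` / `line.content` attributes and the `check(conf, line)` signature for line-type rules should be verified against the existing rules before relying on them.

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class LintProblem:
    """Local stand-in for the problem object a rule yields (assumption)."""
    line: int
    column: int
    desc: str


# Module-level metadata, following the ID/TYPE/CONF/DEFAULT pattern.
ID = "todo-comments"           # hypothetical rule name
TYPE = "line"                  # assumed: this rule is run once per line
CONF = {"keyword": str}        # option name -> expected type
DEFAULT = {"keyword": "TODO"}  # value used when the option is not set


def check(conf, line):
    """Yield a problem when the keyword appears after a '#' on the line.

    Naive on purpose: a '#' inside a quoted scalar is also treated as a
    comment start, which a real rule would need to handle.
    """
    text = line.content
    hash_pos = text.find("#")
    if hash_pos == -1:
        return
    kw_pos = text.find(conf["keyword"], hash_pos)
    if kw_pos != -1:
        yield LintProblem(
            line.line_no,
            kw_pos + 1,  # columns are 1-based
            f'found "{conf["keyword"]}" in comment',
        )


if __name__ == "__main__":
    # SimpleNamespace stands in for the line object the linter would pass.
    fake_line = SimpleNamespace(line_no=3, content="key: value  # TODO: fix")
    print(list(check(DEFAULT, fake_line)))
```

Because `check()` is a generator, a rule can report zero, one, or several problems per call without building a list, which is also what makes it easy to exercise in isolation as shown in the `__main__` block.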

Red flags

  • No CONTRIBUTING.md or contributor guidelines. 100 contributors have managed without one, but it is still a gap.
  • 151 open issues with no triage labels, so it is hard to tell what is in scope or prioritized without reading each one.
  • Relies on pyyaml's token stream, which has documented bugs (non-indented sequences produce different tokens). Any new rule touching block sequences must account for this quirk or it will silently produce wrong results.
  • commit_count and contributor_count in the metadata both report '1', which appears to be a data artifact. The repo clearly has 100 contributors and years of history, so don't trust those fields.

Code quality

good

The indentation rule (yamllint/rules/indentation.py) is the most complex file and is genuinely non-trivial: it maintains an explicit parse stack to handle pyyaml's inconsistent token emission, and it is well commented, with worked examples in the docstring. Test coverage is thorough and unusually honest: tests/rules/test_indentation.py documents known pyyaml tokenization bugs inline (e.g., 'There seems to be a bug in pyyaml') and pins the expected outputs, including the broken behavior. The quoted-strings test file shows exhaustive combinatorial coverage of quote-type × required combinations. No obvious security issues, hardcoded secrets, or dead-code patterns were spotted.

What makes it unique

This is the de facto standard YAML linter for the Python and Ansible ecosystems; rather than competing with similar tools, it effectively owns the space. The closest alternative would be a schema validator (such as jsonschema on parsed YAML), but yamllint specifically targets style and cosmetic rules that schema validators don't cover. Its integration with ansible-lint and widespread CI usage give it real-world staying power.
