feat(markdown): improve markdown parser and diagnostics #5292

afonsojramos · 2025-03-07T00:50:23Z

Summary

I've improved the Markdown parser's structure and error reporting. The main changes are:

- Reorganized the parser code with separate modules for each block type
- Fixed diagnostic messages for headers without spaces after hash symbols
- Made error handling more consistent
- Added a README with usage examples

Before these changes, the parser couldn't properly report when someone wrote "#Header" instead of "# Header". Now it catches these errors with the right position info and a clear message. The new structure should also make it easier for others to work on the parser going forward.
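The header check described above can be sketched as follows. This is a minimal, standalone illustration and not the code in this PR: `HeaderDiagnostic` and `check_headers` are hypothetical names, positions are plain byte offsets rather than the parser's own range type, and leading indentation is ignored for brevity. Only the message text is taken from the test plan below.

```rust
/// Hypothetical diagnostic type: a message plus the byte range it covers.
#[derive(Debug, PartialEq)]
struct HeaderDiagnostic {
    message: String,
    /// Byte offset of the first `#` in the source.
    start: usize,
    /// Byte offset just past the last `#` of the run.
    end: usize,
}

/// Scans `source` line by line and reports lines that start with one to six
/// `#` characters followed directly by other content (e.g. "#Header").
fn check_headers(source: &str) -> Vec<HeaderDiagnostic> {
    let mut diagnostics = Vec::new();
    let mut offset = 0; // byte offset of the current line's start

    for line in source.split_inclusive('\n') {
        // `#` is ASCII, so the count is also a valid byte index into `line`.
        let hashes = line.bytes().take_while(|b| *b == b'#').count();
        if (1..=6).contains(&hashes) {
            match line[hashes..].chars().next() {
                // Whitespace or end of line after the hashes is accepted.
                None | Some(' ') | Some('\t') | Some('\n') | Some('\r') => {}
                // Anything else means the separating space is missing.
                Some(_) => diagnostics.push(HeaderDiagnostic {
                    message: "Invalid header format: missing space after '#'".to_string(),
                    start: offset,
                    end: offset + hashes,
                }),
            }
        }
        offset += line.len();
    }
    diagnostics
}
```

Calling `check_headers("#Header\n# Header\n")` yields a single diagnostic covering the first `#`; the second line is accepted because a space follows the hash.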

Test Plan

The implementation's correctness is demonstrated by:

✅ Tests passing
✅ Invalid header reporting diagnostics
✅ Some manual testing

To verify:

- Run `cargo test -p biome_markdown_parser` to confirm all tests pass
- Check that the `invalid_header.md` test properly emits the diagnostic "Invalid header format: missing space after '#'"
- Try parsing markdown with invalid headers and verify the error message and location are correct

…ens and kinds
- Rename `DemoSyntaxTreeBuilder` to `MarkdownSyntaxTreeBuilder` in markdown factory
- Extend `MarkdownSyntaxKind` with new tokens for tables, lists, blockquotes, and delimiters
- Add utility functions in `make.rs` to create various Markdown syntax tokens
- Update macro rules to include new token types

- Introduce `markdown` module in configuration crate
- Add `MarkdownConfiguration` struct with formatter, linter, and parser options
- Extend `Configuration` struct to include optional Markdown configuration
- Implement default configuration and configuration builder for Markdown

…kdown block types
- Enhance parsing logic for blockquotes, headers, list items, and thematic breaks
- Add more robust whitespace and marker handling
- Improve continuation line parsing for list items
- Implement more precise checkpoint and rewind mechanisms
- Add better error recovery and validation for different block types

codspeed-hq · 2025-03-07T02:16:42Z

CodSpeed Performance Report

Merging #5292 will not alter performance

Comparing afonsojramos:feat/markdown-parser (0222689) with main (08de81d)

Summary

✅ 95 untouched benchmarks

ematipico

Thank you for your contribution! I haven't reviewed the logic of the parser because there are other important things to change. Briefly:

- revert snapshots that caused regressions
- remove files that cause exceptions
- do not expose any configuration to our users yet

ematipico · 2025-03-07T07:49:32Z

crates/biome_markdown_parser/src/syntax/header_block.rs

I am a bit conflicted about accepting the parsing of headers because there was another contributor who was working on it before you: #5208

I wish you had engaged with us before jumping straight into development. Let's see how it goes, since this PR needs some reviews.

ematipico · 2025-03-07T07:51:34Z

crates/biome_markdown_parser/tests/spec_test.rs

+    // For now, update snapshots automatically
    insta::with_settings!({
        prepend_module_to_snapshot => false,
        snapshot_path => &test_directory,
+        // Auto-accept new snapshots during development
+        input_file => test_case_path.to_string_lossy().to_string(),


Please, let's avoid this. It's easier to accept a snapshot with another CLI command than to figure out why a snapshot was updated, because we might forget to update this code.

crates/biome_markdown_syntax/src/generated/kind.rs

crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.md.snap

crates/biome_markdown_parser/tests/md_test_suite/err/invalid_header.md.snap

crates/biome_configuration/src/lib.rs

…figuration imports
- Delete markdown parser test files including `parser_test.rs` and `spec_test.rs`
- Remove unused Markdown configuration import from CLI commands module
- Update spec test to use more robust debugging and tree structure printing
- Remove test files for invalid markdown headers

…t validation
- Add direct support for thematic break literal tokens
- Enhance thematic break block parsing to handle whitespace and marker variations
- Implement stricter validation to prevent invalid thematic break content
- Add comments to clarify thematic break detection logic
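For reference, thematic break detection with whitespace and marker variations can be sketched like this. The sketch follows the CommonMark rule (up to three leading spaces, then three or more matching `*`, `-`, or `_` characters, optionally separated by spaces or tabs, and nothing else on the line). It works on a raw line rather than on lexer tokens, and `is_thematic_break` is a hypothetical name, so it should not be read as the logic in this commit.

```rust
/// Returns `true` if `line` is a thematic break under the CommonMark rule.
/// Standalone sketch: operates on a string slice, not on parser tokens.
fn is_thematic_break(line: &str) -> bool {
    // Ignore the line terminator, if any.
    let line = line.trim_end_matches(|c: char| c == '\n' || c == '\r');

    // At most three spaces of indentation; four or more would start an
    // indented code block instead.
    let indent = line.bytes().take_while(|b| *b == b' ').count();
    if indent > 3 {
        return false;
    }

    let mut chars = line[indent..].chars();
    // The first non-space character fixes the marker for the whole line.
    let marker = match chars.next() {
        Some(c @ ('*' | '-' | '_')) => c,
        _ => return false,
    };

    let mut count = 1;
    for c in chars {
        match c {
            c if c == marker => count += 1,
            // Spaces and tabs may appear between or after markers.
            ' ' | '\t' => {}
            // Mixed markers or any other content invalidate the break.
            _ => return false,
        }
    }
    count >= 3
}
```

Under these rules `***`, `- - -`, and ` ___  ` are thematic breaks, while `--`, `*-*`, and `--- text` are not.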

…ine handling
- Refactor blockquote parsing to handle nested blockquotes and continuation lines
- Improve line parsing logic to support more complex blockquote structures
- Simplify content node creation and add more robust parsing mechanism
- Enhance end-of-blockquote detection with better line and EOF handling

…ed logic
- Simplify header block parsing by removing redundant diagnostic creation
- Enhance whitespace handling with a more concise consumption method
- Remove explicit text node creation for header content
- Improve header validation and parsing robustness

…rkers and thematic breaks
- Enhance lexer to handle list markers for stars, minus, digits, and underscores
- Implement more robust thematic break detection with flexible marker parsing
- Separate parsing logic for different marker types to improve code clarity
- Add support for various list marker formats and whitespace handling

- Simplify document parsing by removing unnecessary checks and improving block handling
- Introduce a checkpoint mechanism for better fallback to paragraph parsing
- Streamline block element recognition and improve error handling for unrecognized tokens
- Add support for '+' as a list marker and enhance overall lexer functionality
- Update tests to allow for development diagnostics and improve tree structure printing
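The checkpoint-and-fallback idea in this commit can be illustrated with a toy line-based parser. Everything here is an assumption made for illustration: `ToyParser`, `Block`, and their methods are hypothetical, and the checkpoint is just a line index, whereas a real parser would also need to restore emitted events and diagnostics. The point is only the control flow: try a specific block, and on failure rewind and parse the same input as a paragraph.

```rust
#[derive(Debug, PartialEq)]
enum Block {
    Header { level: usize, text: String },
    Paragraph(String),
}

/// Toy parser over lines; `pos` is the index of the next unconsumed line.
struct ToyParser<'a> {
    lines: Vec<&'a str>,
    pos: usize,
}

impl<'a> ToyParser<'a> {
    fn new(source: &'a str) -> Self {
        Self { lines: source.lines().collect(), pos: 0 }
    }

    /// Remembers the current position so a failed attempt can be undone.
    fn checkpoint(&self) -> usize {
        self.pos
    }

    /// Restores a position previously returned by `checkpoint`.
    fn rewind(&mut self, checkpoint: usize) {
        self.pos = checkpoint;
    }

    /// Consumes a line eagerly, then decides whether it was a header.
    fn try_header(&mut self) -> Option<Block> {
        let line = *self.lines.get(self.pos)?;
        self.pos += 1;
        let level = line.bytes().take_while(|b| *b == b'#').count();
        let rest = &line[level..];
        if (1..=6).contains(&level) && (rest.is_empty() || rest.starts_with(' ')) {
            Some(Block::Header { level, text: rest.trim().to_string() })
        } else {
            None // the caller is expected to rewind
        }
    }

    /// Fallback: any single line is accepted as a paragraph.
    fn paragraph(&mut self) -> Option<Block> {
        let line = *self.lines.get(self.pos)?;
        self.pos += 1;
        Some(Block::Paragraph(line.to_string()))
    }

    fn parse(&mut self) -> Vec<Block> {
        let mut blocks = Vec::new();
        while self.pos < self.lines.len() {
            let checkpoint = self.checkpoint();
            if let Some(block) = self.try_header() {
                blocks.push(block);
                continue;
            }
            // The header attempt consumed input but failed: undo and fall back.
            self.rewind(checkpoint);
            if let Some(block) = self.paragraph() {
                blocks.push(block);
            }
        }
        blocks
    }
}
```

With this sketch, `ToyParser::new("# Title\n#NoSpace\ntext").parse()` produces one header followed by two paragraphs, because the `#NoSpace` attempt is rewound and re-read as paragraph content.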

… function names and logic
- Rename functions for clarity: `at_header_block` to `at_atx_header` and `parse_header_block` to `parse_atx_header`
- Improve header validation by ensuring whitespace follows hash symbols
- Streamline content parsing logic for ATX headers, including handling of trailing hashes
- Add a new function `at_setext_header` for future Setext header support

…alidation and diagnostics
- Introduce detailed logging for whitespace and marker checks in unordered and ordered list parsing
- Refactor list item parsing functions to include parent indentation tracking
- Improve whitespace handling after list markers and ensure valid content parsing
- Streamline continuation line handling for list items and enhance overall structure

- Consolidate paragraph parsing by removing redundant checks and simplifying inline element handling
- Introduce a checkpoint mechanism to improve content validation and error handling
- Enhance whitespace management and ensure proper handling of leading spaces and newlines
- Create a more efficient structure for paragraph item list creation and completion

afonsojramos added 17 commits March 7, 2025 00:39

feat(markdown_parser): add blockquote block parsing support

b712073

feat(markdown_parser): add code block parsing support for indented and fenced code blocks

6cfda97

feat(markdown_parser): add header block parsing support

3e88015

feat(markdown_parser): add HTML block parsing support

421f9e8

feat(markdown_parser): add list block parsing support

339ba33

feat(markdown_parser): add paragraph block parsing support

184aeec

feat(markdown_parser): add table block parsing support

0ef00ca

feat(markdown_parser): improve thematic break block parsing

d5cd19b

feat(markdown_parser): add document parsing method

f0758f3

feat(markdown_parser): implement comprehensive document parsing with robust block handling

1d511ad

feat(markdown_parser): add header validation and improve parsing robustness

5c03604

test(markdown_parser): add test case for invalid markdown headers

7b4fbc6

test(markdown_parser): add test cases for blockquotes, comprehensive markdown, headers, and lists

7f822f2

test(markdown_parser): refactor spec tests and add more flexible test handling

ba4f600

feat(cli): add Markdown linter configuration support

4a0fdf6

- Update `LintCommandPayload` to include Markdown linter configuration
- Modify CLI session to handle Markdown linter in command processing
- Add Markdown linter configuration import and merge logic in command runner

github-actions bot added A-CLI Area: CLI A-Project Area: project A-Parser Area: parser labels Mar 7, 2025

afonsojramos added 3 commits March 7, 2025 01:38

Merge remote-tracking branch 'biome/main' into feat/markdown-parser

9b3bc39

chore: run cargo fmt

8991695

ematipico requested changes Mar 7, 2025

dyc3 self-requested a review March 7, 2025 19:03

afonsojramos added 2 commits March 10, 2025 09:37

chore: remove Markdown configuration

833a088

github-actions bot removed the A-Project Area: project label Mar 12, 2025

afonsojramos added 9 commits March 12, 2025 12:20

test(markdown_parser): add comprehensive tests for list marker parsing

c9743c9

refactor(markdown_parser): update snapshot files for blockquotes, headers, lists, and thematic breaks

0505548

github-actions bot added the A-Tooling Area: internal tools label Mar 15, 2025
