-
-
Notifications
You must be signed in to change notification settings - Fork 547
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat(markdown): improve markdown parser and diagnostics #5292
base: main
Are you sure you want to change the base?
Conversation
…d fenced code blocks
…robust block handling
…markdown, headers, and lists
…ens and kinds - Rename `DemoSyntaxTreeBuilder` to `MarkdownSyntaxTreeBuilder` in markdown factory - Extend `MarkdownSyntaxKind` with new tokens for tables, lists, blockquotes, and delimiters - Add utility functions in `make.rs` to create various Markdown syntax tokens - Update macro rules to include new token types
- Introduce `markdown` module in configuration crate - Add `MarkdownConfiguration` struct with formatter, linter, and parser options - Extend `Configuration` struct to include optional Markdown configuration - Implement default configuration and configuration builder for Markdown
- Update `LintCommandPayload` to include Markdown linter configuration - Modify CLI session to handle Markdown linter in command processing - Add Markdown linter configuration import and merge logic in command runner
…kdown block types - Enhance parsing logic for blockquotes, headers, list items, and thematic breaks - Add more robust whitespace and marker handling - Improve continuation line parsing for list items - Implement more precise checkpoint and rewind mechanisms - Add better error recovery and validation for different block types
CodSpeed Performance ReportMerging #5292 will not alter performanceComparing Summary
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for contribution! I haven't reviewed the logic of the parser because there are other important things to change. Briefly:
- revert snapshots that caused regressions
- remove files that cause exptions
- do not expose any configuration to our users yet
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am a bit conflicted about accepting the parsing of headers because there was another contributor who was working on it before you: #5208
I wish you could have engaged with us before jumping straight to developments. Let's see how it goes, since this PR needs some reviews.
// For now, update snapshots automatically | ||
insta::with_settings!({ | ||
prepend_module_to_snapshot => false, | ||
snapshot_path => &test_directory, | ||
// Auto-accept new snapshots during development | ||
input_file => test_case_path.to_string_lossy().to_string(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please, let's avoid this. It's easier to accept a snapshot with another CLI command than figuring out why a snapshot was updated, because we might forget to update this code.
crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.md.snap
Outdated
Show resolved
Hide resolved
crates/biome_markdown_parser/tests/md_test_suite/ok/thematic_break_block.md.snap
Outdated
Show resolved
Hide resolved
crates/biome_markdown_parser/tests/md_test_suite/err/invalid_header.md.snap
Outdated
Show resolved
Hide resolved
…figuration imports - Delete markdown parser test files including `parser_test.rs` and `spec_test.rs` - Remove unused Markdown configuration import from CLI commands module - Update spec test to use more robust debugging and tree structure printing - Remove test files for invalid markdown headers
…t validation - Add direct support for thematic break literal tokens - Enhance thematic break block parsing to handle whitespace and marker variations - Implement stricter validation to prevent invalid thematic break content - Add comments to clarify thematic break detection logic
…ine handling - Refactor blockquote parsing to handle nested blockquotes and continuation lines - Improve line parsing logic to support more complex blockquote structures - Simplify content node creation and add more robust parsing mechanism - Enhance end-of-blockquote detection with better line and EOF handling
…ed logic - Simplify header block parsing by removing redundant diagnostic creation - Enhance whitespace handling with a more concise consumption method - Remove explicit text node creation for header content - Improve header validation and parsing robustness
…rkers and thematic breaks - Enhance lexer to handle list markers for stars, minus, digits, and underscores - Implement more robust thematic break detection with flexible marker parsing - Separate parsing logic for different marker types to improve code clarity - Add support for various list marker formats and whitespace handling
- Simplify document parsing by removing unnecessary checks and improving block handling - Introduce a checkpoint mechanism for better fallback to paragraph parsing - Streamline block element recognition and improve error handling for unrecognized tokens - Add support for '+' as a list marker and enhance overall lexer functionality - Update tests to allow for development diagnostics and improve tree structure printing
… function names and logic - Rename functions for clarity: `at_header_block` to `at_atx_header` and `parse_header_block` to `parse_atx_header` - Improve header validation by ensuring whitespace follows hash symbols - Streamline content parsing logic for ATX headers, including handling of trailing hashes - Add a new function `at_setext_header` for future Setext header support
…alidation and diagnostics - Introduce detailed logging for whitespace and marker checks in unordered and ordered list parsing - Refactor list item parsing functions to include parent indentation tracking - Improve whitespace handling after list markers and ensure valid content parsing - Streamline continuation line handling for list items and enhance overall structure
- Consolidate paragraph parsing by removing redundant checks and simplifying inline element handling - Introduce a checkpoint mechanism to improve content validation and error handling - Enhance whitespace management and ensure proper handling of leading spaces and newlines - Create a more efficient structure for paragraph item list creation and completion
…aders, lists, and thematic breaks
Summary
I've made some needed improvements to the Markdown parser's structure and error reporting. The main issues I tackled were:
Before these changes, the parser couldn't properly report when someone wrote "#Header" instead of "# Header". Now it catches these errors with the right position info and a clear message. The new structure should also make it easier for others to work on the parser going forward.
Test Plan
The implementation correctness is demonstrated by:
✅ Tests passing
✅ Invalid header reporting diagnostics
✅ Some manual testing
To verify:
cargo test -p biome_markdown_parser
to confirm all tests passinvalid_header.md
test properly emits the diagnostic "Invalid header format: missing space after '#'"