Update PDF extraction and OCR options for hybrid chunking #557

Merged: 42 commits, Apr 9, 2025

Conversation

@aakankshaduggal (Member) commented Feb 12, 2025

Addresses #503, #436

@mergify mergify bot added the ci-failure label Feb 12, 2025
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
@mergify mergify bot added ci-failure, dependencies and removed ci-failure labels Feb 12, 2025
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
@mergify mergify bot added ci-failure and removed ci-failure labels Feb 12, 2025
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
…p in test.yml

Signed-off-by: eshwarprasadS <eshwarprasad.s01@gmail.com>
@mergify mergify bot added ci-failure and removed ci-failure labels Apr 2, 2025
Signed-off-by: eshwarprasadS <eshwarprasad.s01@gmail.com>
@mergify mergify bot added the ci-failure label Apr 8, 2025
Signed-off-by: Aakanksha Duggal <aduggal@redhat.com>
@mergify mergify bot added one-approval and removed ci-failure labels Apr 8, 2025
@bbrowning (Contributor) left a comment

This has been through a lot of iterations and looks good overall. I appreciate all the extra tests added and the removal of large portions of docling JSON parsing code that we no longer have to maintain 🎉

@mergify mergify bot removed the one-approval label Apr 8, 2025
@mergify mergify bot merged commit 79c6047 into instructlab:main Apr 9, 2025
28 checks passed
Labels
CI/CD (Affects CI/CD configuration), dependencies (Pull requests that update a dependency file), testing (Relates to testing)
6 participants