Strange behavior of getDocumentProxy's buffer when extracting text AND rendering page as image (only for some PDFs)
#17
Labels: bug
Environment
node v20.11.1
unpdf v0.11.0
Reproduction
I first hit the error in a server route of a Nuxt 3 project. The original app also performed other operations besides text/metadata extraction and image rendering.
Anyway, for this issue I created a new Nitro project that isolates only the relevant error. You can find the repo here: https://github.com/ndrbrt/unpdf-issue
Describe the bug
First of all, I have only seen the issue with some PDFs (specifically, PDFs containing images, but I don't know whether it's related to #4, or whether it only affects PDFs with images).
Error A
The original code was similar to that in server/api/error-a.ts. If you run the dev server and open, e.g.:
You get the following error:
However, as I said, with some other PDFs everything works fine, e.g.:
Working version
The only way I found to work around the problem is shown in server/api/working.ts: I copied the original buffer before passing it to getDocumentProxy, and then passed the copy to renderPageAsImage. You can see that both requests succeed:
Error B
I also tried another approach in server/api/error-b.ts, passing a new Uint8Array(buffer) directly to renderPageAsImage. With this approach, if you open:
You get this error:
Interestingly, in this case, if you repeat the request with text extraction disabled (note the query param), it works:
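The difference between the working version and Error B may come down to how the Uint8Array constructor treats its argument, which the issue does not show. If `buffer` in error-b.ts is an ArrayBuffer (an assumption), `new Uint8Array(buffer)` creates a view over the same memory rather than a copy; only passing an existing typed array, or calling `slice()`, allocates new memory. The sketch below illustrates these plain JavaScript semantics; the helper `sharesMemory` is hypothetical and not part of unpdf.

```typescript
// Hypothetical helper: true when two typed arrays are backed by the same ArrayBuffer.
function sharesMemory(a: Uint8Array, b: Uint8Array): boolean {
  return a.buffer === b.buffer;
}

const raw = new ArrayBuffer(4);
const fromArrayBuffer = new Uint8Array(raw);            // a view over `raw`: no copy is made
const anotherView = new Uint8Array(raw);                // a second view over the same bytes
const fromTypedArray = new Uint8Array(fromArrayBuffer); // copies the elements into a new buffer
const sliced = fromArrayBuffer.slice();                 // also an independent copy

console.log(sharesMemory(fromArrayBuffer, anotherView));    // true
console.log(sharesMemory(fromArrayBuffer, fromTypedArray)); // false
console.log(sharesMemory(fromArrayBuffer, sliced));         // false

// A write through one view is visible through every view of `raw`,
// but not through copies taken before the write.
fromArrayBuffer[0] = 0x25;
console.log(anotherView[0]);    // 37
console.log(fromTypedArray[0]); // 0
```

In other words, under that assumption `new Uint8Array(buffer)` would not protect the bytes from anything that happens to `buffer` itself.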
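One possible explanation for why copying helps, not confirmed in this issue, is that the bytes handed to getDocumentProxy are transferred to PDF.js (for example via structuredClone or postMessage with a transfer list), which detaches the caller's ArrayBuffer. The sketch below simulates that with the hypothetical `consumeByTransfer` function; it does not call unpdf. It shows that the original view, and any other view sharing its buffer, become empty after the transfer, while a copy made beforehand with `slice()` survives.

```typescript
// Hypothetical stand-in for a consumer that takes ownership of the bytes by
// transferring their ArrayBuffer. After the transfer, the caller's buffer is detached.
function consumeByTransfer(data: Uint8Array): number {
  const moved = structuredClone(data, { transfer: [data.buffer as ArrayBuffer] });
  return moved.byteLength;
}

const original = new Uint8Array([0x25, 0x50, 0x44, 0x46]); // the bytes of "%PDF"
const copy = original.slice();                          // independent copy taken before the hand-off
const viewOnOriginal = new Uint8Array(original.buffer); // NOT a copy: shares original's buffer

const received = consumeByTransfer(original);

console.log(received);                  // 4: the consumer got all the bytes
console.log(original.byteLength);       // 0: detached by the transfer
console.log(viewOnOriginal.byteLength); // 0: detached too, because it shares the buffer
console.log(copy.byteLength);           // 4: the copy is unaffected
```

Under that assumption, a copy taken before calling getDocumentProxy (as in working.ts) would remain usable for renderPageAsImage, whereas a `new Uint8Array(buffer)` view over the same ArrayBuffer would not.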
Additional context
I did not use the official PDF.js build because I couldn't get it to work. Instead, I used the default build bundled with unpdf, and everything worked fine until I noticed the problem described above.