How to convert DJVU to PDF with table of contents
This article is based on the answer by pyrocrasty on StackExchange, and on kcroker's dpsprep. It explains the basic idea and the tools needed to convert a DJVU file to PDF while preserving the table of contents (TOC). A Python script is given at the end to automate the procedure.
Dependencies
- PDF tool pdftk: install with brew install pdftk
- DJVU library DjVuLibre (which also provides the command-line tools ddjvu and djvused): install with brew install djvulibre
- Python package sexpdata, used to parse bookmark files: install with pip install sexpdata
Procedures
step 1: convert the file text
First, use any tool to convert the DJVU file to a PDF (without bookmarks).
Suppose the files are called filename.djvu and filename.pdf.
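One tool that can do this conversion is ddjvu from DjVuLibre, which is already listed in the dependencies. The following is a minimal Python sketch, assuming ddjvu is on the PATH. The function names ddjvu_command and djvu_to_pdf are hypothetical, and the quality value is passed only to mirror the quality setting mentioned at the end of this article.

```python
import subprocess


def ddjvu_command(djvu_path, pdf_path, quality=80):
    """Build the ddjvu command line that renders a DJVU file to PDF.

    Hypothetical helper for illustration; not part of DjVuLibre.
    """
    return [
        "ddjvu",
        "-format=pdf",           # produce PDF output
        f"-quality={quality}",   # JPEG quality used for the page images
        djvu_path,
        pdf_path,
    ]


def djvu_to_pdf(djvu_path, pdf_path, quality=80):
    """Run ddjvu; raises CalledProcessError if the conversion fails."""
    subprocess.run(ddjvu_command(djvu_path, pdf_path, quality), check=True)
```

Calling djvu_to_pdf("filename.djvu", "filename.pdf") would produce the bookmark-free PDF used in the following steps.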
step 2: extract DJVU outline
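Next, write the DJVU outline data to a file by running the print-outline command of djvused (passed through its -e option) on filename.djvu and redirecting the output to a text file.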
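The outline extraction in this step can be sketched in Python as follows, assuming djvused is on the PATH. The helper names outline_command and dump_outline, and whatever output file name you pass in, are hypothetical choices made for this example.

```python
import subprocess


def outline_command(djvu_path):
    """Build the djvused command that prints the document outline.

    Hypothetical helper for illustration.
    """
    return ["djvused", "-e", "print-outline", djvu_path]


def dump_outline(djvu_path, out_path):
    """Write the outline S-expression of djvu_path to out_path."""
    result = subprocess.run(
        outline_command(djvu_path),
        check=True,            # fail loudly if djvused reports an error
        capture_output=True,
        text=True,
    )
    with open(out_path, "w", encoding="utf-8") as fh:
        fh.write(result.stdout)
    return result.stdout
```

If the DJVU file has no outline, djvused prints nothing and the resulting file is empty.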
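This file lists the DJVU document's bookmarks in a serialized tree format. In fact it's just a SEXPR, and can be easily parsed. The file is a single list that starts with the symbol bookmarks, followed by zero or more entries. Each entry is itself a list containing the bookmark name as a quoted string, the target page as a quoted string of the form "#N", and then zero or more child entries of the same shape.
For example, an outline with one bookmark on page 1 and a second bookmark on page 5 that has a child on page 6 reads: (bookmarks ("bmark1" "#1") ("bmark2" "#5" ("bmark2subbmark1" "#6")))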
step 3: convert DJVU outline to PDF metadata format
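Now, we need to convert these bookmarks into the format required by PDF metadata. In the metadata format used by pdftk, each bookmark is a block of four lines: BookmarkBegin, BookmarkTitle: followed by the title, BookmarkLevel: followed by the nesting depth (1 for top-level entries), and BookmarkPageNumber: followed by the page number. The blocks are listed flat, in document order.
So a top-level DJVU entry ("bmark1" "#1") becomes the four lines BookmarkBegin, BookmarkTitle: bmark1, BookmarkLevel: 1 and BookmarkPageNumber: 1, and a direct child of it would get BookmarkLevel: 2.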
Basically, you just need to write a script to walk the SEXPR tree, keeping track of the level, and output the name, page number and level of each entry it comes to, in the correct format.
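A minimal sketch of such a tree walk is shown below. It is not the script from this article: to stay dependency-free it swaps sexpdata for a small hand-written tokenizer, and the names parse_outline, to_pdftk_lines and convert are hypothetical. It assumes every target has the "#N" page-number form described above.

```python
import re

# A token is a parenthesis, a double-quoted string, or a bare symbol.
TOKEN = re.compile(r'\(|\)|"(?:\\.|[^"\\])*"|[^\s()"]+')


def parse_outline(text):
    """Parse the outline S-expression into nested Python lists."""
    stack = [[]]
    for tok in TOKEN.findall(text):
        if tok == "(":
            stack.append([])
        elif tok == ")":
            done = stack.pop()
            stack[-1].append(done)
        elif tok.startswith('"'):
            # Drop the quotes; only \" and \\ are unescaped in this sketch.
            stack[-1].append(re.sub(r'\\(["\\])', r"\1", tok[1:-1]))
        else:
            stack[-1].append(tok)
    # An empty file (no outline) is treated as an empty bookmark list.
    return stack[0][0] if stack[0] else ["bookmarks"]


def to_pdftk_lines(entries, level=1):
    """Yield pdftk bookmark lines for entries of the form [name, "#N", *children]."""
    for title, target, *children in entries:
        yield "BookmarkBegin"
        yield f"BookmarkTitle: {title}"
        yield f"BookmarkLevel: {level}"
        yield f"BookmarkPageNumber: {target.lstrip('#')}"
        # Children are emitted right after their parent, one level deeper.
        yield from to_pdftk_lines(children, level + 1)


def convert(text):
    """Convert djvused outline text to pdftk bookmark metadata text."""
    tree = parse_outline(text)  # ["bookmarks", entry, entry, ...]
    return "".join(line + "\n" for line in to_pdftk_lines(tree[1:]))


example = '(bookmarks ("bmark1" "#1") ("bmark2" "#5" ("bmark2subbmark1" "#6")))'
print(convert(example), end="")
```

The recursion carries the level as an argument, so the flat output keeps the nesting information that pdftk needs in BookmarkLevel.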
step 4: extract PDF metadata and splice in converted bookmarks
Once you've got the converted list, dump the PDF metadata of your converted PDF file (filename.pdf) to a text file using pdftk's dump_data operation.
Now open that file and find the line that begins with NumberOfPages:. Insert the converted bookmarks after this line, and save the new file as pdfmetadata.in.
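The manual edit in this step can also be done programmatically. The sketch below is one possible way, with hypothetical helper names dump_data_command and splice_bookmarks. It assumes the metadata text contains a line starting with NumberOfPages:, as described above.

```python
def dump_data_command(pdf_path, out_path):
    """Build the pdftk command that writes the PDF metadata to out_path.

    Hypothetical helper for illustration.
    """
    return ["pdftk", pdf_path, "dump_data", "output", out_path]


def splice_bookmarks(metadata, bookmarks):
    """Return metadata with the bookmark block inserted after NumberOfPages:."""
    # Make sure a non-empty bookmark block ends with a newline.
    block = bookmarks if not bookmarks or bookmarks.endswith("\n") else bookmarks + "\n"
    out = []
    inserted = False
    for line in metadata.splitlines(keepends=True):
        if not line.endswith("\n"):
            line += "\n"          # normalise a missing final newline
        out.append(line)
        if not inserted and line.startswith("NumberOfPages:"):
            out.append(block)
            inserted = True
    if not inserted:
        raise ValueError("no NumberOfPages: line found in metadata")
    return "".join(out)
```

The returned text is what would be saved as pdfmetadata.in.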
step 5: create PDF with bookmarks
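Now we can create a new PDF file incorporating this metadata, using pdftk's update_info operation with filename.pdf as the input, pdfmetadata.in as the metadata file, and out.pdf as the output.
The file out.pdf should be a copy of your PDF with the bookmarks imported from the DJVU file.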
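For completeness, this final step can be wrapped in Python in the same way. This is a sketch that assumes pdftk is on the PATH; update_info_command and write_bookmarked_pdf are hypothetical names.

```python
import subprocess


def update_info_command(pdf_path, metadata_path, out_path):
    """Build the pdftk command that applies a metadata file to a PDF.

    Hypothetical helper for illustration.
    """
    return ["pdftk", pdf_path, "update_info", metadata_path, "output", out_path]


def write_bookmarked_pdf(pdf_path, metadata_path, out_path):
    """Run pdftk; raises CalledProcessError if pdftk fails."""
    subprocess.run(
        update_info_command(pdf_path, metadata_path, out_path), check=True
    )
```

With the file names used in this article, the call would be write_bookmarked_pdf("filename.pdf", "pdfmetadata.in", "out.pdf").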
Python script
To use this script, save it to a file (e.g., named djvu2pdftoc) and make it executable with chmod +x djvu2pdftoc. You can then run it as:
- ./djvu2pdftoc IN.djvu OUT.pdf (with default quality 80), or
- ./djvu2pdftoc --quality 100 IN.djvu OUT.pdf (lossless conversion)
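The command-line interface described above could be declared with argparse roughly as follows. This is only a sketch of the argument handling under the usage shown here, not the author's script. The function name parse_args and the attribute names djvu and pdf are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse: djvu2pdftoc [--quality N] IN.djvu OUT.pdf (hypothetical sketch)."""
    parser = argparse.ArgumentParser(
        prog="djvu2pdftoc",
        description="Convert DJVU to PDF, keeping the table of contents.",
    )
    parser.add_argument(
        "--quality", type=int, default=80,
        help="image quality for the conversion (default: 80)",
    )
    parser.add_argument("djvu", metavar="IN.djvu", help="input DJVU file")
    parser.add_argument("pdf", metavar="OUT.pdf", help="output PDF file")
    # argv=None makes argparse read sys.argv[1:], as a real script would.
    return parser.parse_args(argv)


args = parse_args(["--quality", "100", "IN.djvu", "OUT.pdf"])
print(args.quality, args.djvu, args.pdf)
```

A full script would pass args.quality to the conversion in step 1 and then run steps 2 to 5 on args.djvu and args.pdf.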
Author oracleyue
LastMod 2021-02-23