Friday, December 18, 2009

Batch converting HTML files to PDF using OS X and 'convert' utility

At work we are converting content to a new website. As a part of that conversion, some older content will be archived on the new site in the form of PDF documents.

I needed something to convert HTML documents to PDF. We own Adobe's software and there is an option to convert entire web URLs to a single PDF, but that's not what we needed.
I could convert single URLs using the same Adobe software, but that wasn't an optimal solution either. My boss had found, through a Google search, a utility on the Mac called 'convert' which does this.

I automated this through a Bash script call, but we still had problems because the pages were truncated. I went to the directory where the "convert" application was, and found it links to 'cupsfilter' in /usr/bin.
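
One way to check what a utility like this really is, is to ask the shell where the link points. This is only a minimal sketch: the helper name link_target is made up for illustration, and the path is the one from this post, which may be different or missing on other OS X versions.

```shell
# Print where a symbolic link points, or a note if the path is not a link.
# link_target is a hypothetical helper name, not part of OS X or CUPS.
link_target() {
    if [ -L "$1" ]; then
        readlink "$1"
    else
        echo "not a symlink: $1"
    fi
}

# Path as used in this post; it may differ or be absent on other systems.
link_target /System/Library/Printers/Libraries/convert
```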

By figuring out what cupsfilter does, I was able to determine the parameters necessary to make the PDFs landscape, and use the page size of A4, which was enough to have it work properly.

The great thing about this is that if we had used the "Save As" feature to save each page to a PDF, it would have taken hundreds of hours. Using a Bash shell it took an hour to convert three directories of HTML files, with 150 or more files per directory. Even though I used 'convert', I suspect you could do the same thing by using cupsfilter directly on any UNIX variant.

The key parameter was "landscape", but when using "convert" it wasn't obvious how to specify the parameters correctly. Through the cupsfilter man pages I found out what I needed: cupsfilter takes these settings with a "-o" option, but "convert" takes them with "-a". The media format options were "Letter", "Legal" and "A4", and A4 worked best. Letter was a little too small and ended up truncating some of our documents.
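
For comparison, here is a rough sketch of what calling cupsfilter directly might look like, with the same landscape and A4 settings passed through "-o". I did not run it this way myself; it assumes the local CUPS install has a filter chain that can turn HTML into PDF, and html_to_pdf is just an illustrative function name.

```shell
# Convert one HTML file to PDF by calling cupsfilter directly.
# cupsfilter writes the converted document to standard output, so we redirect it.
# -m sets the destination MIME type; each -o passes one print option.
# Assumption: CUPS on this machine can filter HTML input to PDF.
# html_to_pdf is a hypothetical helper name used only for this sketch.
html_to_pdf() {
    # $1 = input HTML file, $2 = output PDF file
    cupsfilter -m application/pdf -o landscape -o media=A4 "$1" > "$2"
}
```

It would be called as: html_to_pdf index.htm index.pdf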

Here's my Bash Shell Command that walked through the current
directory finding HTML files with the extension HTM, and for
the output file name used SED (Stream EDitor) to convert the
HTM in the filename to the output file type of PDF.

for name in *.htm ; do /System/Library/Printers/Libraries/convert -f "$name" -o "`echo "$name" | sed s/htm/pdf/`" -a landscape -a scaling=75 -a media=A4; done
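
One caveat with the sed expression s/htm/pdf/ is that it replaces the first "htm" it finds, so a file called htmlpage.htm would come out as pdflpage.htm. Below is a sketch of a more defensive variant that only swaps a trailing ".htm" and skips the loop body when no files match. The function names to_pdf_name and convert_all are made up for illustration; the convert path and its -f, -o and -a options are the ones used in the command above.

```shell
# Swap only a trailing ".htm" for ".pdf", leaving any other "htm" in the name alone.
# to_pdf_name is a hypothetical helper name.
to_pdf_name() {
    printf '%s\n' "$1" | sed 's/\.htm$/.pdf/'
}

# Hypothetical wrapper: convert every .htm file in the current directory.
convert_all() {
    for name in *.htm; do
        # If the glob matched nothing, "$name" is the literal pattern; skip it.
        [ -e "$name" ] || continue
        /System/Library/Printers/Libraries/convert -f "$name" \
            -o "$(to_pdf_name "$name")" -a landscape -a scaling=75 -a media=A4
    done
}
```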

I have to point out that if I were still using a PC I could probably have done this with Cygwin, but the articles we found on Google indicate that the people who did this used convert. I don't know that I would have figured out to use cupsfilter instead, which is what Mac OS X linked to.


Paul Hankin said...

${a/htm/pdf} is better than the backquote section using sed.

Paul Hankin said...

I mean of course ${name/htm/pdf}

DenverJuggler said...

Awesome - thanks Paul.

I know I don't always do everything the most efficient way in UN*X, but it's great that there are always multiple working solutions.

Should I also add the period (does it need to be escaped?) for the separator between the file name and type, and a dollar sign for the end of the string, so it only matches on the file type?

DenverJuggler said...

I did an Ignite! talk on this at our Denver Java Users Group. Here's the YouTube if you're interested:

polocanada said...

Great. I didn't know Adobe Acrobat can actually copy the whole site offline as a PDF booklet (like Sitesucker, plus the added benefit of one PDF).
That's fantastic and useful. Thanks for pointing this out.

Unknown said...

I couldn't get the scaling=75 parameter to work. Whatever value I use, the output is always the same. Have I missed something?

Anonymous said...

I could not make the scaling=[int] parameter work either.

Landscape seems to work, but not scaling.

Anonymous said...

Batch converting HTML files to PDF is not such an easy task now.
Check out
