Recently I downloaded a large number of PDFs of an old magazine that I wanted to read. I’m particular about my file names, so I used A Better Finder Rename to fix them so they all looked like this:
- 1934 0104 Old Magazine.pdf
- 1934 0111 Old Magazine.pdf
- 1934 0118 Old Magazine.pdf
- 1934 0125 Old Magazine.pdf
Things looked great, so I imported the PDFs into Calibre, free & open source software that is basically an iTunes for ebooks. Unfortunately, the Titles listed in Calibre looked like this:
- Old Magazine, Jan 04, 1934
- Old Magazine, January 11, 1934
- Old Magazine January 18, 1934
- 1934 0125 Old Magazine
Out of the 48 or so PDFs I imported, only about five imported with the correct Titles. I assumed that the file names I gave the PDFs would be the same as their Titles in Calibre. But clearly they weren’t. What was going on?
I started by opening a PDF in Preview & selecting Tools > Show Inspector > General Info. I immediately noticed the Title field, which said “Old Magazine November 10, 1934”. Aha! I needed to change the Title metadata in the PDF[1]. But not in just one PDF: in about 200 PDFs, and programmatically, from the command line. Why? Because I sure as heck wasn’t gonna do it one at a time[2].
I looked at several different methods, but most of them made you change the Title one file at a time, which was useless. Finally, I discovered PDFtk, short for “The PDF Toolkit”. This is a cross-platform command line app that describes itself as able to do all of the following:
- Merge PDF Documents or Collate PDF Page Scans
- Split PDF Pages into a New Document
- Rotate PDF Documents or Pages
- Decrypt Input as Necessary (Password Required)
- Encrypt Output as Desired
- Fill PDF Forms with X/FDF Data and/or Flatten Forms
- Generate FDF Data Stencils from PDF Forms
- Apply a Background Watermark or a Foreground Stamp
- Report PDF Metrics, Bookmarks and Metadata
- Add/Update PDF Bookmarks or Metadata
- Attach Files to PDF Pages or the PDF Document
- Unpack PDF Attachments
- Burst a PDF Document into Single Pages
- Uncompress and Re-Compress Page Streams
- Repair Corrupted PDF (Where Possible)
Start by downloading PDFtk for your Mac from its website. It’s a PKG file, so just click on it, answer Continue, Continue, Install, & you now have PDFtk installed at
Now let’s get that metadata out of an existing PDF so we can see it.
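PDFtk's dump_data operation prints a PDF's metadata as plain text. Here's a minimal sketch of that step. The file name is just one of mine from the list at the top, so substitute your own, & the guard around the command is only there so nothing breaks if PDFtk or the file is missing.

```shell
# One of the file names from earlier in this post; substitute your own.
pdf="1934 0104 Old Magazine.pdf"

# dump_data writes the PDF's metadata to standard output as plain text.
if command -v pdftk >/dev/null 2>&1 && [ -f "$pdf" ]; then
  pdftk "$pdf" dump_data
else
  echo "pdftk or '$pdf' not found; nothing to dump" >&2
fi
```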
Notice these two lines:
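In the dump_data output I've seen, each metadata entry is an InfoKey line followed by an InfoValue line, so the Title pair can be pulled out with grep. The sample input in this sketch is illustrative, not a real dump: the Creator entry is made up, & title_lines is just a helper name I picked.

```shell
# Keep only the Title key and the value line that follows it.
title_lines() {
  grep -A1 '^InfoKey: Title$'
}

# Illustrative input in the key/value layout of dump_data output.
# The Title value is the one Preview showed me; the Creator entry is invented.
printf 'InfoKey: Creator\nInfoValue: (example)\nInfoKey: Title\nInfoValue: Old Magazine November 10, 1934\n' | title_lines
```

With a real file, the same filter would go on the end of the dump: `pdftk "some.pdf" dump_data | title_lines`.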
That’s what I need to change. Here’s the rub: PDFtk expects you to do this for each PDF whose metadata you want to change:
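As far as I know, the general shape of that command is: the input PDF, the update_info operation, the data file, the word output, & a new PDF. The two PDF names in this sketch are placeholders I made up. Note that PDFtk wants the output file to be different from the input file.

```shell
# Placeholder PDF names; only instructions.txt is a name used in this post.
in_pdf="original.pdf"
out_pdf="corrected.pdf"

# Run the update only when PDFtk and both input files are actually present.
if command -v pdftk >/dev/null 2>&1 && [ -f "$in_pdf" ] && [ -f instructions.txt ]; then
  pdftk "$in_pdf" update_info instructions.txt output "$out_pdf"
else
  echo "pdftk, $in_pdf, or instructions.txt is missing; skipping" >&2
fi
```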
Let me break down the components of this command:
The command. Duh.
The PDF with the incorrect metadata.
The task I want PDFtk to perform: updating, or changing, metadata for the specified PDF.
The file that contains the changes, with one line specifying the metadata key to change, & one line for the metadata value for that key.
I’m letting PDFtk know that I want to write these changes to a new file.
Our brand spanking new PDF to create, with corrected metadata.
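To make the data file concrete, here's a small sketch that writes a two-line instructions.txt: the key on one line & the value on the next. The Title value is just an example in my file-name pattern. If your own dump_data output shows an InfoBegin line before each pair, you may need that line in this file too; I'm not certain every version requires it.

```shell
# Write the key line and the value line for the Title entry.
# The Title value here is only an example; use whatever Title you want.
printf 'InfoKey: Title\nInfoValue: %s\n' "1934 0104 Old Magazine" > instructions.txt

# Show the result.
cat instructions.txt
```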
The problem was that you have to specify each title in instructions.txt every time. That’s a pain, but I wrote a little shell script that makes the whole process very easy. I hope all of the comments make it clear what everything is doing[3].
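For anyone who wants to see the idea in one place, here is a sketch of a script along these lines. It is not my original script, just a reconstruction under a few assumptions: every PDF in the current directory should get its file name (minus .pdf) as its Title, the helper names title_from_name & retitle are invented, & the temporary file names are arbitrary. It replaces the originals, so try it on copies first.

```shell
#!/bin/sh
# Sketch only: set each PDF's Title metadata to its file name.

# Derive the Title from a file name by stripping the .pdf extension.
title_from_name() {
  basename "$1" .pdf
}

# Retitle one PDF. PDFtk writes to a new file, which then replaces the original.
retitle() {
  src=$1
  out="$src.retitled.tmp"
  info=$(mktemp "${TMPDIR:-/tmp}/pdfinfo.XXXXXX") || return 1

  # Data file: the key on one line, the value on the next.
  printf 'InfoKey: Title\nInfoValue: %s\n' "$(title_from_name "$src")" > "$info"

  if pdftk "$src" update_info "$info" output "$out"; then
    mv "$out" "$src"
  else
    echo "failed to update: $src" >&2
    rm -f "$out"
  fi
  rm -f "$info"
}

# Process every PDF in the current directory, if PDFtk is installed.
if command -v pdftk >/dev/null 2>&1; then
  for pdf in *.pdf; do
    [ -f "$pdf" ] || continue   # skip the literal pattern when nothing matches
    retitle "$pdf"
  done
else
  echo "pdftk not found on PATH" >&2
fi
```

Writing to a temporary name & then moving it over the original is needed because PDFtk won't write its output onto its own input file.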
I saved the script as pdfname2title.sh, & then made it executable:
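Making a script executable is a one-line chmod. The directory in this sketch is a hypothetical location, so point it at wherever you actually saved the file.

```shell
# Hypothetical location: adjust to wherever you saved the script.
script="$HOME/bin/pdfname2title.sh"

# Add the executable bit, but only if the file is really there.
if [ -f "$script" ]; then
  chmod +x "$script"
else
  echo "no script found at $script" >&2
fi
```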
Using it is pretty simple. Just cd to the directory with the PDFs, & then run the script:
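Here's a sketch of that last step. Both paths are hypothetical, & run_in_dir is a little helper I made up so that the cd happens in a subshell & your terminal stays where it was.

```shell
# Run a command from inside a given directory without changing the
# directory of the calling shell (the parentheses start a subshell).
run_in_dir() {
  dir=$1
  shift
  ( cd "$dir" && "$@" )
}

# Hypothetical paths: adjust both to your own setup.
pdf_dir="$HOME/Documents/Old Magazine"
script="$HOME/bin/pdfname2title.sh"

if [ -d "$pdf_dir" ] && [ -x "$script" ]; then
  run_in_dir "$pdf_dir" "$script"
else
  echo "check that $pdf_dir and $script exist" >&2
fi
```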
A few moments later & the script is finished. When I imported the PDFs this time, the Titles matched the file names. I’m glad I discovered PDFtk, & I’m also glad I wrote up that shell script. I have the feeling it will come in handy more than I might think.
[1] The Title, not the file name. I wanted them to be similar, but they don’t have to be, at all.
[2] Those of you who know Calibre well (and really, while it’s not the most Mac-like software out there, it does an awesome job managing & converting ebooks) know that there is a setting I could have used in Calibre to solve my problem. If you go to Preferences > Adding Books > The Add Process, you’ll see a setting there that is checked by default: Read Metadata From File Contents Rather Than File Name. Uncheck that, & when you do your import, yes, the file name appears as the Title. Huzzah! Except that none of the other metadata is used either, which means that any covers don’t import, which sucks when you have really cool old covers. So unchecking that setting in Calibre really wasn’t an option.