How to batch change the Title metadata in a bunch of PDFs with a shell script

Recently I downloaded a large number of PDFs of an old magazine that I wanted to read. I’m particular about my file names, so I used A Better Finder Rename to make them all look like this:

1934 0104 Old Magazine.pdf
1934 0111 Old Magazine.pdf
1934 0118 Old Magazine.pdf
1934 0125 Old Magazine.pdf

Things looked great, so I imported the PDFs into Calibre, free & open source software that is basically an iTunes for ebooks. Unfortunately, the Titles listed in Calibre looked like this:

Old Magazine, Jan 04, 1934
Old Magazine, January 11, 1934
Old Magazine January 18, 1934
1934 0125 Old Magazine

Out of the 48 or so PDFs I imported, only about five came in with the correct Titles. I had assumed that the file names I gave the PDFs would become their Titles in Calibre, but clearly they didn’t. What was going on?

I started by opening a PDF in Preview & selecting Tools > Show Inspector > General Info. I immediately noticed the Title field, which said Old Magazine November 10, 1934. Aha! I needed to change the Title metadata in the PDF¹. But not in just one PDF—in about 200 PDFs, and programmatically, from the command line. Why? Because I sure as heck wasn’t gonna do it one at a time².

I looked at several different methods, most of which made you change the Title one file at a time, which was useless. Finally, I discovered PDFtk, short for “The PDF Toolkit”. This is a cross-platform command line app that describes itself as able to do all of the following:

Merge PDF Documents or Collate PDF Page Scans
Split PDF Pages into a New Document
Rotate PDF Documents or Pages
Decrypt Input as Necessary (Password Required)
Encrypt Output as Desired
Fill PDF Forms with X/FDF Data and/or Flatten Forms
Generate FDF Data Stencils from PDF Forms
Apply a Background Watermark or a Foreground Stamp
Report PDF Metrics, Bookmarks and Metadata
Add/Update PDF Bookmarks or Metadata
Attach Files to PDF Pages or the PDF Document
Unpack PDF Attachments
Burst a PDF Document into Single Pages
Uncompress and Re-Compress Page Streams
Repair Corrupted PDF (Where Possible)

Perfect!

Start by downloading PDFtk for your Mac from its website. It’s a PKG file, so just click on it, answer Continue, Continue, Install, & you now have PDFtk installed at /opt/pdflabs/pdftk.

Now let’s get that metadata out of an existing PDF so we can see it.

$ cd TNY
$ pdftk 1934\ 0104\ Old\ Magazine.pdf dump_data output report.txt
$ cat report.txt
InfoBegin
InfoKey: Creator
InfoValue: PScript5.dll Version 5.2.2
InfoBegin
InfoKey: Title
InfoValue: Old Magazine, Jan 04, 1934
InfoBegin
InfoKey: Producer
InfoValue: Acrobat Distiller 8.1.0 (Windows)
InfoBegin
InfoKey: Author
InfoValue: kao
InfoBegin
InfoKey: ModDate
InfoValue: D:20091228164142-05'00'
InfoBegin
InfoKey: CreationDate
InfoValue: D:20091228164142-05'00'
PdfID0: 4ffa99d7de15c0856ed0d9d23c8bab80
PdfID1: 717e840b8d76c044a86a7ce38d9c0658
NumberOfPages: 84
PageLabelBegin
PageLabelNewIndex: 1
PageLabelStart: 1
PageLabelNumStyle: DecimalArabicNumerals
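If all you care about is the Title, you don’t need the whole report. With no output argument, dump_data writes its report to standard output, so it can be piped straight into a filter. Here’s a small sketch of that; the function names pdf_title & list_titles are my own, not part of PDFtk.

```bash
#!/bin/bash
# Hypothetical helper: print the Title from a pdftk dump_data report read on stdin.
pdf_title() {
  # On the "InfoKey: Title" line, read the next line (its InfoValue),
  # strip the "InfoValue: " prefix, & print what's left
  awk '/^InfoKey: Title$/ { getline; sub(/^InfoValue: /, ""); print }'
}

# Hypothetical helper: print "file name<TAB>current Title" for every PDF here.
list_titles() {
  local f
  for f in *.pdf; do
    # With no matching files, the glob stays literal, so skip it
    [ -e "$f" ] || continue
    printf '%s\t%s\n' "$f" "$(pdftk "$f" dump_data | pdf_title)"
  done
}
```

Running list_titles inside the TNY folder shows at a glance which files have mismatched Titles, & running it again after a batch change is a quick way to confirm the new Titles took.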

Notice these two lines:

InfoKey: Title
InfoValue: Old Magazine, Jan 04, 1934

That’s what I need to change. Here’s the rub: PDFtk expects you to do this for each PDF whose metadata you want to change:

$ pdftk old-file-name.pdf update_info instructions.txt output new-file-name.pdf

Let me break down the components of this command:

pdftk
The command. Duh.
old-file-name.pdf
The PDF with the incorrect metadata.
update_info
The task I want PDFtk to perform: updating, or changing, metadata for the specified PDF.
instructions.txt
The text file that contains the changes: an InfoKey line naming the metadata key to change, & an InfoValue line with the new value for that key (the same format dump_data reported above).
output
I’m letting PDFtk know that I want to write these changes to a new file.
new-file-name.pdf
Our brand spanking new PDF to create, with corrected metadata.
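The instructions file is nothing more than a few lines of text, so it’s easy to generate. Below is a minimal sketch, assuming PDFtk 2.x, whose documentation says update_info reads the same syntax that dump_data produces; that’s why it includes the InfoBegin line the report above starts each entry with. The function name write_title_info is hypothetical. If your Titles contain non-ASCII characters, PDFtk 2.x also offers update_info_utf8, which reads the file as UTF-8.

```bash
#!/bin/bash
# Hypothetical helper: write a pdftk update_info file that sets only the Title.
# Usage: write_title_info "TITLE" INFO_FILE
write_title_info() {
  # printf rather than echo, so backslashes in the title are written literally
  printf 'InfoBegin\nInfoKey: Title\nInfoValue: %s\n' "$1" > "$2"
}
```

For example, write_title_info "1934 0104 Old Magazine" instructions.txt produces a file you can hand to the update_info command above.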

The problem is that you have to write each file’s Title into instructions.txt, one PDF at a time. That’s a pain, so I wrote a little shell script that makes the whole process very easy. I hope the comments make it clear what everything is doing³.

#!/bin/bash

# Set IFS since input has spaces, but first save old IFS
# See http://www.cyberciti.biz/tips/handling-filenames-with-spaces-in-bash.html
SAVEIFS=$IFS
IFS=$'\n'

for i in $(ls *.pdf)
do
  # Rename the file so I can output to the original file name
  mv "$i" _"$i"
  # Use the file name, minus its .pdf extension, as the Title
  title="${i%.pdf}"
  # Write an instructions file for pdftk
  echo "InfoKey: Title" > /tmp/pdftemp.txt
  echo "InfoValue: $title" >> /tmp/pdftemp.txt
  # Change the PDF’s metadata, outputting to the original file name
  if pdftk _"$i" update_info /tmp/pdftemp.txt output "$i"
  then
    # It worked, so get rid of the renamed file
    rm _"$i"
  else
    # It failed, so put the original back instead of deleting it
    mv _"$i" "$i"
  fi
done

# Restore IFS
IFS=$SAVEIFS
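If you’d rather not touch IFS at all, a glob does the job: bash expands *.pdf into one word per file, spaces & all. Here’s a sketch of that variation, written as a function you can source & then call from the folder of PDFs. The function name, the hidden “.name.pdf.tmp” scratch file, & the choice to skip (rather than stop on) files PDFtk can’t process are my assumptions, not part of the script above.

```bash
#!/bin/bash
# Hypothetical variant of pdfname2title.sh, written as a function to source.
# A glob yields one word per file, spaces included, so IFS stays untouched.
pdfname2title() {
  local f title info tmp_pdf
  # One scratch instructions file, reused for every PDF
  info=$(mktemp "${TMPDIR:-/tmp}/pdfinfo.XXXXXX") || return 1

  for f in *.pdf; do
    # With no matching files, the glob stays literal, so skip it
    [ -e "$f" ] || continue
    # Title = file name minus the .pdf extension
    title="${f%.pdf}"
    # Same InfoBegin/InfoKey/InfoValue syntax that dump_data reports
    printf 'InfoBegin\nInfoKey: Title\nInfoValue: %s\n' "$title" > "$info"
    # Write to a hidden scratch file, not over the file pdftk is reading
    tmp_pdf=".$f.tmp"
    if pdftk "$f" update_info "$info" output "$tmp_pdf"; then
      # Success: replace the original with the updated copy
      mv "$tmp_pdf" "$f"
    else
      # Failure: keep the original & discard any partial output
      echo "pdftk failed on $f; left it unchanged" >&2
      rm -f "$tmp_pdf"
    fi
  done

  rm -f "$info"
}
```

Writing to a scratch file & then moving it into place means the original PDF is only replaced once PDFtk reports success, so a failure partway through a batch of 200 files leaves the untouched originals where they were.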

I saved the script in ~/bin as pdfname2title.sh, & then made it executable:

$ chmod 755 ~/bin/pdfname2title.sh

Using it is pretty simple. Since ~/bin is in my PATH, I can run the script by name from anywhere. Just cd to the directory with the PDFs, & then run it:

$ pdfname2title.sh

A few moments later, the script was finished. When I imported the PDFs this time, the Titles matched the file names. I’m glad I discovered PDFtk, & I’m also glad I wrote that shell script. I have a feeling it will come in handy more often than I might think.

¹ The Title, not the file name. I wanted them to be similar, but they don’t have to be, at all.
² Those of you who know Calibre well (and really, while it’s not the most Mac-like software out there, it does an awesome job managing & converting ebooks) know that there is a setting I could have used in Calibre to solve my problem. If you go to Preferences > Adding Books > The Add Process, you’ll see a setting there that is checked by default: Read Metadata From File Contents Rather Than File Name. Uncheck that, & when you do your import, yes, the file name appears as the Title. Huzzah! Except that none of the other metadata is used either, which means that covers don’t import, which sucks when you have really cool old covers. So unchecking that setting in Calibre really wasn’t an option.
³ In the following shell script, you’ll see a reference to IFS, the Internal Field Separator. For more on IFS, see the Wikipedia article on the subject.