Fiona bought me a few books for Christmas and one of them was more perfect than the others. This beauty.
It’s a pretty dense academic text on, as the title implies, digital photography and how a century of thinking about analogue photography has been challenged by the last decade or so. And unlike so much guff spouted by the photographic and artistic world about digital photography it seems, on first browse, to be relatively in tune with my interests and prejudices, while challenging them in useful ways.
But there’s a problem.
For some reason, the publisher, Routledge, has decided that the book should be set in a really tiny type, presumably because the words are really important and require extra effort to read. As someone who has a few mild dyslexic issues with comprehending large blocks of text, this is not optimal for a book that promises to have large blocks of text requiring careful comprehension. For example, on page 44:
As a product of the technological age, photography occupies a privileged place within the Cartesian representational schema, for it is an image the truthfulness of which is underwritten by the scientific procedure that created it. By situating visual representation within a framework of empirical knowledge supported by automatism on the one hand, and by logical and rational use of light, optics and chemistry on the other, photography has been framed as an offshoot of objectivity and empiricism.
Now, I understand that, but I find it hard to comprehend that sort of thing at whatever tiny point size Routledge dictate. So it’s in my interests to hack the book.
I need the text in an electronic format that I can read on a computer screen. In short, I need an ebook.
As an academic text with limited audience the book sells for £27, which is reasonable given the authors need to get paid. But because the cost is borne by the authors and not the printers and booksellers (trade discounts for academic books are notoriously small) the eBook is a similar price, so buying both would cost around £50. Which is kinda silly.
Because book publishers haven’t figured out the “download code with physical purchase” thing yet, and because when they do they have a tendency to fuck it up (a DRM-free eBook I bought last year had a copyright warning with my name encoded at the end of every chapter, like I was some kind of child) I need to go elsewhere to get my digital copy.
You might think getting illegal downloads of copyrighted material would require some kind of “darknet” hackery, but in fact a simple Google search will usually do. Pop in the title of the thing you want and the format you require it in and put aside, ooh, 5 minutes to sift through the spam. The chances are you’ll find it. On Google.
Eventually I found a link to download a PDF. Now, a PDF is either as useless as the book or just a step towards a useful file. It all depends on what kind of PDF it is.
If the PDF is simply a bunch of scans or photographs of the book text then it’s of no use. But if it’s a “true” PDF generated from the text itself then we’re getting somewhere. You can tell by trying to highlight the text. If you can, then you’re good to go.
The ease of getting text out of a PDF also depends on how it’s formatted, or rather how much formatting there is. If there’s a lot of fancy layouts and columns then you’re in trouble, as anyone who’s tried copying something from a PDF timetable will attest. But if it’s a standard flowing document where one line follows another then you’re fine. All you need to do is extract the text while retaining as much stylistic formatting as possible.
Even if you plan to avoid it like the plague, MS Word is the standard editor for formatted text so it makes sense to search for that. I found this tutorial on MacWorld where the ultimate aim is Word but the half-way point is an RTF file, which is perfect. RTF is Rich Text Format, a rarely used format which is just good enough for word processing and compatible with almost everything. This bit is Mac only but I’m sure Windows and Linux have similar things. In short, you use Automator.
Set this up, press Play, select the PDF and, boom, you’ve got a Rich Text version.
It’s important to remember that, while the conversion will do its best, it will rely on cues from the original document. If they’ve gone for some wacky cool formatting (which is another good reason to avoid the print and go digital) it may not map to standard document markup. But if there’s a logic to the layout you should find a nice hierarchical flow from Title to Headlines to Body.
(One downside of this process is all images are stripped out. This isn’t a huge problem for an academic text but they can be put back in later with a bit of work.)
Now, if you’re simply interested in the raw text you can stop here, or at least start tidying up the copy here. But I want an ePub file, one which works with book readers on my iPad and other mobile computing devices. So I’m turning to a slightly more powerful editor than TextEdit, one which will allow me to export to ePub, in this case Apple’s Pages which, while not the most powerful editor, causes fewer headaches than Word or OpenBloodyOffice.
The first thing to do is to deal with cruft from the PDF conversion. Here’s some of what I had to deal with.
- Section titles not marked up correctly. (Solution: manually select and mark up using correct Paragraph Styles.)
- Body text in various fonts and sizes. (Solution: manually select whole section and mark up as Body.)
- Header and footer text appearing mid-flow. (Solution: manual deletion.)
- Hyphenation of words which shouldn’t be hyphenated. (Solution: search and replace.)
- No line space between paragraphs. (Solution: search and replace, then manually remove the excess.)
- Blockquote indentation vanished. (Solution: manually replace.)
- Italics vanished. (Solution: hell, I can deal without them.)
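
I did all of the above by hand in Pages, but if you can get the text out as plain text, a few of the fixes (stray headers and footers, end-of-line hyphens, paragraph spacing) could probably be scripted. Here's a rough Python sketch of how that might look. The function names are made up for illustration, and it rests on assumptions about the extracted text that may not hold for your PDF: that running headers are short lines repeated several times, that page numbers sit on lines of their own, that unwanted hyphens fall at the end of a line, and that each paragraph arrives as a single line.

```python
import re
from collections import Counter


def remove_running_headers(lines, min_repeats=3, max_length=60):
    """Drop bare page numbers and short lines that repeat often.

    Assumption: a short line appearing min_repeats times or more is a
    running header or footer rather than real content.
    """
    counts = Counter(line.strip() for line in lines if line.strip())
    kept = []
    for line in lines:
        stripped = line.strip()
        if stripped.isdigit():
            continue  # a line holding only digits is treated as a page number
        if stripped and len(stripped) < max_length and counts[stripped] >= min_repeats:
            continue  # repeated short line, treated as a header or footer
        kept.append(line)
    return kept


def join_hyphenated(text):
    """Rejoin words split across a line break: 'photo-' + 'graphy'.

    Caveat: this also closes up genuine compounds that happen to break
    at the hyphen, so skim the result afterwards.
    """
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def space_paragraphs(text):
    """Put exactly one blank line between paragraphs.

    Assumption: each non-empty line is one paragraph.
    """
    paragraphs = [line.strip() for line in text.split("\n") if line.strip()]
    return "\n\n".join(paragraphs)


def clean(text, min_repeats=3):
    """Run the three fixes in order: headers, hyphens, spacing."""
    lines = remove_running_headers(text.split("\n"), min_repeats)
    return space_paragraphs(join_hyphenated("\n".join(lines)))
```

Headers come out first so that a word broken across a page boundary ends up on adjacent lines before the hyphen fix runs. Section styles, blockquotes and italics are lost in plain text, so those would still be manual jobs.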
As mentioned, the images didn’t survive the conversion but in this case they weren’t critical so I ignored them. If they were I would have simply taken a screenshot of the PDF, cropped it to size and dragged it into the editor. Simple.
All in all the tidying of this 200 page book took about an hour. Was it worth it? I think so, because while I was working on the book I was also skim-reading it. I now have a pretty good general sense of what the sections of the book are about and know how to approach it.
This idea of comprehending something by working on it, no matter how mundane the work, is something I highly recommend. It’s no substitute for deep study but as a first pass it does wonders, particularly if you’re daunted by the thing in hand.
Now all I needed to do was export the document as an ePub and check it out in iBooks (a good default app for checking formatting, if not the best reader).
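
I let Pages do the export, but it's worth knowing that an ePub is not magic: it's a zip file containing XHTML pages plus a little metadata saying what order they go in. For the curious, here's a minimal Python sketch that builds one from a title and a list of paragraphs using only the standard library. To be clear, this is not what Pages does internally, the function name and file layout (`OEBPS/text.xhtml` and friends) are my own choices, and I haven't run the output through a strict validator, so treat it as an illustration of the structure rather than a finished tool.

```python
import uuid
import zipfile
from datetime import datetime, timezone
from xml.sax.saxutils import escape

# Tells the reader where to find the package file.
CONTAINER = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# The package file: metadata, a list of files, and the reading order.
OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="book-id">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="book-id">{identifier}</dc:identifier>
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">{modified}</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
    <item id="text" href="text.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine>
    <itemref idref="text"/>
  </spine>
</package>"""

PAGE = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
<head><title>{title}</title></head>
<body>
{body}
</body>
</html>"""


def build_epub(target, title, paragraphs):
    """Write a single-chapter ePub to target (a path or a binary file object)."""
    safe_title = escape(title)
    body = "<h1>{}</h1>\n".format(safe_title) + "\n".join(
        "<p>{}</p>".format(escape(p)) for p in paragraphs
    )
    nav = '<nav epub:type="toc"><ol><li><a href="text.xhtml">{}</a></li></ol></nav>'.format(
        safe_title
    )
    opf = OPF.format(
        identifier="urn:uuid:" + str(uuid.uuid4()),
        title=safe_title,
        modified=datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
    )
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry has to come first and be stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/nav.xhtml", PAGE.format(title=safe_title, body=nav))
        zf.writestr("OEBPS/text.xhtml", PAGE.format(title=safe_title, body=body))
```

Something like `build_epub("book.epub", "My Book", cleaned_text.split("\n\n"))` would then give you a file to try opening in iBooks. A real book would want one XHTML file per chapter and a proper table of contents, which is exactly the sort of thing Pages handles for you.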
And there you have it. An electronic version of a book I own which I can now read, enjoy and learn from without struggle. The book is a lovely object, and I’m glad I was gifted it as I wouldn’t have considered downloading the PDF and going through this process if I hadn’t had it in my hands, but I’m also glad I have the knowledge and wherewithal to create a digital copy. And now you do too.
Remember kids, always try and pay the people who make stuff you use, no matter how hard they make it to give them money.