
epub-tools

about

epub-tools is a suite of command-line utilities for creating and manipulating epub book files. Included are: epubmeta, epubname, epubzip. This software uses the epub-metadata library, also available on Hackage.

epubmeta is a command-line utility for examining and editing epub book metadata. With it you can export, import and edit the raw OPF Package XML document for a given book, or simply dump the metadata to stdout for viewing in a friendly format.

Here's an example of epubmeta output:

$ epubmeta Kelly_Kessel_Lethem-NinetyPercentOfEverything.epub

package
   version: 2.0
   unique-identifier: calibre_id
title: Ninety Percent of Everything
creator
   role: aut
   file-as: Kelly, James Patrick
   text: James Patrick Kelly
creator
   role: aut
   text: John Kessel
creator
   role: aut
   text: Jonathan Lethem
contributor
   role: bkp
   file-as: calibre
   text: calibre (0.5.1) [http://calibre.kovidgoyal.net]
date: 2001-03-25T00:00:00
identifier
   id: calibre_id
   scheme: calibre
   text: b1026732-69a5-4a05-a8d9-a1701685f6fa
identifier
   id: [WARNING: missing required id attribute]
   scheme: ISBN
   text: 1-590620-00-3
subject: Science Fiction/Fantasy
publisher: www.Fictionwise.com
language: en-us

epubname is a command-line utility for renaming epub ebook files based on their OPF Package metadata. It tries to use author names and title info to construct a sensible name.

Using it looks like this:

$ epubname poorly-named-book.epub

poorly-named-book.epub -> WattsPeter-Blindsight_2006.epub

$ epubname another-poorly-named-book.epub

another-poorly-named-book.epub -> Kelly_Kessel_Lethem-NinetyPercentOfEverything.epub

epubzip is a handy utility for zipping up the files that make up an epub into an .epub zip file. Using the same naming logic as epubname, it can try to construct a meaningful filename for the book.

The software in epub-tools is written in Haskell. It's known to build under GHC 7.0.3.

news

2012-01-29 (v1.1.2)
2011-11-15 (v1.1.1)
2011-11-04 (v1.1.0)
2011-10-27 (v1.0.0.1)
2011-04-23 (v1.0.0.0)

documentation

Usage output for each of the programs:

Usage: epubmeta [OPTIONS] EPUBFILE
View or edit EPUB OPF package data

Options:
  -h        --help            This help text
  -e[SUF]   --edit-opf[=SUF]  Edit a book's OPF XML data in a text
                              editor. If an optional suffix is
                              supplied, a backup of the book will be
                              created with that suffix
  -x[FILE]  --export[=FILE]   Export the book's OPF XML metadata to a
                              file. If no file name is given, the name
                              of the file in the book will be used
  -i FILE   --import=FILE     Import OPF metadata from the supplied
                              XML file
  -v        --verbose         Display all OPF package info, including
                              manifest, spine and guide

When -v or no options are given, epubmeta will display the OPF package
data in a human-readable form.

The -e feature will look for an editor in this order: the EDITOR env var,
the VISUAL env var, /usr/bin/vi
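
The editor lookup order above can be sketched in Haskell as follows. This is only an illustration under stated assumptions, not epubmeta's actual source: the names `chooseEditor` and `findEditor` are hypothetical, a variable counts as present whenever it is set (even to an empty string), and `lookupEnv` needs base 4.6 or later, which is newer than GHC 7.0.3.

```haskell
import Control.Applicative ((<|>))
import Data.Maybe (fromMaybe)
import System.Environment (lookupEnv)

-- | Pure core of the lookup (hypothetical name): prefer the value of
-- EDITOR, then the value of VISUAL, and fall back to /usr/bin/vi.
chooseEditor :: Maybe String -> Maybe String -> FilePath
chooseEditor editor visual = fromMaybe "/usr/bin/vi" (editor <|> visual)

-- | Read both variables from the environment and apply 'chooseEditor'.
findEditor :: IO FilePath
findEditor = chooseEditor <$> lookupEnv "EDITOR" <*> lookupEnv "VISUAL"
```

Keeping the decision in a pure function makes the ordering easy to check without touching the real environment.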

Version 1.1.2  Dino Morelli <dino@ui3.info>

---

Usage: epubname [OPTIONS] FILES
Rename EPUB book files with clear names based on their metadata

Options:
  -d         --any-date         If no publication year found, use first 
                                date element of any kind
  -D         --no-date          Suppress inclusion of original publication 
                                year
  -h         --help             This help text
  -n         --no-action        Display what would be done, but do nothing
  -o         --overwrite        Overwrite existing file with new name
  -p         --publisher        Include book publisher if present. See below
  -v[LEVEL]  --verbose[=LEVEL]  Verbosity level: 1, 2

Verbosity levels:
   1 - Include which formatter processed the file
   2 - Include the OPF Package and Metadata info

Exit codes:
   0 - success
   1 - bad options
   2 - failed to process one or more of the files given
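
A calling program could map these exit codes back to their meanings roughly as below. This is a sketch for illustration only: `describeExit` is a hypothetical helper, not part of epub-tools, and the last case covers codes that the table above does not document.

```haskell
import System.Exit (ExitCode (..))

-- | Meaning of an epubname exit status, following the table above.
-- 'describeExit' is a hypothetical name used only in this sketch.
describeExit :: ExitCode -> String
describeExit ExitSuccess     = "success"
describeExit (ExitFailure 1) = "bad options"
describeExit (ExitFailure 2) = "failed to process one or more of the files given"
describeExit (ExitFailure n) = "undocumented exit code " ++ show n
```

The `ExitCode` value could come, for example, from `System.Process.rawSystem "epubname" ["-n", "book.epub"]` in the process package.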

Book names are constructed by examining parts of the OPF Package metadata
such as the title, creators, contributors and dates.

Strings from the OPF metadata fields are stripped of punctuation,
CamelCased and stripped of spaces. Resulting file names look like this:

For books with a single author:
   LastFirst[Middle]-TitleText[_year][_publisher].epub

For books with multiple authors:
   Last1_Last2[_Last3...]-TitleText[_year][_publisher].epub

For books that have no clear authors, such as compilations:
   TitleText[_year][_publisher].epub

Only creator tags with either a role attribute of 'aut' or no role at all
are considered authors. If a file-as attribute is present, this will be
the preferred string. If not, the program tries to do some intelligent
parsing of the name.
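
The naming rules above can be approximated with the following sketch. It is not the code epubname uses: all type and function names are hypothetical, the "intelligent parsing" is reduced to the assumption that the last word of a name is the surname, and a file-as value is assumed to have the form "Last, First Middle".

```haskell
import Data.Char (isAlphaNum, isSpace, toUpper)
import Data.List (intercalate)

-- | One creator tag: role and file-as attributes plus the element text.
data Creator = Creator
  { creatorRole   :: Maybe String
  , creatorFileAs :: Maybe String
  , creatorText   :: String
  }

-- | Authors are creators with role 'aut' or no role at all.
isAuthor :: Creator -> Bool
isAuthor c = creatorRole c `elem` [Nothing, Just "aut"]

-- | Strip punctuation, capitalise each word, then remove the spaces.
camel :: String -> String
camel = concatMap capitalise . words . filter keep
  where
    keep c = isAlphaNum c || isSpace c
    capitalise []       = []
    capitalise (c : cs) = toUpper c : cs

-- | (surname, remaining names). file-as is preferred when present;
-- otherwise the last word is assumed to be the surname.
nameParts :: Creator -> (String, String)
nameParts c = case creatorFileAs c of
  Just fa -> let (surname, rest) = break (== ',') fa
             in (surname, drop 1 rest)
  Nothing -> case words (creatorText c) of
    [] -> ("", "")
    ws -> (last ws, unwords (init ws))

-- | LastFirst[Middle] for one author, Last1_Last2... for several.
authorPart :: [Creator] -> String
authorPart []  = ""
authorPart [c] = let (surname, rest) = nameParts c in camel surname ++ camel rest
authorPart cs  = intercalate "_" [camel (fst (nameParts c)) | c <- cs]

-- | Assemble the file name from creators, title, optional year and
-- optional publisher string.
bookName :: [Creator] -> String -> Maybe String -> Maybe String -> FilePath
bookName creators title year publisher =
  prefix ++ camel title ++ suffix year ++ suffix publisher ++ ".epub"
  where
    prefix = case authorPart (filter isAuthor creators) of
      "" -> ""
      a  -> a ++ "-"
    suffix = maybe "" ('_' :)
```

For instance, `bookName [Creator (Just "aut") Nothing "Peter Watts"] "Blindsight" (Just "2006") Nothing` evaluates to `"WattsPeter-Blindsight_2006.epub"`.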

The OPF spec suggests there may be a
<dc:date opf:event='publication'>2011</dc:date> element representing the
original publication date. If this (or opf:event='original-publication')
is present, it will be used by default for _year as in the above
examples. The --any-date switch will use the first date tag found,
regardless of attributes. The year can be parsed out of many date
formats.
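
One simple way to model the date handling described above is sketched here. It is an assumption-laden illustration rather than epubname's real parser: the names `DateElem`, `extractYear` and `pickYear` are hypothetical, and a year is taken to be the first run of exactly four digits, which is far cruder than a real date parser.

```haskell
import Data.Char (isDigit)
import Data.Maybe (listToMaybe, mapMaybe)

-- | One date element: its opf:event attribute (if any) and its text.
data DateElem = DateElem
  { dateEvent :: Maybe String
  , dateText  :: String
  }

-- | All maximal runs of digits in a string, in order.
digitRuns :: String -> [String]
digitRuns s = case dropWhile (not . isDigit) s of
  "" -> []
  s' -> let (run, rest) = span isDigit s' in run : digitRuns rest

-- | Assumed heuristic: the first run of exactly four digits is the year.
extractYear :: String -> Maybe String
extractYear = listToMaybe . filter ((== 4) . length) . digitRuns

-- | True for the two event values named in the text above.
isPublication :: DateElem -> Bool
isPublication d =
  dateEvent d `elem` [Just "publication", Just "original-publication"]

-- | Prefer a publication date. If none yields a year and the flag
-- (modelling --any-date) is set, fall back to the first date element.
pickYear :: Bool -> [DateElem] -> Maybe String
pickYear anyDate dates =
  case mapMaybe (extractYear . dateText) (filter isPublication dates) of
    (y : _) -> Just y
    []
      | anyDate   -> listToMaybe dates >>= extractYear . dateText
      | otherwise -> Nothing
```

With these definitions, `pickYear False [DateElem Nothing "2001-03-25T00:00:00"]` is `Nothing`, while passing `True` for the flag gives `Just "2001"`.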

Publisher: I wanted to provide a way to have multiple copies of the
same book produced by different publishers and name them sort-of
unambiguously. I came up with the idea of expecting a contributor tag
with role attribute of 'bkp' (so far, this is fairly normal) and then
using a file-as attribute on that tag to contain a small string to be
used in the filename. The file-as value should be kept short and sweet.
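
The publisher lookup described above might look like the following sketch. The `Contributor` type and `publisherTag` function are hypothetical names for illustration, and taking the first matching tag is an assumption, since the text does not say what happens when several contributors qualify.

```haskell
import Data.Maybe (listToMaybe)

-- | One contributor tag: role and file-as attributes plus the text.
data Contributor = Contributor
  { contribRole   :: Maybe String
  , contribFileAs :: Maybe String
  , contribText   :: String
  }

-- | The file-as string of the first contributor whose role is 'bkp'
-- and that actually carries a file-as attribute, if there is one.
publisherTag :: [Contributor] -> Maybe String
publisherTag cs =
  listToMaybe [fa | Contributor (Just "bkp") (Just fa) _ <- cs]
```

For the example metadata shown earlier, where the contributor has role bkp and file-as calibre, this yields `Just "calibre"`.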

Magazines are kind of a sticky problem in that it's often desirable to
have edition and/or date info in the filename. There's a lot of chaos
out there with titling the epub editions of magazines. The solution
in this software is to do some pattern matching on multiple fields in
the magazine's metadata combined with custom naming code for specific
magazines. This means that support for future magazines will likely have
to be hand-coded into future versions of this utility, which isn't
very friendly to non-programmers.

For more information please see the IDPF OPF specification found here:
http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm

Version 1.1.2  Dino Morelli <dino@ui3.info>

---

Usage: epubzip [OPTIONS] DESTDIR
       epubzip [OPTIONS] DESTDIR/EPUBFILE
Construct an epub zip archive from files in the current directory

Options:
  -h  --help       This help text
  -o  --overwrite  Force overwrite if dest file exists

If run with DESTDIR alone, epubzip will try to construct a name
from the OPF package data for this book (see epubname). If run with
DESTDIR/EPUBFILE, epubzip will use that name for the destination file.

You may have noticed that there is no epubunzip utility. Truth is,
epubs are just zip files and you barely need epubzip either if you have
the normal zip/unzip utilities installed. While not as fancy with file
naming and leaving out dotfiles, this works for zipping:

   $ cd DIR
   $ zip -Xr9D ../EPUBFILE mimetype *

And for unzipping, it's really just as easy:

   $ mkdir TEMPDIR
   $ cd TEMPDIR
   $ unzip EPUBFILE

Version 1.1.2  Dino Morelli <dino@ui3.info>

getting it



last modified 2012-01-29