epub-tools is a suite of command-line utilities for creating and manipulating epub book files. Included are: epubmeta, epubname, epubzip. This software uses the epub-metadata library, also available on Hackage.
epubmeta is a command-line utility for examining and editing epub book metadata. With it you can export, import and edit the raw OPF Package XML document for a given book, or simply dump the metadata to stdout in a friendly format.
Here's an example of epubmeta output:
$ epubmeta Kelly_Kessel_Lethem-NinetyPercentOfEverything.epub
package
   version: 2.0
   unique-identifier: calibre_id
title: Ninety Percent of Everything
creator
   role: aut
   file-as: Kelly, James Patrick
   text: James Patrick Kelly
creator
   role: aut
   text: John Kessel
creator
   role: aut
   text: Jonathan Lethem
contributor
   role: bkp
   file-as: calibre
   text: calibre (0.5.1) [http://calibre.kovidgoyal.net]
date: 2001-03-25T00:00:00
identifier
   id: calibre_id
   scheme: calibre
   text: b1026732-69a5-4a05-a8d9-a1701685f6fa
identifier
   id: [WARNING: missing required id attribute]
   scheme: ISBN
   text: 1-590620-00-3
subject: Science Fiction/Fantasy
publisher: www.Fictionwise.com
language: en-us
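To make the export step concrete, here is a minimal Python sketch of pulling the raw OPF document out of a book. It is not how epubmeta itself is implemented (that is Haskell, via epub-metadata); it only relies on the standard EPUB container layout, where META-INF/container.xml names the OPF file in a rootfile element. The function names opf_path and export_opf are made up for this illustration.

```python
import xml.etree.ElementTree as ET
import zipfile

# Namespace of META-INF/container.xml in the EPUB container format.
CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"


def opf_path(book):
    """Return the archive path of the OPF package document (hypothetical helper)."""
    container = ET.fromstring(book.read("META-INF/container.xml"))
    rootfile = container.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
    if rootfile is None:
        raise ValueError("container.xml lists no rootfile")
    return rootfile.attrib["full-path"]


def export_opf(epub_file):
    """Return (path inside the book, raw OPF bytes) for an epub path or file object."""
    with zipfile.ZipFile(epub_file) as book:
        path = opf_path(book)
        return path, book.read(path)
```

Calling export_opf("book.epub") returns the name the OPF file has inside the book together with its bytes, which roughly corresponds to what an export with no file name given would need.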
epubname is a command-line utility for renaming epub ebook files based on their OPF Package metadata. It tries to use author names and title info to construct a sensible name.
Using it looks like this:
$ epubname poorly-named-book.epub
poorly-named-book.epub -> WattsPeter-Blindsight_2006.epub
$ epubname another-poorly-named-book.epub
another-poorly-named-book.epub -> Kelly_Kessel_Lethem-NinetyPercentOfEverything.epub
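The naming rules spelled out in the epubname usage text further down can be approximated in a few lines of Python. This is only a sketch of the documented patterns, not the program's actual Haskell code: the creator dictionaries, the function names, and the "last word is the surname" fallback are assumptions made for illustration, and the magazine handling is left out entirely.

```python
import re


def camel(text):
    """Strip punctuation, capitalize each word, and drop the spaces."""
    words = re.sub(r"[^\w\s]", "", text).split()
    return "".join(w[0].upper() + w[1:] for w in words)


def split_name(creator):
    """Return (surname, given names) for one creator, both CamelCased."""
    file_as = creator.get("file_as")
    if file_as and "," in file_as:
        # file-as is preferred when present, e.g. "Kelly, James Patrick".
        last, _, rest = file_as.partition(",")
    elif file_as:
        last, rest = file_as, ""
    else:
        # Assumption: the last word of a non-empty display name is the surname.
        *given, last = creator["text"].split()
        rest = " ".join(given)
    return camel(last), camel(rest)


def book_name(title, creators, year=None, publisher=None):
    """Build a file name following the three documented patterns."""
    # Only creators with role 'aut' or no role at all count as authors.
    authors = [c for c in creators if c.get("role") in (None, "aut")]
    names = [split_name(c) for c in authors]
    if len(names) == 1:
        prefix = names[0][0] + names[0][1] + "-"
    elif names:
        prefix = "_".join(last for last, _ in names) + "-"
    else:
        prefix = ""
    suffix = "".join(f"_{part}" for part in (year, publisher) if part)
    return f"{prefix}{camel(title)}{suffix}.epub"
```

With a single creator {"role": "aut", "text": "Peter Watts"}, the title "Blindsight" and year 2006, book_name gives the first file name in the example above; the three creators from the epubmeta output give the second.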
epubzip is a handy utility for zipping up the files that comprise an epub into an .epub zip file. Using the same technology as epubname, it can try to make a meaningful filename for the book.
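As a rough illustration of what zipping up an epub involves, here is a Python sketch using the standard zipfile module. It is not epubzip's own code. It assumes the source directory holds a mimetype file, writes that entry first and uncompressed (which the EPUB container format expects), and skips dotfiles as epubzip is described as doing. The function name zip_epub is hypothetical, and the destination should sit outside the source directory so the archive does not try to include itself.

```python
import os
import zipfile


def zip_epub(src_dir, dest_file):
    """Pack the files under src_dir into an epub archive at dest_file."""
    with zipfile.ZipFile(dest_file, "w", zipfile.ZIP_DEFLATED) as book:
        # The mimetype entry goes first and is stored without compression.
        book.write(os.path.join(src_dir, "mimetype"), "mimetype",
                   compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            # Do not descend into hidden directories.
            dirs[:] = sorted(d for d in dirs if not d.startswith("."))
            for name in sorted(files):
                full = os.path.join(root, name)
                arc = os.path.relpath(full, src_dir)
                if name.startswith(".") or arc == "mimetype":
                    continue
                book.write(full, arc)
```

A call such as zip_epub("DIR", "EPUBFILE") plays the same role as the plain zip command shown in the epubzip usage text below.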
The software in epub-tools is written in Haskell. It's known to build under GHC 7.0.3.
Usage output for each of the programs:
Usage: epubmeta [OPTIONS] EPUBFILE
View or edit EPUB OPF package data
Options:
-h --help This help text
-e[SUF] --edit-opf[=SUF] Edit a book's OPF XML data in a text editor.
If an optional suffix is supplied, a backup of
the book will be created with that suffix
-x[FILE] --export[=FILE] Export the book's OPF XML metadata to a file.
If no file name is given, the name of the file
in the book will be used
-i FILE --import=FILE Import OPF metadata from the supplied XML file
-v --verbose Display all OPF package info, including
manifest, spine and guide
When -v or no options are given, epubmeta will display the OPF package
data in a human-readable form.
The -e feature will look for an editor in this order: the EDITOR env
var, the VISUAL env var, /usr/bin/vi
Version 1.1.2 Dino Morelli <dino@ui3.info>
---
Usage: epubname [OPTIONS] FILES
Rename EPUB book files with clear names based on their metadata
Options:
-d --any-date If no publication year found, use first
date element of any kind
-D --no-date Suppress inclusion of original publication
year
-h --help This help text
-n --no-action Display what would be done, but do nothing
-o --overwrite Overwrite existing file with new name
-p --publisher Include book publisher if present. See below
-v[LEVEL] --verbose[=LEVEL] Verbosity level: 1, 2
Verbosity levels:
1 - Include which formatter processed the file
2 - Include the OPF Package and Metadata info
Exit codes:
0 - success
1 - bad options
2 - failed to process one or more of the files given
Book names are constructed by examining parts of the OPF Package metadata
such as the title, creators, contributors and dates.
Strings from the OPF metadata fields are stripped of punctuation,
CamelCased and stripped of spaces. Resulting file names look like this:
For books with a single author:
LastFirst[Middle]-TitleText[_year][_publisher].epub
For books with multiple authors:
Last1_Last2[_Last3...]-TitleText[_year][_publisher].epub
For books that have no clear authors, such as compilations:
TitleText[_year][_publisher].epub
Only creator tags with either a role attribute of 'aut' or no role at all
are considered authors. If a file-as attribute is present, this will be
the preferred string. If not, the program tries to do some intelligent
parsing of the name.
The OPF spec suggests there may be a <dc:date
opf:event='publication'>2011</dc:date> element representing
original publication date. If this (or opf:event='original-publication')
is present, it will be used by default for _year as in the above
examples. The --any-date switch will use the first date tag found,
regardless of attributes. The year can be parsed out of many date
formats, so this is very flexible.
Publisher: I wanted to provide a way to have multiple copies of the
same book produced by different publishers and name them more or less
unambiguously. I came up with the idea of expecting a contributor tag
with role attribute of 'bkp' (so far, this is fairly normal) and then
using a file-as attribute on that tag to hold a small string to be used
in the filename. The file-as value should be short and sweet.
Magazines are kind of a sticky problem in that it's often desirable to
have edition and/or date info in the filename. There's a lot of chaos
out there with titling the epub editions of magazines. The solution
in this software is to do some pattern matching on multiple fields in
the magazine's metadata combined with custom naming code for specific
magazines. This means that support for future magazines will likely have
to be hand-coded into future versions of this utility. Modifying this
just isn't very friendly to non-programmers.
For more information please see the IDPF OPF specification found here:
http://idpf.org/epub/20/spec/OPF_2.0.1_draft.htm
Version 1.1.2 Dino Morelli <dino@ui3.info>
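The date handling described in the epubname usage text above can be sketched in Python as follows. This is an illustration only, under stated assumptions: dates are passed as (event, text) pairs in document order, the year is taken to be the first standalone run of four digits, and the --any-date fallback follows the option description (use the first date of any kind only when no publication date is found). The real program is described as accepting many more date formats than this regular expression does.

```python
import re

# Event attribute values treated as the original publication date.
PUBLICATION_EVENTS = ("publication", "original-publication")


def year_of(text):
    """Return the first standalone four-digit run in text, or None."""
    match = re.search(r"(?<!\d)\d{4}(?!\d)", text)
    return match.group() if match else None


def find_year(dates, any_date=False):
    """dates: list of (event, text) pairs; event is None when absent."""
    for event, text in dates:
        if event in PUBLICATION_EVENTS:
            return year_of(text)
    if any_date and dates:
        # Fallback modelled on --any-date: first date element of any kind.
        return year_of(dates[0][1])
    return None
```

For the book in the epubmeta example, whose only date element has no event attribute, find_year returns None by default, which matches the renamed file having no _year part.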
---
Usage: epubzip [OPTIONS] DESTDIR
epubzip [OPTIONS] DESTDIR/EPUBFILE
Construct an epub zip archive from files in the current directory
Options:
-h --help This help text
-o --overwrite Force overwrite if dest file exists
If run with DESTDIR alone, epubzip will try to construct a name
from the OPF package data for this book (see epubname). If run with
DESTDIR/EPUBFILE, epubzip will use that name for the destination file.
You may have noticed that there is no epubunzip utility. Truth is,
epubs are just zip files and you barely need epubzip either if you have
the normal zip/unzip utilities installed. While not as fancy with file
naming and leaving out dotfiles, this works for zipping:
$ cd DIR
$ zip -Xr9D ../EPUBFILE mimetype *
And for unzipping, it's really just as easy:
$ mkdir TEMPDIR
$ cd TEMPDIR
$ unzip EPUBFILE
Version 1.1.2 Dino Morelli <dino@ui3.info>
last modified 2012-01-29