Creation of e-books

As a Lean Six Sigma professional, I have often been assigned tasks that are not really part of my primary job, simply because the boss wanted them done right and on time. As you may know, Lean Six Sigma practitioners have a history of accomplishing tasks that they knew nothing about before the project began. Now the task is creating e-books from existing books.

Smarter Solutions and Forrest Breyfogle have decided to create electronic versions of the textbooks used in our training classes. We have not yet decided how to use these e-books, but we have started creating them, so I wanted to write about this journey in case any of you have the urge, or the assignment, to do the same.

We started with copies of the books in the format used for publishing them. These were older files from the Adobe InDesign suite of products, which are not very readable by any other software. Since we could not really use those files, we found the PDF proof files that were used for printing. These PDF files were formatted for printing, which means that the book pages were centered on a letter size page.

What formats are used for e-books?

We decided to provide two primary e-book formats: .epub and .mobi. The .epub format is used by nearly every e-reader except the Kindle devices. The .mobi format is usable on a Kindle. If you want to read either format on a PC or Mac, you need to use e-reader software.

The .epub files are where we are starting. It turns out that these files are actually zipped files of HTML, images, and formatting files. They are sort of like self-contained mini-websites. The files can be edited with any simple text editor, if you can work in HTML, or with a simple HTML editor like the one used by WordPress. You cannot code with the complexity found in websites, but you do have a good bit of control.
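If you want to see the "mini-website" for yourself, a few lines of Python can peek inside an .epub without unpacking it. This is only a minimal sketch that assumes Python 3 is available; the function name `summarize_epub` is made up for this illustration and is not part of Calibre or Sigil.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def summarize_epub(epub_path):
    """Count the files inside an .epub archive by file extension."""
    counts = Counter()
    # An .epub is an ordinary zip archive, so the standard zipfile module can read it.
    with zipfile.ZipFile(epub_path) as archive:
        for name in archive.namelist():
            if name.endswith("/"):  # skip directory entries
                continue
            counts[PurePosixPath(name).suffix.lower()] += 1
    return dict(counts)
```

Calling `summarize_epub("mybook.epub")` (a hypothetical file name) returns a dictionary of extensions and counts, which makes the mix of text files, images, and style sheets easy to see.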

We did find an e-book management program that has a good conversion function: Calibre. This software provides a book management system along with an easy-to-use conversion process. It is not perfect, but it is quite good. It allows us to build the .epub version and then create the .mobi version.


.epub file editing

After we used Calibre to convert our PDF files into the .epub format, we needed to begin editing and cleaning up the files. The editor in Calibre can do the work, but we found it to be clumsy. This brought us to Sigil for the editing. Sigil is a fully functional .epub editor that includes not only a spell checker but also a “Flight Check” function that scans the .epub file for non-standard code and errors.

Graphic handling

As you can imagine, our books have a lot of graphics and tables. This hits the primary limitation of e-book readers: the inability to manage graphics well. Nearly every e-book reader allows the user to change font sizes and contrast to make the text easy to read, but there is no standard method for handling graphics. Graphics are shown at the pixel size at which they were loaded into the e-book file. Since e-book readers have many different screen sizes, there is no single graphic size that works well on all readers. The best recommendation is that no graphic have a width greater than 600 pixels. This size is not too small on the full size readers and PCs, and it is not so big that it cannot be understood on a phone size screen. Graphic handling is a big reason that many textbooks are not as welcome in an e-book format, and our books have a number of rotated, full page graphics that are close to unreadable when shrunk to 600 pixels.
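Checking every graphic against the 600 pixel guideline by hand gets tedious, so here is a hedged sketch of how the check could be scripted. It only looks at .png files, it reads the width and height from the first 24 bytes of each file (the standard PNG header layout), and the function names `png_size` and `oversized_pngs` are my own inventions for this example.

```python
import struct
import zipfile

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_size(data):
    """Return (width, height) read from a PNG header, or None if it is not a PNG."""
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        return None
    # Width and height are two big-endian 32-bit integers at the start of the IHDR chunk.
    return struct.unpack(">II", data[16:24])


def oversized_pngs(epub_path, max_width=600):
    """List (name, width, height) for every .png in the .epub wider than max_width."""
    found = []
    with zipfile.ZipFile(epub_path) as archive:
        for info in archive.infolist():
            if not info.filename.lower().endswith(".png"):
                continue
            with archive.open(info) as handle:
                size = png_size(handle.read(24))
            if size and size[0] > max_width:
                found.append((info.filename, size[0], size[1]))
    return found
```

The 600 pixel default is just the guideline mentioned above; pass a different `max_width` if your target readers call for it.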

A simple fix would be to create links from the e-book to full size graphics on the web or elsewhere in the book file. We have not found a standard for how to deal with this. It turns out that we can insert links to web sources for graphics and such, but these links fail the “Flight Check.” The check fails because e-books are expected to stand alone and not provide a way to load malware or viruses onto the reader. We have found that Apple will not accept any e-book that contains components in the .epub file that are not viewable or used in the book. This may be a similar issue to links outside of the book.

Book Creation Process

Our book creation process is as follows:

  1. Convert the .pdf book into the .epub format using Calibre. (We have converted MS Word files too.)
  2. Use Calibre to adjust all graphics to portrait orientation and a maximum 600 pixel width.
  3. Open the .epub file in Sigil and segment the book into files that match the book chapters. The Calibre conversion may or may not segment the book into chapters. It is a best practice to have every book section created as a separate file in the .epub zip file. This segmentation is performed by scanning through the book, splitting files in some places and joining files in others. It takes a bit of time, but it really will help later.
  4. Now open each file and begin editing to remove conversion issues.
    1. Remove all of the page headers and footers that were included in the text due to the conversion.
    2. For some reason, every word containing "li" was converted with a space after the "i". We also found cases where "lo" and "ll" had a space inserted after them. These spaces were all removed with the find and replace function in Sigil.
    3. Many sentences that began on the left margin of a page but were still in the middle of a paragraph were converted into new paragraphs and had to be fixed.
    4. Remove the hyphen from all words that were hyphenated at the end of a line in the original book.
    5. Join all of the paragraphs split at the original page breaks.
    6. Convert all of the bullet points from characters into html formatted bullets to make sure the line wrapping looks right.
    7. Convert all of the numbered lists from characters to html formatted numbered lists for the line wrapping.
  5. Then go back to the beginning of each .epub file and compare it to the original book or the PDF and make them match.  While scrolling through the file:
    1. Check every figure (graphic) or table for readability. Shrink graphics that look too large, or replace the graphic with a new copy created from the original graphic used to make the original book.
       I used MS Paint as the Sigil graphic editor to change the size of the graphics. It saves the file right back into the .epub file.
       If the graphic was unusable or missing from the .epub file, I used MS PowerPoint (PPT) to create the graphic to be loaded into the .epub file. I would copy or insert the original graphic into PowerPoint, then copy the graphic and paste-special it back into PPT as an “enhanced metafile.” Adjust the enhanced metafile to dimensions close to the 600 pixel width (a bit less than half a slide width). This format seems to allow resizing without making the graphic blurry. Now right-click on the graphic and choose to save it as a picture. Choose the .png format and save the graphic. I do not use the .jpg format because backgrounds have at times turned black for reasons I do not understand.
       Now import the graphic into the book. Check the pixel width and adjust it as necessary.
    2. Check all equations that are inline with the text. The conversion quite often misses superscripts and subscripts, and many specialized math symbols did not convert well. I used MS Equation Editor to recreate most equations and then inserted them as graphics rather than trying to force the e-book HTML to create the equations.
  6. When this is all done, use the spell check and the flight check to review for the missed issues.
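
We did all of the step 4 clean-up by hand with find and replace in Sigil, but several of those fixes follow patterns that could be scripted. The sketch below is only an illustration under assumptions of mine: that a stray space follows "li", "lo", or "ll" inside a word, that a leftover line-end hyphen appears as a hyphen plus a space between lowercase letters, and that a wrongly split paragraph ends without punctuation and resumes in lowercase. The function names and the `known_words` set (any word list you trust) are hypothetical. Patterns like these produce false positives (for example "pre- and post-test"), so review the result against the original book.

```python
import re

# Assumed pattern: a fragment ending in "li", "lo" or "ll", one space, then the rest of the word.
SPLIT_WORD = re.compile(r"\b(\w*(?:li|lo|ll)) (?=(\w+))")
# Assumed pattern: a line-end hyphen that survived conversion as "exam- ple".
LINE_END_HYPHEN = re.compile(r"(?<=[a-z])- (?=[a-z])")
# Assumed pattern: a paragraph break in mid-sentence (no end punctuation, next starts lowercase).
BROKEN_PARAGRAPH = re.compile(r"(?<=[a-z,;])\s*</p>\s*<p[^>]*>\s*(?=[a-z])")


def fix_split_words(text, known_words):
    """Remove the stray space only when the rejoined word is in known_words."""
    def join_if_known(match):
        head, tail = match.group(1), match.group(2)
        return head if (head + tail).lower() in known_words else match.group(0)
    return SPLIT_WORD.sub(join_if_known, text)


def rejoin_hyphenated(text):
    """Rejoin words that were hyphenated at the end of a printed line."""
    return LINE_END_HYPHEN.sub("", text)


def merge_split_paragraphs(html):
    """Join paragraphs that were split in the middle of a sentence."""
    return BROKEN_PARAGRAPH.sub(" ", html)


def clean_chapter(html, known_words):
    """Apply the three clean-up passes to the HTML of one chapter file."""
    html = fix_split_words(html, known_words)
    html = rejoin_hyphenated(html)
    return merge_split_paragraphs(html)
```

Checking each rejoined word against a word list is slower than a blanket replace, but it keeps the script from gluing together two legitimate words that happen to fit the pattern.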

Other Lessons Learned

We found references stating that a book file needs to be less than 50 MB for Amazon and less than 2 GB for Apple, but either way you want to avoid a large file size if possible. Large graphics can make the book file large. You can change an .epub file to a .zip file by editing the suffix in Windows Explorer. Then sort the image directory by size. You may want to reduce the resolution or dimensions of large image files to save space.
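The same size check can be done without renaming the file. The following is a small sketch, assuming Python 3; the function name `largest_images` and the list of image suffixes are my own choices for the example, not anything required by the .epub format.

```python
import zipfile

# Image types to look for; this list is an assumption, extend it to match your book.
IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg")


def largest_images(epub_path, top=10):
    """Return (name, size_in_bytes) for the largest image files in the .epub."""
    with zipfile.ZipFile(epub_path) as archive:
        images = [
            info for info in archive.infolist()
            if info.filename.lower().endswith(IMAGE_SUFFIXES)
        ]
    # file_size is the uncompressed size of each file stored in the archive.
    images.sort(key=lambda info: info.file_size, reverse=True)
    return [(info.filename, info.file_size) for info in images[:top]]
```

The files at the top of the returned list are the first candidates for a lower resolution or smaller dimensions.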

Special formatting and consistent formatting in an e-book are maintained through the CSS file, just like in a website. Rather than using a lot of format commands in the book files, adjust the style sheet to create the formats you want to use. This saves space and makes the editing easier. Editing the style sheet is not a simple task, but a web programmer can help you understand it.

That is it for now. If I learn more, I will post it too.