Adding Audio and Video Support to your EPUB Apps

Datalogics has just released video support for Adobe Reader Mobile SDK (RMSDK) on the iOS platform. This has been a popular feature request from our customers, who are gradually adopting the new EPUB 3 platform. Datalogics has taken on the challenge of providing better EPUB 3 support (for more information about our long-term EPUB 3 strategy, please read this post). In the interim, we have created RMSDK APIs to support basic playback of audio/video content embedded in EPUB files. Using these APIs, DL Reader for iOS can now play videos. This is a stepping stone for our customers who want to begin producing A/V content for EPUB files. The purpose of this article is to help you understand the scope of work involved in providing this type of support, and how to use the new APIs to add A/V support to your readers on other platforms.

Our initial rollout of A/V support includes:

  • Rendering engine changes in RMSDK
  • APIs to expose A/V elements from RMSDK in RMServices middle layer
  • A new release of DL Reader client application on iOS that can play video content

This article focuses on the video APIs, with the goal of explaining the ideas and the APIs that a developer can use to add A/V support. At times I refer to audio and video together as A/V, since both follow a similar implementation. Datalogics will release similar support for audio-only elements at the end of this month, and we will follow up with another article to explain the audio-only APIs then. So let’s get started with the video APIs.

1.  Main Idea

The main idea behind adding video support to the existing RMSDK EPUB engine is to use the rendering ability of the RMSDK to create a placeholder for the video element. The video player control resides on the application side, which is platform dependent. It is the application’s responsibility to create the video controls and player on top of the rendered bitmap returned from the RMSDK. The application asks the RMSDK for the video’s position and size data, and then uses this information to position the video player. The RMSDK also returns the video content data, which the player control uses to play the video. In short, the new release of RMSDK recognizes video tags in EPUB markup and returns the video content stream as well as the position, size, and file URL information that developers need to set up playback controls.

Currently we support three common video formats: MP4, WebM and M4V.

2.  API details

Structure dpdoc::VideoInfo is used to store the video information.

struct dpdoc::VideoInfo {
    int x;           /* x-coordinate of video location on the screen */
    int y;           /* y-coordinate of video location on the screen */
    int width;       /* width of the video */
    int height;      /* height of the video */
    dp::String url;  /* path to the source video file */
};

virtual int getVideoCountForCurrentScreen();

This function returns the number of video elements that reside on the current screen being rendered by the RMSDK. This information is available to the caller after the screen has been rendered by the RMSDK.

virtual bool getVideoInfoForVideoOnCurrentScreen(int videoIndex, dpdoc::VideoInfo * info);

This function fills in the dpdoc::VideoInfo for the video at the given videoIndex on the current screen. The caller can use this info to properly place the video on the screen.
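
The two calls above are naturally used together: ask for the count, then fetch each entry by index. The following is a minimal sketch of that loop, not RMSDK code. It assumes that the bool return value means "info was filled in", which this article does not state. VideoInfo, VideoSource, and FakeVideoSource are hypothetical stand-ins (with std::string in place of dp::String) so that the example compiles without the RMSDK headers.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for dpdoc::VideoInfo (std::string replaces dp::String).
struct VideoInfo {
    int x;
    int y;
    int width;
    int height;
    std::string url;
};

// Hypothetical interface exposing only the two calls described above.
class VideoSource {
public:
    virtual ~VideoSource() {}
    virtual int getVideoCountForCurrentScreen() = 0;
    virtual bool getVideoInfoForVideoOnCurrentScreen(int videoIndex, VideoInfo* info) = 0;
};

// Collects the info for every video on the screen that was just rendered.
// Assumption: a false return means the lookup failed, so that entry is skipped.
std::vector<VideoInfo> collectVideosOnScreen(VideoSource& source) {
    std::vector<VideoInfo> videos;
    const int count = source.getVideoCountForCurrentScreen();
    for (int i = 0; i < count; ++i) {
        VideoInfo info = VideoInfo();
        if (source.getVideoInfoForVideoOnCurrentScreen(i, &info)) {
            videos.push_back(info);
        }
    }
    return videos;
}

// In-memory test double, used only to exercise the loop above.
class FakeVideoSource : public VideoSource {
public:
    explicit FakeVideoSource(const std::vector<VideoInfo>& videos) : videos_(videos) {}

    int getVideoCountForCurrentScreen() {
        return static_cast<int>(videos_.size());
    }

    bool getVideoInfoForVideoOnCurrentScreen(int videoIndex, VideoInfo* info) {
        if (info == 0 || videoIndex < 0 ||
            videoIndex >= static_cast<int>(videos_.size())) {
            return false;
        }
        *info = videos_[videoIndex];
        return true;
    }

private:
    std::vector<VideoInfo> videos_;
};
```

In a real reader, the loop body would be where the application creates one player control per returned entry.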

virtual dp::String getVideoInfoForCurrentScreenAsJSON();

getVideoInfoForCurrentScreenAsJSON is similar to getVideoInfoForVideoOnCurrentScreen but instead returns video info for all videos on the current screen as a list of JSON objects. Each object in the list represents an individual video on the current screen that was just rendered. The data field of each object contains another JSON object with video data. An example of the returned list:

            "url":" file:///C:/Test/videoTests1.epub/OEBPS/fox.m4v",

virtual dpio::Stream* getVideoStream (dp::String url);

getVideoStream returns the video file at the given URL as a dpio::Stream. The URL is taken from the VideoInfo retrieved by a call to either getVideoInfoForVideoOnCurrentScreen or getVideoInfoForCurrentScreenAsJSON. The video stream is read into the buffer synchronously.

3.  General Workflow

Knowing these APIs, let’s talk about the general workflow. Each reader can implement A/V support using the APIs described above. Many components are platform dependent. For example, the A/V player controls differ from platform to platform, and even on a single platform you can choose either a native player or plug in custom player controls. However, the following basic workflow can be followed in all cases.

3.1  Start rendering a screen

The client application calls the RMSDK to render a new screen. This should have occurred in your current application already.

3.2  Post rendering

After the client application determines that the screen has finished rendering, it can retrieve position and size information for the videos on that screen. To do so, it calls either getVideoInfoForCurrentScreenAsJSON, or getVideoCountForCurrentScreen followed by getVideoInfoForVideoOnCurrentScreen for each video index.

Note: RMSDK will return only the video info for videos to be played on the current screen.

3.3  Create A/V player controls at the given video positions

Once the rendered screen is displayed, the client application can place video controls at the correct locations on the screen. It is up to the client application to determine what type of video player control to use.

Note that the video position information returned by the RMSDK is affected by the environment matrix set on the RMSDK’s dpdoc::Renderer. If your application modifies this environment matrix when rendering the screen (for example, DL Reader iOS modifies the environment matrix to achieve the correct scaling across both Retina and non-Retina devices), you’ll need to do the following to get correct video position information:

1. Save a copy of the environment matrix used for rendering the screen.
2. Invert the saved environment matrix.
3. Apply the inverted environment matrix to the video position coordinates and width/height received from the RMSDK.
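
The three steps above can be sketched as follows. This is an illustration under stated assumptions, not the DL Reader implementation: Affine and Rect are hypothetical types introduced for the example (they are not the RMSDK matrix type), and the matrix is assumed to map a point (x, y) to (a*x + c*y + tx, b*x + d*y + ty). The rectangle is mapped by transforming its four corners and taking their bounding box, which also behaves sensibly if the matrix flips an axis.

```cpp
#include <algorithm>
#include <cmath>

// Hypothetical 2D affine matrix: (x, y) -> (a*x + c*y + tx, b*x + d*y + ty).
struct Affine {
    double a, b, c, d, tx, ty;
};

// Hypothetical rectangle type for the video position and size.
struct Rect {
    double x, y, width, height;
};

// Step 2: invert the saved matrix. Returns false if it is singular.
bool invert(const Affine& m, Affine* out) {
    const double det = m.a * m.d - m.b * m.c;
    if (std::fabs(det) < 1e-12) {
        return false;
    }
    out->a = m.d / det;
    out->b = -m.b / det;
    out->c = -m.c / det;
    out->d = m.a / det;
    out->tx = (m.c * m.ty - m.d * m.tx) / det;
    out->ty = (m.b * m.tx - m.a * m.ty) / det;
    return true;
}

// Maps all four corners of r through m and returns their bounding box.
Rect mapRect(const Affine& m, const Rect& r) {
    const double xs[2] = {r.x, r.x + r.width};
    const double ys[2] = {r.y, r.y + r.height};
    double minX = 0.0, minY = 0.0, maxX = 0.0, maxY = 0.0;
    bool first = true;
    for (int i = 0; i < 2; ++i) {
        for (int j = 0; j < 2; ++j) {
            const double px = m.a * xs[i] + m.c * ys[j] + m.tx;
            const double py = m.b * xs[i] + m.d * ys[j] + m.ty;
            if (first) {
                minX = maxX = px;
                minY = maxY = py;
                first = false;
            } else {
                minX = std::min(minX, px);
                maxX = std::max(maxX, px);
                minY = std::min(minY, py);
                maxY = std::max(maxY, py);
            }
        }
    }
    Rect out = {minX, minY, maxX - minX, maxY - minY};
    return out;
}

// Step 3: apply the inverse of the saved environment matrix (step 1)
// to the rectangle reported by the RMSDK.
bool toViewCoordinates(const Affine& savedEnvironment, const Rect& fromSdk, Rect* out) {
    Affine inverse;
    if (!invert(savedEnvironment, &inverse)) {
        return false;
    }
    *out = mapRect(inverse, fromSdk);
    return true;
}
```

For example, with an environment matrix that scales by 2 and translates by (10, 20), a reported rectangle of (110, 220, 640, 360) maps back to (50, 100, 320, 180).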

3.4  Use the video URL to get video data for playback

The file URL of the video in the EPUB is one piece of information returned by the RMSDK video info calls, stored either in dpdoc::VideoInfo or in the returned JSON objects. Since the video is embedded in the EPUB file, the client application can call getVideoStream to ask the RMSDK for the video content. Given the current design of the RMSDK, the client application does not have direct access to the EPUB; instead, the file content must be retrieved through the RMSDK.

The API returns the complete file in one buffer, which can be either saved to a file or passed to the video player for playback depending on the capabilities of the application platform. In DL Reader iOS, each video is saved to a file on the local disk and video playback occurs from the files saved on the disk.
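
As a rough illustration of the "save to a file" option, the sketch below writes a complete video buffer to an output stream. It deliberately does not show how bytes are read from a dpio::Stream, since that part of the RMSDK is not covered in this article; it assumes the application has already collected the bytes into a std::vector. The function name writeVideoBuffer is hypothetical.

```cpp
#include <ostream>
#include <sstream>  // std::ostringstream is handy for exercising this without touching disk
#include <vector>

// Writes the whole video buffer to `out`.
// Returns true only if the stream is still in a good state afterwards.
// Assumption: `data` already holds the complete file content obtained
// from the RMSDK; reading from dpio::Stream is not shown here.
bool writeVideoBuffer(const std::vector<unsigned char>& data, std::ostream& out) {
    if (!out.good()) {
        return false;
    }
    if (!data.empty()) {
        out.write(reinterpret_cast<const char*>(&data[0]),
                  static_cast<std::streamsize>(data.size()));
    }
    out.flush();
    return out.good();
}
```

To save to disk, pass a std::ofstream opened with std::ios::binary so that the bytes are written unmodified on every platform.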

While implementation can vary greatly depending on client platforms and for different client application designs, these are the key steps to follow when implementing A/V support by using the existing RMSDK. DL Reader for iOS contains a complete implementation of the video player using these APIs. If you have licensed DL Reader source for iOS, you will be able to use DL Reader’s implementation as a great source of reference.

4.  What is supported

Now that we’ve covered the API additions in the RMSDK, let’s get back to the basic question: what precisely do we support in the RMSDK and DL Reader for iOS? The following sections cover what is supported, divided into these categories:

  • video tag markup
  • video format: MP4, WebM, and M4V
  • video dimensions
  • video rendering
  • and lastly, video playing

As an example, you can see how DL Reader for iOS implemented the video support by downloading and using the app from the App Store.

4.1  Video tags in EPUB

Videos are supported by adding video elements in the EPUB data. We support a subset of video attributes as defined for HTML 5 and EPUB 3. A sample markup can look like:

<video width="320" height="240" poster=".imagesvideo1_cover.jpg">
  <source src="…mp4" type="video/mp4"/>
  <source src="video.webm" type="video/webm"/>
  Your browser does not support the video tag.
</video>

The RMSDK recognizes the video element as specified by the <video> tag. Only a subset of video attributes is supported (see the tables below). For example, markup that omits the width and height and provides only one video source can be much simpler:

<video poster=".imagesvideo1_cover.jpg">
  <source src="…mp4" type="video/mp4"/>
  Your browser does not support the video tag.
</video>

The following tables list the subset of video attributes that the RMSDK supports. The full specifications are available on the HTML 5 and EPUB 3 websites.

Table 1. Attributes in the video element

Attribute | Purpose                              | Supported | Values
--------- | ------------------------------------ | --------- | ------
src       | URL of the video file                | Yes       | URL of the embedded video; using the child source element is preferred.
poster    | Display an image for the placeholder | Yes       | The specified image is rendered by RMSDK on the current screen at the position where the video should be placed.
width     | Width                                | Yes       |
height    | Height                               | Yes       |

Table 2. Attributes in the source element

Attribute | Purpose                                | Supported | Values
--------- | -------------------------------------- | --------- | ------
src       | URL of the video file, new in HTML5    | Yes       | URL of the embedded video
type      | Specify MIME types                     | Yes       | See the next table.

4.2  Supported video format

Table 3. Supported video formats and their MIME types

Type | File Extension | MIME Type
---- | -------------- | ----------
MP4  | .mp4           | video/mp4
WebM | .webm          | video/webm
M4V  | .m4v           | video/m4v

MP4 and WebM are two common video formats. M4V was developed by Apple and is used by iTunes; it is very similar to MP4 but can optionally carry DRM copy protection.

The RMSDK returns the first supported video format if multiple options are provided in the source elements.
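
The "first supported format wins" rule can be illustrated with a small sketch. This is not the RMSDK's internal logic, which this article does not show; it assumes that support is decided by an exact, case-sensitive match of the type attribute against the MIME types in Table 3. SourceElement and both function names are hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical representation of one <source> child of a <video> element.
struct SourceElement {
    std::string src;
    std::string type;
};

// Assumption: support is decided by exact match against Table 3.
bool isSupportedMimeType(const std::string& type) {
    return type == "video/mp4" || type == "video/webm" || type == "video/m4v";
}

// Returns the index of the first source with a supported MIME type,
// in document order, or -1 if none of the sources is supported.
int firstSupportedSource(const std::vector<SourceElement>& sources) {
    for (std::size_t i = 0; i < sources.size(); ++i) {
        if (isSupportedMimeType(sources[i].type)) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

Under this rule, content authors should list their preferred format first, because later sources are only considered when earlier ones are unsupported.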

The RMSDK requires that the videos be embedded (contained) in the EPUB files. Externally streamed video is not supported.

4.3  Video dimensions

By now you may be wondering how we determine the size of a video element. If the width and height are specified as absolute values (rather than percentages), they are interpreted in the default unit of points (72 points = 1 inch in the RMSDK), and the specified width and height are honored. If the size is larger than the space available for display, it is up to the client application to determine what to do with the video player control. Note that if a poster image is used, the image is rendered at the specified size.

If neither width nor height is specified, the default is 320×180. This is chosen so that the default size can fit on the portrait mode display on an iPhone. The size also assumes a 16:9 aspect ratio. Note that this size can be found in the RMSDK and modified by RMSDK developers.

If the width is specified as 100% or some other percentage, the height is determined by an aspect ratio of 16:9 after the RMSDK discovers the available width. This can get a bit tricky because the height is not well defined. So for video content, it is up to the client application to make sure that the controls are visible and functional even if the returned video sizes are not optimal. For the same reason, we strongly recommend that you either define both the width and the height, or use the default values by defining neither dimension.
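
The sizing rules in this section come down to a little arithmetic, sketched below. The 320×180 default, the 16:9 ratio, and the 72 points per inch figure come from the text above; the type and function names are hypothetical, and the available width is assumed to be known in points.

```cpp
// Hypothetical size type; values are in points (72 points = 1 inch).
struct VideoSize {
    double width;
    double height;
};

// Default used when neither width nor height is specified.
VideoSize defaultVideoSize() {
    VideoSize size = {320.0, 180.0};
    return size;
}

// Width given as a percentage of the available width; the height then
// follows from the 16:9 aspect ratio described above.
VideoSize sizeFromPercentWidth(double widthPercent, double availableWidth) {
    const double width = availableWidth * widthPercent / 100.0;
    VideoSize size = {width, width * 9.0 / 16.0};
    return size;
}

// Converts a length in points to device pixels for a display with the
// given dots-per-inch value.
double pointsToPixels(double points, double dpi) {
    return points * dpi / 72.0;
}
```

For example, a width of 100% with 640 points available gives 640×360, and a width of 50% gives 320×180, which matches the default size.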

4.4  Video rendering

Video content is not “rendered” or processed by the RMSDK. The RMSDK merely passes its file URL back to the client application so it is up to the client application to play the video content. The client application decides how to display the video control. The client application can also use the video data to determine the initial image displayed.

The RMSDK honors the poster attribute for video. The poster image is rendered like a regular image and is displayed on the screen. However, the video player controls will likely cover the poster image. The current version of RMSDK does not return the poster image separately to the client application.

4.5  Video playing in DL Reader for iOS

The DL Reader on iOS will overlay a video control at the position returned by the RMSDK. The video will keep its aspect ratio when displayed by the video player. Here is a look at what DL Reader for iOS can display when a screen contains a video clip.


When the screen orientation changes, the screen is re-rendered and the video may appear in a different position or may even move off of the page. You can check out the DL Reader for iOS from the App Store. The DL Reader’s implementation is for reference. Many aspects of the user experience can be implemented differently in the client application.

5.  Testing with book2png

As you have seen, the RMSDK APIs are straightforward. We have added two book2png commands that show the output of these new APIs. Both commands return video info for the videos on the currently rendered page. Note the use of the word “page” here rather than “screen”. If you are familiar with book2png or look up its options, you will know that it uses the word “page”. This “page” corresponds to the “screen” we have used throughout this article. We are specific about using the word “screen” for client applications because, to human readers, a page tends to mean a physical printed page, which can span more than one screen depending on the font size and display size. When a reader application requests a page from RMSDK, it is really requesting a screen. So a page in the traditional sense does not map one-to-one to a screen in an e-reader application.

The book2png command for retrieving the video parameters as a JSON string and writing the string to a file in the local folder is:

book2png --video-info-json

The book2png command for retrieving the video content stream and writing it to a file in the local folder is:

book2png --write-video-to-file

6.  Conclusion

As you know, DL Reader is our reference RMSDK-based viewer. If you choose to implement video in your own reader for iOS or for other platforms, DL Reader for iOS provides a reference implementation. We hope this article has covered the basics of how the video APIs work and has given you a good start.

As mentioned above, Datalogics is working on similar support for audio. The audio APIs are already available in the most recent release of RMSDK, and a version of DL Reader for iOS that supports basic audio playback is planned for release soon. The idea is the same, and I don’t expect the approach to differ much. Even so, I will be sure to publish a blog post on the audio APIs once we have wrapped up the last bit of work on them.

If you have questions or comments, please email
