Akamai Diversity

CMAF: What It Is and Why It May Change Your OTT Future

By Will Law and Shawn Michels

Apple's June 15th announcement at its Worldwide Developers Conference that it will add fragmented MP4 (fMP4) support to HLS marks a significant step in simplifying online video streaming. fMP4 is the parent of the emerging Common Media Application Format (CMAF), and Apple's plan to support fMP4 brings the industry closer to a single format for OTT distribution, with playback support on all consumer electronics devices. The ultimate goal is to reduce the complexity of delivering video online.

The OTT industry has made a wholesale shift over the past five years from proprietary media protocols such as RTMP, MMS and RTP towards using HTTP/S to deliver adaptive segmented content to viewers. Within the adaptive segmented formats there is still significant fragmentation, with HLS, Smooth, HDS and MPEG-DASH offering competing solutions. Even with the expected deprecation of Smooth and HDS and their replacement with DASH, most content distributors are still faced with maintaining two silos of content - one in HLS and another in DASH.

Today, HLS specifies the use of TS (transport stream) file containers, while DASH, although allowing TS, almost uniformly uses ISO Base Media File Format (ISOBMFF) in practice, in particular the variant known as the aforementioned fragmented MP4. The result is that content distributors wanting to reach both an HLS and a DASH audience must package and store the same audio and video data twice - once wrapped in TS containers and again wrapped in ISOBMFF. These files, although representing the same content, cost twice as much to package and twice as much to store on origin, and they compete with each other for space on Akamai edge caches, thereby reducing the efficiency with which they can be delivered.
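To make the container difference concrete, here is a minimal Python sketch that tells the two wrappers apart. It relies only on two well-known layout facts: a transport stream is a run of 188-byte packets that each start with the sync byte 0x47, while an ISOBMFF file is a sequence of boxes that each start with a 4-byte big-endian size and a 4-byte type. The function names are hypothetical, and the check is a heuristic for illustration, not a validator.

```python
import struct
from typing import List, Tuple

TS_PACKET_SIZE = 188   # fixed MPEG-2 TS packet length in bytes
TS_SYNC_BYTE = 0x47    # first byte of every TS packet


def looks_like_ts(data: bytes, packets_to_check: int = 3) -> bool:
    """Return True if the buffer starts with consecutive TS sync bytes."""
    if len(data) < TS_PACKET_SIZE * packets_to_check:
        return False
    return all(
        data[i * TS_PACKET_SIZE] == TS_SYNC_BYTE
        for i in range(packets_to_check)
    )


def top_level_boxes(data: bytes) -> List[Tuple[str, int]]:
    """List (type, size) for each top-level ISOBMFF box in the buffer."""
    boxes = []
    offset = 0
    while offset + 8 <= len(data):
        size, box_type = struct.unpack_from(">I4s", data, offset)
        header = 8
        if size == 1:
            # A size of 1 means a 64-bit "largesize" follows the type field.
            if offset + 16 > len(data):
                break
            (size,) = struct.unpack_from(">Q", data, offset + 8)
            header = 16
        elif size == 0:
            # A size of 0 means the box runs to the end of the buffer.
            size = len(data) - offset
        if size < header or offset + size > len(data):
            break  # truncated or malformed; stop rather than guess
        boxes.append((box_type.decode("ascii", errors="replace"), size))
        offset += size
    return boxes


def sniff_container(data: bytes) -> str:
    """Classify a buffer as 'ts', 'isobmff' or 'unknown' (heuristic only)."""
    if looks_like_ts(data):
        return "ts"
    types = [box_type for box_type, _ in top_level_boxes(data)]
    if any(t in types for t in ("ftyp", "styp", "moof")):
        return "isobmff"
    return "unknown"
```

A fragmented MP4 media segment will typically show up here as a `moof` box followed by an `mdat` box, optionally preceded by `styp`; the same audio and video samples wrapped in TS would instead be spread across many 188-byte packets.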

To try to overcome the cache efficiency problem, the market has launched a myriad of solutions which require complex synthesis of the TS and ISO segments (for HDS) at the edge or in a streaming mid tier. These servers, which have to build content before they can deliver it, have a lower throughput than those that can simply pass it through. File container diversity therefore limits the total throughput achievable by a delivery server, and contributes significantly to our customers' content preparation, workflow management and delivery costs. Alternatively, customers can store multiple versions of the content, which increases total storage costs.

In mid-2015 two unlikely collaborators - Microsoft and Apple - came together to plan an end to this inefficiency through a new media file format, which at the time was called Common Media File Format but is now known as CMAF. Microsoft and Apple reached out to Akamai and a number of their closer partners to iterate on the proposal. In February 2016 this group of companies prepared a joint submission to MPEG, which has been accepted onto a standardization track.

CMAF has a number of attributes that are of interest to the media industry:

  • It is an ISOBMFF, fMP4 container, specifically ISO/IEC 14496-12:201. Transport Streams have served the broadcast and cable industries well in delivering continuous streams of data, but they are ill-suited to segmented media delivery and switching, incurring higher overhead/payload ratios than fMP4. fMP4 is extensible for future additions, lightweight, already used by DASH, Smooth and HDS, and is an ISO standard with a robust toolset around it for encoding, manipulation, debugging and analysis.
  • It uses Common Encryption (CENC) - ISO/IEC 23001-7:2016 - a standard means of encrypting the media content payload using 128-bit AES encryption and then supplying header information so that multiple concurrent DRM systems can be used to decrypt the content. This avoids the need for separate silos of content to support the myriad of DRM solutions available today which, unfortunately, are not converging at the same rate as the file containers.
  • It will support the MPEG codec suite of AVC (ISO/IEC 14496-10), AAC (ISO/IEC 14496-3) and HEVC (ISO/IEC 23008-2) as the baseline for interoperability, but will allow other codecs (such as VP9 or multichannel audio) to be signaled.
  • It currently allows two captioning/subtitling formats - WebVTT and IMSC1.
  • Segments must begin with keyframes and there must be precise segment alignment across bitrates. This simplifies switching between bitrates for players.
  • It requires independent (non-muxed) audio and video segments.
  • It offers a low latency mode, which should further help reduce OTT live stream latencies below the thresholds for terrestrial and satellite broadcast distribution.
  • It is designed to be referenced by both an HLS playlist (.m3u8) and a DASH manifest (.mpd).
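The segment alignment requirement in the list above lends itself to a simple check. The sketch below is illustrative only: it assumes you already have the segment durations for each bitrate (in integer milliseconds, to avoid floating-point drift), and the rendition names are hypothetical. It verifies only that segment boundaries line up across bitrates; confirming that each segment starts with a keyframe would require inspecting the samples themselves, which is out of scope here.

```python
from itertools import accumulate


def segment_boundaries(durations_ms):
    """Cumulative end time of each segment, in integer milliseconds."""
    return list(accumulate(durations_ms))


def misaligned_renditions(renditions):
    """Return names of renditions whose boundaries differ from the first one.

    `renditions` maps a (hypothetical) rendition name to its list of
    segment durations in integer milliseconds. The first entry is used
    as the reference timeline.
    """
    names = list(renditions)
    if not names:
        return []
    reference = segment_boundaries(renditions[names[0]])
    return [
        name
        for name in names[1:]
        if segment_boundaries(renditions[name]) != reference
    ]


# Example bitrate ladder: the third rendition drifts on its second segment.
ladder = {
    "video_800k": [2000, 2000, 2000],
    "video_3000k": [2000, 2000, 2000],
    "video_6000k": [2000, 1960, 2040],
}
print(misaligned_renditions(ladder))  # ['video_6000k']
```

When every rendition shares the same boundaries, a player can switch bitrates at any segment edge without having to seek inside a segment.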

CMAF is very similar to the file container that DASH already uses today, so adopting CMAF from the DASH perspective requires little, if any, change to encoders, workflow or players. For the Apple and HLS community, however, it requires parsing a new type of container. Apple's announcement of fMP4 support in HLS on iOS 10, macOS and tvOS gives the industry more confidence that CMAF will live up to its billing as the driver of convergence.

The advent of CMAF heralds the beginning of the end for TS containers for OTT delivery. The benefits of encoding once, packaging once, caching once and building a single type of player are too attractive along the delivery chain for TS to persist in the long term.

It is not all ice-cream cake, however. Even though Apple, Android and Microsoft operating systems and devices will quickly support CMAF, there will still be many legacy devices that are not field-upgradeable, for which TS-based HLS will continue to be needed.

Additionally, Common Encryption is not as common as one might think. The spec actually allows more than one block cipher mode - CTR and CBC. As long as draft CMAF continues to support both of these, the vision of a single content set for all devices remains blurry. CMAF also does not solve the problem of manifest fragmentation, as both HLS .m3u8 playlists and DASH .mpd manifests will still need to be generated.
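The manifest point can be sketched in a few lines. The Python below writes an HLS media playlist and a DASH manifest that both point at the same set of fMP4 files, which is the situation CMAF is aiming for: one set of media, two small text descriptions. Everything specific here is an assumption made for illustration - the file names (`init.mp4`, `seg_1.m4s`), the fixed segment duration, the bandwidth and the codec string - and the output is a minimal skeleton rather than a production manifest.

```python
def segment_uris(count):
    """Hypothetical segment names shared by both manifests."""
    return [f"seg_{n}.m4s" for n in range(1, count + 1)]


def hls_media_playlist(count, seg_seconds, init_uri="init.mp4"):
    """Build a VOD media playlist that references fMP4 segments."""
    lines = [
        "#EXTM3U",
        "#EXT-X-VERSION:7",
        f"#EXT-X-TARGETDURATION:{seg_seconds}",
        "#EXT-X-PLAYLIST-TYPE:VOD",
        # EXT-X-MAP points the player at the fMP4 initialization segment.
        f'#EXT-X-MAP:URI="{init_uri}"',
    ]
    for uri in segment_uris(count):
        lines.append(f"#EXTINF:{seg_seconds:.3f},")
        lines.append(uri)
    lines.append("#EXT-X-ENDLIST")
    return "\n".join(lines) + "\n"


def dash_mpd(count, seg_seconds, init_uri="init.mp4",
             bandwidth=3_000_000, codecs="avc1.64001f"):
    """Build a static MPD whose SegmentTemplate resolves to the same files."""
    total = count * seg_seconds
    return f"""<?xml version="1.0" encoding="UTF-8"?>
<MPD xmlns="urn:mpeg:dash:schema:mpd:2011" type="static"
     profiles="urn:mpeg:dash:profile:isoff-live:2011"
     mediaPresentationDuration="PT{total}S" minBufferTime="PT{seg_seconds}S">
  <Period>
    <AdaptationSet mimeType="video/mp4" segmentAlignment="true">
      <SegmentTemplate initialization="{init_uri}" media="seg_$Number$.m4s"
                       startNumber="1" timescale="1" duration="{seg_seconds}"/>
      <Representation id="video_1" bandwidth="{bandwidth}" codecs="{codecs}"/>
    </AdaptationSet>
  </Period>
</MPD>
"""


if __name__ == "__main__":
    print(hls_media_playlist(3, 2))
    print(dash_mpd(3, 2))
```

Both outputs resolve to `init.mp4` plus `seg_1.m4s` through `seg_3.m4s`, so only the manifests are duplicated, not the media. If the two player populations require different encryption modes, however, the segments themselves may still need to be produced twice.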

Despite these issues, CMAF remains the biggest step forward the industry has taken in many years towards a harmonized and converged future. We can expect market forces to pick winners (for codecs, captions, encryption modes and presentation formats) and CMAF to settle quickly as the de facto OTT media standard.

Akamai has been committed to CMAF from its inception and is actively working to ensure CMAF support is a first-class citizen of both our Media Services On Demand and Media Services Live products.

Will Law is Chief Architect for Akamai's Media Division. Shawn Michels is a senior product manager for media at Akamai.


Thank you for laying this out; really well done. Any chance we can get a common DRM key exchange to go along with the encryption? Supporting a matrix of clients and DRM servers is not ideal. We will also need a common watermarking mechanism, as these are now both requirements for premium 4K content.

Having a common client-side key exchange mechanism among the various DRMs has been requested by many player implementers. However, the DRM vendors regard their key exchange mechanisms as proprietary and as giving them some market advantage, so there has been little interest from their side in harmonizing them as was done for Common Encryption. Watermarking is in a similar situation. A common implementation means commoditization of the service, which the various vendors would like to avoid, since they consider their implementation mechanisms a market advantage. Additionally, dynamic session-based server-side watermarking is still in its infancy as regards deployment. I have no doubt that the UHD content protection requirements will drive more deployments. It will be a few years until watermarking is ubiquitous enough that we can hope for a common watermarking mechanism to be developed.
