So, what is the BOCC?
Simply put, Akamai runs a state-of-the-art Broadcast Operations Control Centre (BOCC) to help ensure a smooth, seamless end-user playback experience for live OTT (over-the-top) and linear video delivered through Akamai Media Delivery Solutions. Put more technically, the BOCC is a support service that provides 24x7x365 real-time, high-density monitoring aimed at lower latency, continued availability, and buffering-free playback of video streams, covering most stages of the over-the-top streaming workflow.
With internet-based video consumption increasing, especially among mobile audiences and through OTT adoption on connected TVs, expectations around the quality and latency of streaming experiences keep rising. Accordingly, streaming technologies and the associated performance expectations have seen many changes and improvements, including the shift to HTTP-based adaptive streaming, media security, CMAF for standardization and for chunked encoding/transfer to achieve the lowest hand-waving latency, ad-insertion methodologies, and more.
Given these, an operations and monitoring service like the BOCC is only as effective as its hold on three key areas:
- The Streaming technology
- The Monitoring technology
- The People: subject matter experts on both streaming and monitoring
Focused positioning and organic evolution since inception have helped Akamai build a talented resource base with expertise in core streaming technologies and video distribution over the internet. Through the BOCC, this expertise is made available as a service. The BOCC has at its disposal an entire suite of supporting systems and tools that enable proactive monitoring, timely diagnosis of live and linear streaming issues, and their appropriate remediation. The key tenets of monitoring and diagnosis included as part of the BOCC are:
- Data that matters
- Tools and analytics solutions that derive valuable inference from data
- Timely inference when it matters
Here is a summary of the data and analytics tools available through the BOCC. Also indicated is how Akamai Media Delivery products such as live/linear stream ingest, delivery and playback map to the simplified streaming workflow while being the space of operations for the BOCC.
Data that Matters
The BOCC has access to key metrics from all of the layers feeding into the streaming workflow. This includes:
- First Mile (Stream Origin for 3rd Party Origins, Ingest and Mid-Tiers for Akamai Media Services Live, or Live Origin)
- Middle Mile (Tiered Akamai Intelligent Network Cache and Edge Servers)
- Last Mile (Client/Player perspective, if the customer has Akamai Media Analytics integrated into video players)
There are more than 100 metrics provided to help understand the state of the workflow. Here is a sample of the kinds of metrics:
- Playback errors, startup time, bitrate, buffering rate and duration
- Edge errors, request processing time, bit-transfer time, bit-throughput
- Mid-mile errors, cache hit-rate
- Origin errors, content availability, throughput, DNS aspects
- Hop efficiency, last hop/origin distance
- Server availability, CPU/memory utilization, network aspects
Metrics are gathered to allow slicing by a wide set of visibility dimensions including geography, network, server hosts/roles, player identity, device type/OS, video format, and beyond.
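To make the idea of slicing concrete, here is a minimal sketch in Python. The field names (`country`, `device`, `startup_ms`) and the sample values are made up for illustration and are not Akamai's actual schema or data; the point is only that one metric can be regrouped along any available dimension, with the sample count carried alongside the aggregate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical playback samples; field names and values are illustrative only.
samples = [
    {"country": "US", "device": "tv", "startup_ms": 900},
    {"country": "US", "device": "mobile", "startup_ms": 1500},
    {"country": "IN", "device": "mobile", "startup_ms": 2100},
    {"country": "IN", "device": "mobile", "startup_ms": 1900},
]


def slice_metric(rows, dimension, metric):
    """Group rows by one dimension and return {value: (mean, sample_count)}."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[metric])
    return {key: (mean(vals), len(vals)) for key, vals in groups.items()}


# The same metric, viewed along two different dimensions.
by_country = slice_metric(samples, "country", "startup_ms")
by_device = slice_metric(samples, "device", "startup_ms")
```

Keeping the sample count next to each mean matters later: a slice with very few samples should not be read the same way as one with thousands.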
Tools and Solutions
Data is only as useful as the inference it provides. The BOCC has access to many tools that work on data to show where in the streaming workflow an issue sits, what its specific nature is, and why it occurs. The ability to perform root-cause analysis on an observation or issue and pin down the points of recovery is what enables the BOCC to be an effective stream monitoring and remediation operations service.
Most of the BOCC tools are centered on data analytics. This happens by formulating a set of hypotheses and then validating or eliminating them through data-driven reasoning. Here is a sample of the kinds of tools:
- Client Assessment
  - How is a stream doing overall on KPI metrics? On the curve, or deviating?
  - For example, is startup time high or in check?
- Impact Scoping
  - Is an observed issue localized to a geography/network/server/stream/... or spread widely?
  - If localized, where exactly? For example, which country or networks?
- Causal Isolation
  - Ability to do one-vs-global and one-vs-other relative comparisons to pin down a cause.
  - For example, compare buffering for users across two networks, or for one network against the global average.
- Sampling Significance
  - Qualify observations with the associated sample size; avoid misleading signals.
  - Attribute relevance by sample size before drawing inferences.
- Correlation across metrics from the same or multiple data sources
  - Find relative crests/troughs that confirm or invalidate a hypothesis.
  - For example, high buffering and lower user bandwidth going together might be indicative of last mile problems; player errors and time-synced origin errors could be indicative of content availability/encoding issues.
- Linked Exploration and Analysis
  - For example, if users are buffering longer, figure out which edge servers they are connected to. If the increased rebuffering cannot be attributed to the edge servers, link forward to the connected cache layer hierarchy, all the way to origin, until a correlating signal emerges.
- Change Vector Identification
  - Compare metric/dimensional spread between two time windows to confirm if something changed, and if so, what?
  - For example, did a different server/stream come into play at the time buffering was observed on the last mile?
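A few of these ideas are easy to sketch in code. The Python below is an illustration under stated assumptions, not the BOCC's actual tooling: the session fields (`network`, `edge`, `buffer_pct`), the `MIN_SAMPLES` cut-off of 30, and both function names are hypothetical. It shows a one-vs-global comparison that refuses to draw an inference from too few samples, and a simple check for dimension values that appear only in a later time window.

```python
from statistics import mean

MIN_SAMPLES = 30  # assumed cut-off; a real threshold would be tuned per metric


def one_vs_global(sessions, dimension, value, metric, min_samples=MIN_SAMPLES):
    """Compare one slice's mean against the global mean, gated on sample size."""
    slice_vals = [s[metric] for s in sessions if s[dimension] == value]
    global_mean = mean([s[metric] for s in sessions])
    if len(slice_vals) < min_samples or global_mean == 0:
        # Too few samples (or nothing to compare against): report, don't infer.
        return {"significant": False, "samples": len(slice_vals), "ratio": None}
    return {
        "significant": True,
        "samples": len(slice_vals),
        "ratio": mean(slice_vals) / global_mean,
    }


def new_dimension_values(before, after, dimension):
    """Return values of `dimension` seen in the later window but not the earlier one."""
    return {s[dimension] for s in after} - {s[dimension] for s in before}


# Made-up sessions: buffering percentage per session, by network and edge server.
sessions = (
    [{"network": "A", "edge": "e1", "buffer_pct": 4.0}] * 40
    + [{"network": "B", "edge": "e1", "buffer_pct": 1.0}] * 60
    + [{"network": "C", "edge": "e2", "buffer_pct": 9.0}] * 5
)

result_a = one_vs_global(sessions, "network", "A", "buffer_pct")
result_c = one_vs_global(sessions, "network", "C", "buffer_pct")
changed_edges = new_dimension_values(sessions[:100], sessions, "edge")
```

With this made-up data, network A buffers at roughly 1.6 times the global average on 40 sessions, while network C looks far worse but has only 5 sessions, so the sketch flags it as not significant rather than raising a misleading signal. The change-vector check reports that edge `e2` appears only in the later window.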
The data and tools the BOCC has access to are wired to be time-dependent. The BOCC has multiple contexts of use:
- Real-time proactive monitoring of stream uptime and quality by way of alerts and eyes-over-glass charts
- Real-time diagnosis and troubleshooting to root-cause and fix noticed issues
- Offline data mining for periodic and incremental improvements over status-quo
Given this, the data system and tools support:
- Segmented data gathering for critical contexts: system/application-specific (like CPU/memory/bandwidth) and transactional (like HTTP/REST interfaces)
- Multiple data acquisition modes: periodic aggregates, time-specific snapshots, tables, and events
- Specialized systems of data
  - Alerting Engine: to continuously check for erroneous signals and call attention
  - Diagnostics Engine: to enable troubleshooting and associated tools
  - Mining Engine: to enable ad-hoc data exploration and inferencing
- Relevance-driven time sensitivity
  - Latencies in seconds to minutes for alerting and diagnostics
  - Latencies in minutes to hours for offline exploration
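As a rough illustration of what "continuously check for erroneous signals" can mean over periodic aggregates, here is a minimal Python sketch. The class name `ErrorRateAlert`, the 5% threshold, and the three-window rule are assumptions chosen for the example; they do not describe how Akamai's alerting engine is actually built.

```python
from collections import deque


class ErrorRateAlert:
    """Fire when the error rate stays above a threshold for N consecutive windows."""

    def __init__(self, threshold, consecutive):
        self.threshold = threshold
        # Keep only the most recent `consecutive` breach flags.
        self.recent = deque(maxlen=consecutive)

    def observe(self, errors, requests):
        """Record one aggregate window; return True if the alert should fire."""
        rate = errors / requests if requests else 0.0
        self.recent.append(rate > self.threshold)
        return len(self.recent) == self.recent.maxlen and all(self.recent)


# Made-up periodic aggregates: (errors, requests) per window.
windows = [(1, 100), (8, 100), (9, 100), (7, 100), (2, 100)]
alert = ErrorRateAlert(threshold=0.05, consecutive=3)
fired = [alert.observe(errors, requests) for errors, requests in windows]
```

Requiring several consecutive breaches trades a little alerting latency for fewer false alarms from a single noisy window, which fits the seconds-to-minutes budget described above.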
In summary, the BOCC is supported and enabled by connected, specialized systems of data that are called to attention when needed and help act on concerns. The systems are built on a stack of Akamai custom big-data software, coupled with open-source software like Kafka, Cassandra, Presto, HDFS/Hadoop, and Spark to drive data analytics. A detailed look at the systems themselves is for a future blog. For now, rest assured, one can trust streams to be safe under the watch of the BOCC.