OTT operators shine torch for DevOps and microservices

It is not just price and convenience that have propelled OTT and provoked churn from legacy pay TV services in many of the more mature markets, but also analytics and the rate of service evolution. Netflix in particular is often held up as the beacon other operators should follow for innovation, although it is now so big that its investments in areas such as advanced audio are beyond the means of all but the largest players.

Netflix has however set some worthwhile precedents in software development, in particular DevOps and the associated microservices, which first appeared on the radar for pay TV around six years ago. It is only much more recently that many operators have got around to incorporating DevOps into their development methodology, largely because it requires a change of culture as well as of coding technology. Indeed, the DevOps movement generally is still gathering force, with 17% of enterprises fully embracing the methodology in 2018 compared with 10% in 2017, according to the website Statista.

As video infrastructures have converged with enterprise IT, they have incorporated technologies and methods that evolved first for enterprise data centers. First came virtualization, which aimed to sever the link between application software, operating system and hardware so that commodity COTS (Commercial Off The Shelf) components could be used, reducing costs.

Then came the move towards cloud computing, either eliminating hardware altogether or at least reducing it further, while taking advantage of scale economies in skills and software as well.

These innovations though did little on their own to address a major handicap that has long afflicted enterprise IT in general: its dependence on monolithic projects that impeded change and innovation while imposing a high risk of failure.

The litany of expensive and disastrous IT projects extends well beyond the public sector and government, even if some of those cases attracted the greatest publicity because of their more direct impact on the public through high costs and poor services.

Meanwhile broadcasting and pay TV were becoming increasingly governed by the same IT infrastructures as other enterprises and found themselves bound by the same constraints. This did not matter so much at first, because broadcasters and operators were accustomed to monolithic projects in their proprietary domains, where innovations emerged over periods of years rather than months or weeks. But the clock was ticking as the digital revolution unfolded and OTT players emerged with nimbler platforms, able to evolve services, or at least roll out improvements and new features, much faster.

At the same time the seeds of DevOps and microservices were being sown, the main idea being to end the disjunction between the development of software and its subsequent operational management in the field. The aim was to shorten the cycle of software updates and enhancements, which had previously required repeated iterations between development and operational teams that often communicated poorly.

Although this is about cultural change within software development, DevOps would not have achieved much without a radical overhaul of the surrounding infrastructure and new tools to support microservices built from small components. This is why containers were developed to insulate application software from underlying operating systems, even when those had been virtualized to remove the dependence on hardware. Containers managed by orchestration systems such as Kubernetes, originally developed by Google and now administered by the Cloud Native Computing Foundation, decouple application components, however small, from the infrastructure by providing the basic services they need. This facilitates portability between environments and readier deployment in the cloud.
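To make that decoupling concrete, a minimal Kubernetes Deployment manifest might look like the sketch below. The service name, container image and replica count are illustrative assumptions, not taken from any real operator's setup.

```yaml
# Hypothetical example: a small video-platform microservice described to
# Kubernetes. The orchestrator, not the operator, decides which machines
# run the three replicas and restarts any that fail.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: playback-auth            # illustrative service name
spec:
  replicas: 3                    # desired copies; Kubernetes maintains this count
  selector:
    matchLabels:
      app: playback-auth
  template:
    metadata:
      labels:
        app: playback-auth
    spec:
      containers:
      - name: playback-auth
        image: registry.example.com/playback-auth:1.4.2   # hypothetical image
        ports:
        - containerPort: 8080
```

Shipping a new version is then a one-line change to the image tag; Kubernetes retains the previous revision, so the service can be rolled back without touching the rest of the platform.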

The containers themselves can be enhanced more readily, since that can be done without changing the underlying operational fabric, provided the standard interfaces to it are adhered to, with the ability to roll back to previous versions if needed. This provides the foundation for more loosely coupled and distributed microservices, where applications are broken into smaller, independent pieces that can be deployed and managed dynamically, with faster deployment and feedback from the field. It gets away from the monolithic stacks running on big single-purpose machines that used to dominate enterprise IT.

It is important to realize though that no development in IT is ever a panacea; each one introduces new problems just as old ones are resolved. This holds just as much for DevOps and microservices, where the major challenges revolve around tracking and visibility, which can undermine one of the supposed advantages of escaping from monolithic software projects, namely reduced uncertainty over budget and timing.

DevOps projects themselves can become unpredictable if enterprises find it hard to keep track of where the microservices and associated components are located and how they inter-relate. For this reason tools have been developed within Kubernetes and other orchestration systems to assist with visibility and tracking, but these will only help if they are applied properly, which boils down to human issues that can span multiple development and operational teams.

Corresponding deployments in the cloud can exacerbate these problems. At least with a traditional legacy data center, even one distributed over a network, it was relatively straightforward to keep track of how many servers were running and the associated applications. However, one of the cloud's benefits is that capacity can be scaled on demand, and inevitably this brings elements of chaos and complexity only amplified by the use of microservices, making it difficult to tell where components are and which are running in real time. Lack of such visibility can also decrease reliability, because it is harder to test effectively for all failure conditions.

It was no coincidence that Netflix christened its self-service engineering system for microservices development under DevOps the Chaos Automation Platform. This was developed to test potential problems in production environments and determine exactly how the software would behave in the event of failure.
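The essence of that approach can be sketched in a few lines: inject failures into calls to a dependency and verify that the caller's fallback path actually keeps the service up. This is only a minimal illustration of the fault-injection idea; the function and service names are hypothetical, not part of Netflix's real platform.

```python
import random

def fetch_recommendations(user_id):
    """Stand-in for a call to a downstream recommendations microservice."""
    return ["title-%d" % i for i in range(3)]

def with_fault_injection(call, failure_rate, rng=random.random):
    """Wrap a call so a configurable fraction of requests fail."""
    def wrapped(*args, **kwargs):
        if rng() < failure_rate:
            raise RuntimeError("injected dependency failure")
        return call(*args, **kwargs)
    return wrapped

def homepage(user_id, recommender):
    """Caller with a fallback: degrade gracefully instead of erroring out."""
    try:
        return recommender(user_id)
    except RuntimeError:
        # Serve a static row rather than failing the whole page.
        return ["popular-title-1", "popular-title-2"]

# Force every call to fail and confirm the fallback keeps the page up.
always_fail = with_fault_injection(fetch_recommendations, failure_rate=1.0)
print(homepage("user-42", always_fail))
```

Running such experiments in production, against a small slice of real traffic, is what distinguishes this style of chaos testing from conventional pre-release QA.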

This was deployed after Netflix had completed a massive seven-year twin migration, running up to 2016, to Amazon Web Services (AWS) for cloud computing and to its own OpenConnect CDN. While applications such as subscriber management, analytics and recommendation run in the AWS cloud, OpenConnect stores the video content and delivers it to client devices. Netflix originally outsourced streaming video delivery to third-party CDNs but found these vendors struggled to meet its SLAs as its traffic exploded. It now installs OpenConnect appliances as close as possible to the points of consumption, preferably inside local Internet Service Provider (ISP) data centers, since that insulates the Netflix service from the wider internet.

Through the use of algorithms that gauge popularity, together with associated storage techniques, content is offloaded for optimal efficiency, reducing demand on upstream network capacity. But this did not by itself address automatic recovery, which Netflix deemed critical for a streaming service so that customers would not have to contact support. The original objective was to ensure all customers could be moved to another region within six minutes if the one they were originally connected to failed completely, and that target has since been shortened. Given that AWS itself has failure recovery within its regions, Netflix is taking out double insurance here.
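The popularity-driven placement described above can be illustrated with a simple greedy sketch: fill a fixed-size edge cache with the titles expected to serve the most requests per gigabyte stored, so that upstream traffic falls. This is an assumed toy model, not Netflix's actual algorithm, and the catalog figures are invented.

```python
def fill_cache(titles, capacity_gb):
    """Greedily cache titles by expected requests per GB until capacity runs out.

    `titles` is a list of (name, size_gb, expected_requests) tuples.
    Returns the list of cached title names in the order they were admitted.
    """
    # Rank by request density: requests served per gigabyte of cache consumed.
    ranked = sorted(titles, key=lambda t: t[2] / t[1], reverse=True)
    cached, used = [], 0.0
    for name, size_gb, _ in ranked:
        if used + size_gb <= capacity_gb:
            cached.append(name)
            used += size_gb
    return cached

catalog = [
    ("hit-series", 50.0, 9000),   # very popular but large
    ("new-film", 10.0, 3000),     # popular and small: best density
    ("back-catalog", 40.0, 400),  # long tail, rarely requested
]
print(fill_cache(catalog, capacity_gb=60.0))  # → ['new-film', 'hit-series']
```

In a real CDN appliance the equivalent decision also weighs factors such as regional demand and fill-window bandwidth, but the ranking-by-expected-demand principle is the same.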

Notably Netflix has insisted that technical measures on their own would not have ensured rapid recovery without the backing of a corporate culture that embraces such ideas. This comes back to DevOps and microservices, which imply devolution of responsibility to multiple teams to stimulate innovation. These teams must balance their spirit of adventure with a need to ensure their software meets stringent requirements for robustness and safety, adhering to fundamental rules.

For an organization of the size Netflix is now, that is quite a challenge, and it may be easier for smaller operators to square the circle between freedom and responsibility in microservices development. But Netflix remains the pacemaker taking DevOps forward in video service provision.