The World Wide Web was founded on the idea of peer-to-peer (P2P) networking, with direct communication between machines and users, but for various reasons the internet evolved instead into a broadcast or client/server model. P2P revived around 1999 with Napster and then the BitTorrent protocol, released in 2001, but this was a mixed blessing given its association with piracy and especially illicit music sharing.
By 2009 the volume of P2P traffic was in steep decline again, just as online video was emerging as a centrally driven unicast service. But the proliferation of online video has since given a new lease of life to P2P distribution in the guise of P2P CDNs, which promise to solve the scaling problem of traditional CDNs that struggle to cope with peak-time traffic.
This problem has grown worse with the more recent explosion in live streaming, which generates sharper peaks in demand while also raising the latency challenge, which P2P distribution can help alleviate if properly configured and managed. As a result, we are now seeing some quite bullish forecasts for growth in P2P CDN traffic, driven by live streaming and online gaming in particular with their requirement for low latency, but also by popular on-demand content easing pressure on core CDN capacity. P2P CDNs are also promising for distribution of large files, which can be cached locally within access networks as well as on customers’ own networks.
But we should not believe all the hype of the P2P CDN vendors, because some hurdles remain and these networks are not optimal for all service or traffic types. It is true, though, that P2P CDN distribution will grow substantially, and not all of that traffic will go to new dedicated P2P CDN vendors such as Peer5, Streamroot, Teleport and Strive Technologies; traditional players will take a share as well. Akamai, Limelight and others are incorporating P2P distribution within their CDNs to cut the latency and costs associated with live streaming traffic in particular.
P2P CDN vendors refer to their services as serverless, which is confusing because it means something different from the concept of serverless computing that arose in the context of virtualization some years ago. There, the term referred to cutting the direct link between application software and the hardware executing it. Virtualization itself, as pioneered by IBM over 40 years ago, created a layer between the application and the underlying operating system so that functions requiring hardware, such as storage and processing, could be spread across multiple physical computers, bringing the flexibility to use off-the-shelf components as well as scalability.
But there was still the concept of a virtual machine running a given application, and “serverless architecture” then went further by moving the virtualization into containers hidden from the customer and providing the whole platform as a service, including the running of applications. All the enterprise customer now had to focus on was the application code itself, and even that could be offloaded to a third-party developer or systems integrator. But there were still computers, or servers if you like, within the network, running the applications on behalf of customers, albeit with the details entirely hidden.
In the case of P2P CDNs, however, serverless means something quite different: the avoidance of dedicated servers altogether, although in practice there will still be some computers within the network. The point is that the HTTP servers caching content inside traditional CDNs are replaced by customers’ own servers or PCs attached to the network.
On the face of it, P2P distribution is tailor-made for live streaming. In the absence of relief through multicasting, traditional CDNs can only scale up by adding server capacity to meet peak demand, but in practice, despite claims from providers, this is impossible to guarantee without inordinate overprovisioning against average traffic levels. Most major content distributors have therefore either built their own CDNs or use multiple third-party networks to spread the load and provide some insulation against congestion at peak times.
Netflix is an interesting case here. Its back-end services are serverless in the old sense of being divorced from the underlying hardware, running mostly on Amazon Web Services, while its Open Connect CDN places caching appliances directly inside the ISPs that deliver its content. Netflix therefore caters in effect for peak demand through this close ISP integration, while being less in need of P2P because of its lack of live content.
Meanwhile, CDN vendors are being seduced by P2P with its in-built self-scaling, since capacity, in theory at least, increases automatically with the number of users, each of which is effectively a node on the network. Each user in principle becomes a server for a given piece of content as soon as that content has been downloaded.
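The self-scaling principle can be sketched in a few lines. This is purely illustrative, with hypothetical names rather than any vendor's actual design: every peer that completes a download joins the set of sources for that content, so serve capacity grows with the audience.

```python
# Minimal sketch of P2P self-scaling (illustrative, not a vendor API):
# each peer that finishes downloading a chunk becomes a source for it.

class Swarm:
    def __init__(self, chunk_ids):
        # the origin CDN server is always a source of last resort
        self.sources = {c: {"origin"} for c in chunk_ids}

    def download(self, peer, chunk):
        src = next(iter(self.sources[chunk]))  # pick any current source
        self.sources[chunk].add(peer)          # the peer now serves the chunk too
        return src

swarm = Swarm(["chunk-0"])
for viewer in ["alice", "bob", "carol"]:
    swarm.download(viewer, "chunk-0")

# After three downloads there are four sources: the origin plus three peers.
print(len(swarm.sources["chunk-0"]))  # 4
```

Each new viewer thus adds capacity rather than only consuming it, which is the property that distinguishes P2P from a purely server-based CDN.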
This did not really work in the case of traditional P2P operations such as BitTorrent because these were essentially anarchic in their operation. Such protocols encourage what are sometimes called selfish swarms: users join to obtain the item they want and then immediately leave, so that the content is no longer available to others.
This deficiency gave rise to other approaches, first centrally managed P2P CDNs providing some degree of traditional CDN reliability, and then Peer-Assisted (PA) CDNs, which form a true hybrid by keeping CDN servers in reserve as back-up.
For PA CDNs, the default mode is for content to be delivered in chunks P2P “within the user swarm” whenever there is sufficient capacity or availability; failing that, delivery falls back to traditional CDN servers. The latter typically happens when no peer near a user has free upload capacity to deliver the content while maintaining QoS guarantees.
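The per-chunk decision could look something like the sketch below. The function, field names and thresholds are all assumptions for illustration, not any vendor's actual algorithm: prefer a nearby peer with spare upload capacity that can meet the playback deadline, otherwise fall back to a CDN server.

```python
# Hedged sketch of the PA CDN source-selection decision per chunk
# (names and thresholds are illustrative assumptions).

from dataclasses import dataclass

@dataclass
class Peer:
    rtt_ms: float          # network distance to the requesting user
    free_upload_kbps: int  # spare upload capacity right now

def pick_source(peers, chunk_kbits, deadline_ms, min_kbps=500):
    # only consider peers with enough spare upload capacity
    candidates = [p for p in peers if p.free_upload_kbps >= min_kbps]
    for p in sorted(candidates, key=lambda p: p.rtt_ms):
        # crude QoS check: transfer time plus round trip must beat the deadline
        transfer_ms = chunk_kbits / p.free_upload_kbps * 1000
        if p.rtt_ms + transfer_ms <= deadline_ms:
            return p
    return "cdn-server"  # fallback keeps playback going

peers = [Peer(rtt_ms=20, free_upload_kbps=2000),
         Peer(rtt_ms=5, free_upload_kbps=100)]
print(pick_source(peers, chunk_kbits=4000, deadline_ms=3000))
```

Here the nearer peer is rejected for lack of upload capacity and the chunk comes from the other peer; with no suitable peers at all, the CDN server serves it, which is the fallback behaviour described above.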
While traditional CDN servers are still required, they do not have to be provisioned for peak demand because the network exploits that self-scaling property. The servers are only needed when there are insufficient users to serve the content, which by definition happens when demand is low. In practice, as demand scales up, some CDN server capacity is still needed to maintain QoS levels, but the peaks are smoothed out.
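A back-of-envelope model shows the smoothing effect. The saturating-offload curve and all the numbers below are assumptions for illustration only: the share of traffic peers can absorb rises with swarm size, so the origin servers cover a shrinking fraction of demand as audiences grow.

```python
# Illustrative model of PA CDN peak smoothing (parameters are assumptions):
# the fraction of traffic offloaded to peers grows with swarm size and
# saturates, so origin load per viewer falls as the audience grows.

def origin_load_kbps(viewers, stream_kbps=3000, half_point=10, max_offload=0.9):
    if viewers == 0:
        return 0
    # assumed saturation curve: offload approaches max_offload in big swarms
    offload = max_offload * viewers / (viewers + half_point)
    return viewers * stream_kbps * (1 - offload)

for viewers in (10, 1000, 100000):
    per_viewer = origin_load_kbps(viewers) / viewers
    print(f"{viewers} viewers -> {per_viewer:.0f} kbps of origin load each")
```

In this toy model a small audience still leans heavily on the origin, while a large one is served mostly peer-to-peer, consistent with the reported offload figures of 50% to 90%.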
Traditional CDN vendors such as Akamai, along with several in China such as ChinaCache and Xunlei, have developed PA CDN capabilities. PA CDNs have also been evaluated by several broadcasters for online services, including MSN Video and the BBC for its iPlayer. Such studies have reported significant traffic savings, over 50% and in some cases approaching 90%. But they have also unearthed problems which until recently constrained commercial deployment of P2P and PA CDNs. Apart from instability and lack of content availability, these include playback latency resulting from poor control over user caches, as well as a lack of incentives for user participation. Another factor can be the impact of firewalls making peers inaccessible to others.
Through the efforts made so far, most of these technical challenges have been overcome. Playback latency has been remedied through various mechanisms, including better management of end-user caches and predictive content-caching algorithms operating on the basis of popularity or preferences. User incentives, meanwhile, can be provided through cost and transparency, much as public WiFi services can be offered to users in return for sharing their wireless routers without them having to be aware of it.
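Popularity-based predictive caching can be as simple as the following sketch, which uses a plain frequency count rather than any vendor's actual algorithm: pre-seed user caches with the most-requested chunks across the swarm, within a per-device cache budget.

```python
# Sketch of popularity-based predictive caching (a simple frequency count,
# not any vendor's real algorithm): pre-seed the most-requested content.

from collections import Counter

def plan_precache(request_log, cache_slots):
    # rank content by how often it has been requested across the swarm
    popularity = Counter(request_log)
    return [chunk for chunk, _ in popularity.most_common(cache_slots)]

log = ["ep1", "ep2", "ep1", "ep3", "ep1", "ep2"]
print(plan_precache(log, cache_slots=2))  # ['ep1', 'ep2']
```

Seeding popular content before it is requested means more chunks can be served peer-to-peer on first viewing, which directly attacks the playback-latency problem mentioned above.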
Another key factor has been the emergence of the WebRTC protocol, widely adopted by P2P CDNs such as Peer5 because it enables real-time communication directly between web browsers. This avoids the need for a dedicated plug-in and further reduces the friction against user participation. It comes as standard in most browsers, including Chrome, Firefox and Opera, and is supported by both the Android and iOS operating systems, making it ideal for setting up P2P connections between device browsers.
One important point is that P2P CDNs will never entirely replace conventional CDN operation, because they are inherently fragile and peter out when user participation is low. They will therefore always need back-up from a conventional CDN, which is why the PA CDN model is often preferred. The model thrives on volume, with its strongest attribute being the smoothing out of peaks, and also on locality, because content can then often be distributed just once from the center and shuffled around among closely located peers.
A conventional CDN works best for fast long-distance content distribution, but combines well with a P2P CDN when volumes spike, as during large live-streamed events with huge audiences, typically sports or major concerts. We can therefore expect P2P CDNs to enjoy sustained but not explosive growth over the next few years.