Torrent Streaming in depth

This is an analysis of torrent streaming, and it includes a walkthrough of my own tutorial on that subject, Torrent Streaming service with WebTorrent and Express. Please read that article before proceeding, if you haven’t already.

Definition

Torrent Streaming is the act of downloading the pieces of a torrent in order, from the beginning to the end, and performing HTTP pseudo-streaming of the file while it’s still downloading. This works for properly formatted MP4 files which have had their MOOV atom moved to the front, and for FLV files with optimized metadata.

HTTP Pseudo-streaming

HTTP Pseudo-streaming is a popular way of video distribution on the Web. It consists of running an MP4 (with H264+AAC video and audio inside of it) through an optimization program that will move the MP4 MOOV atom to the beginning of the file, which will enable the player, usually a browser, to play the video before it’s finished downloading.

Traditionally, when encoding a video, the MOOV atom is written last. It contains information needed for playback, such as keyframe positions, and those positions are not known until the video has been encoded. This is normally not an issue with traditional playback from a storage medium, such as a hard disk, because the player can just seek to the end of the file to find the MOOV atom.

However, with web streaming, especially on older servers, you are often unable to seek within the file at all. This is also true if you’re passing the file through a server-side script (a PHP script run via CGI, for example) that always starts at the beginning of the file and progresses towards the end. Because of this, people started moving the MOOV atom to the front of the file. That way, the browser sees the MOOV atom first and has the information it needs to play the data that follows.

With advances in web server technology, Apache and NGINX, the two most popular cross-platform web servers for non-embedded applications, have started supporting seeking in files with the so-called HTTP Range header. Therefore, the browser will see that the file’s MOOV atom is at the end of the file and seek to the end to find it, then continue normal download and playback. However, it is still a good idea to move the MOOV atom to the front of the file when encoding video, as that requires fewer HTTP requests to play the file and the video might start faster.

Moving the MOOV atom to the front of the file

The topic of this article is not video itself, so I will not dwell on this too much. The easiest way to move the MOOV atom to the front of the video file with FFmpeg is to include the argument “-movflags +faststart” (with the plus sign) in the command.

To remux an existing video so that its MOOV atom is at the front, run the command below. Beware: the streams are copied, not re-encoded, so the video should already be encoded as H.264 and the audio as AAC. If the input contains codecs that the MP4 container cannot hold, FFmpeg will throw an error.

ffmpeg -i input.mp4 -codec copy -movflags +faststart output.mp4

This will run for a second or two, and create the file output.mp4 with the properly formatted MOOV atom.
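To check whether a file already has its MOOV atom at the front, one option is to walk the top-level boxes (atoms) of the MP4 and see whether “moov” appears before “mdat”, the box that holds the media data. The following is a minimal sketch in Python, not a full MP4 parser. The function names are made up for this illustration, and it only relies on the standard box header layout: a 4-byte big-endian size followed by a 4-byte type, with a size of 1 meaning a 64-bit size follows and a size of 0 meaning the box runs to the end of the file.

```python
import struct
from typing import BinaryIO, Iterator, Tuple


def top_level_boxes(f: BinaryIO) -> Iterator[Tuple[str, int, int]]:
    """Yield (type, offset, size) for each top-level box of an MP4 file."""
    f.seek(0, 2)  # jump to the end to learn the file size
    end = f.tell()
    offset = 0
    while offset + 8 <= end:
        f.seek(offset)
        size, box_type = struct.unpack(">I4s", f.read(8))
        header_len = 8
        if size == 1:
            # A size of 1 means the real size is a 64-bit integer after the type.
            if offset + 16 > end:
                break
            size = struct.unpack(">Q", f.read(8))[0]
            header_len = 16
        elif size == 0:
            # A size of 0 means the box extends to the end of the file.
            size = end - offset
        if size < header_len:
            break  # malformed header, stop instead of looping forever
        yield box_type.decode("latin-1"), offset, size
        offset += size


def moov_before_mdat(f: BinaryIO) -> bool:
    """Return True if the 'moov' box comes before the 'mdat' box."""
    for box_type, _, _ in top_level_boxes(f):
        if box_type == "moov":
            return True
        if box_type == "mdat":
            return False
    return False  # neither box found
```

Calling moov_before_mdat(open("output.mp4", "rb")) on the remuxed file from the command above would be expected to return True, since only the box headers are read and the media data is skipped over.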

Web distribution architecture

Traditionally, web publishers, like game developers, movie studios, newspapers, blogs, etc., have published content using the client-server architecture (shown in the picture below). That means the content is hosted (located) on a central server or multiple servers, which all serve the clients. As a result, you can only get the content from the publisher itself, not from other clients, called peers.

Client-Server network design, used in traditional web publishing

Some drawbacks of the client-server architecture are: high cost of distribution for popular content (since everyone downloads from the same source, you need either a very powerful server with lots of bandwidth, or lots of servers, commonly called a CDN - content delivery network), single point of failure (if your network goes down, nobody can download your content), wasted bandwidth (since clients cannot upload, you are wasting bandwidth by not utilising that upload potential), and many more.

An alternative, called P2P (peer to peer) networking (picture shown below), is what Torrent is based on. In a P2P network, everyone is equal and everyone downloads and uploads. Therefore, there is no single point of failure and less bandwidth is wasted, since everyone who downloads is also expected to upload, ideally at least the same amount.

Peer-to-peer network design, used with torrents

P2P distribution has many advantages over the client-server distribution model, such as lower cost of distribution, no single point of failure, etc. However, P2P networking has issues of its own: the need to have specialized software installed on all the users’ computers in order for them to upload the content, the difficulty of accepting incoming connections behind NAT (common for DSL and Cable users, as well as 4G mobile broadband), difficulty in obtaining old and unpopular content, and digital rot (content that is not often downloaded fades into obscurity and disappears).

Despite these drawbacks, Torrents have gained huge popularity over the past 15 years, mostly for distribution of illegal content (commonly called “piracy”). The reasons for that popularity are outlined above, and major improvements to the technology with the addition of DHT (distributed hash table) have increased the resiliency of the torrent network.

How Torrents actually work

The torrent protocol, named “BitTorrent”, was created by programmer Bram Cohen. The BitTorrent protocol is currently maintained by BitTorrent, Inc., and the latest update to the protocol was in 2013.

The BitTorrent protocol is defined in several documents called BEPs (short for BitTorrent Enhancement Proposal). You can read all the documents here, however some have criticized BitTorrent for not doing a better job at maintaining the documents and updating them for accuracy.

If you’re going to be implementing a BitTorrent client, and aren’t going to use an existing library, like libtorrent-rasterbar, I recommend you read this unofficial specification, rather than the official one, which is harder to understand.

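The only central piece in the Torrent architecture is the Tracker. A tracker is a server that coordinates the network by keeping a list of the peers that are currently downloading or uploading a torrent, along with basic statistics those peers report. The tracker does not keep track of individual pieces; which pieces a peer has, wants and offers is exchanged directly between the peers once they are connected.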

A tracker is defined in the .torrent file, or in a magnet link, and is reached over the HTTP, WebSocket (with WebTorrent) or UDP protocols.
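To make the peer list a little more concrete: HTTP trackers commonly return it in a “compact” form, where each IPv4 peer takes 6 bytes, 4 for the IP address and 2 for the port in big-endian order. The sketch below decodes such a blob in Python. The function name is made up for this illustration, and it assumes you have already extracted the raw peers value from the tracker response.

```python
import ipaddress
import struct
from typing import List, Tuple


def decode_compact_peers(blob: bytes) -> List[Tuple[str, int]]:
    """Decode a compact IPv4 peer list into (ip, port) tuples."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peer list length must be a multiple of 6")
    peers = []
    for i in range(0, len(blob), 6):
        ip = str(ipaddress.IPv4Address(blob[i:i + 4]))      # 4 bytes of address
        (port,) = struct.unpack(">H", blob[i + 4:i + 6])    # 2 bytes, big-endian
        peers.append((ip, port))
    return peers
```

For example, the six bytes 127, 0, 0, 1, 0x1A, 0xE1 decode to the peer 127.0.0.1 on port 6881.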

Each torrent is split into multiple chunks, called pieces. The piece size is chosen per torrent and can be anywhere from 16 KB to a dozen megabytes; within one torrent, all pieces have the same size, except possibly the last one. Each piece also has a hash that enables a client to check whether the piece got corrupted or whether someone intentionally provided an invalid piece.
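As a rough illustration of the piece and hash idea, the sketch below splits some data into fixed-size pieces and hashes each one with SHA-1, which is the hash used by the original (v1) BitTorrent protocol. The function names are made up for this example, and a real client would read pieces from disk or the network rather than hold the whole file in memory.

```python
import hashlib
from typing import List


def piece_hashes(data: bytes, piece_length: int) -> List[bytes]:
    """Split data into pieces and return the SHA-1 digest of each piece."""
    return [
        hashlib.sha1(data[i:i + piece_length]).digest()
        for i in range(0, len(data), piece_length)
    ]


def verify_piece(index: int, piece: bytes, hashes: List[bytes]) -> bool:
    """Check a downloaded piece against the expected hash for its index."""
    return hashlib.sha1(piece).digest() == hashes[index]
```

A client that receives a piece whose hash does not match simply discards it and requests it again, ideally from a different peer.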

A torrent client, when provided with a .torrent file, will first ask the specified trackers for an initial peer list. It will then start downloading pieces, rarest first (this will become important later in the article). Once it has some pieces, it announces them to the peers it is connected to, and those peers can start downloading from it. A client can regulate its own upload speed, as well as the number of peers it allows to connect to it. The upload limit is commonly set to around 80% of the total upload speed, and the number of peers depends on the capabilities of the client and the computer itself.
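The difference between rarest-first and in-order piece selection can be sketched in a few lines of Python. This is a simplified model, not how any particular client implements it: each peer is represented as the set of piece indexes it has, ties are broken by the lowest index, and the function names are invented for this example.

```python
from collections import Counter
from typing import List, Optional, Set


def rarest_first(have: Set[int], peer_pieces: List[Set[int]]) -> Optional[int]:
    """Pick the missing piece that the fewest connected peers have."""
    counts = Counter(
        piece for pieces in peer_pieces for piece in pieces if piece not in have
    )
    if not counts:
        return None  # nothing left that any peer can give us
    return min(counts, key=lambda piece: (counts[piece], piece))


def sequential(have: Set[int], peer_pieces: List[Set[int]]) -> Optional[int]:
    """Pick the lowest-numbered missing piece, as a streaming client would."""
    available = set().union(*peer_pieces) - have
    return min(available, default=None)
```

With three peers holding pieces {0, 1, 2, 3}, {0, 1, 2} and {0, 1}, rarest-first asks for piece 3 first (only one peer has it), while the sequential strategy asks for piece 0, which every peer already has.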

After a torrent is done downloading, a client may choose to continue uploading, which is called seeding, to the network. It’s usual to upload around 2:1, meaning double the amount of what you downloaded, however many people stop the uploading manually after some time.

There is also a way to download without uploading, though it is considered ethically wrong. It is called leeching, and it is done either by limiting your upload speed to zero or near zero, or by using specialized clients which trick the tracker into thinking they’re uploading when they’re not.

The .torrent file is encoded using a format called Bencode, which was developed for the Torrent protocol. You can see an example of the format in the BitTorrent specification.
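Bencode has only four types: integers (i42e), byte strings prefixed with their length (4:spam), lists (l...e) and dictionaries (d...e) whose keys are byte strings in sorted order. A minimal encoder sketch in Python might look like this. The function name is made up, and for convenience it accepts Python str values and encodes them as UTF-8, which is an assumption of this example rather than something Bencode itself defines.

```python
def bencode(value) -> bytes:
    """Encode ints, strings/bytes, lists and dicts in Bencode format."""
    if isinstance(value, bool):
        raise TypeError("Bencode has no boolean type")
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(item) for item in value) + b"e"
    if isinstance(value, dict):
        # Keys must be byte strings and must appear in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError("cannot bencode %r" % type(value))
```

For example, bencode({"length": 10, "announce": "x"}) produces d8:announce1:x6:lengthi10ee, with the keys sorted regardless of the order they were given in.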

Web seeding

Torrents also support the ability to download from an HTTP server, called “web seeding”. This is not often used, however it’s useful if a torrent is very rare and doesn’t have many seeds. The web seed is defined in the torrent file in a similar way to the trackers.

Is torrent streaming legal?

The legality of the act has nothing to do with the technology itself. Streaming and downloading torrents whose content is in the public domain, or was licensed under a free license, is legal.

Streaming and downloading illegal content, like TV Shows and Movies you haven’t paid for, might be illegal in your country. Please note that I’m not a lawyer and this is not legal advice. I will not be held accountable for any possible issues that arise from your use and/or abuse of the technologies described in this article.

In the European Union, it is legal to “create temporary or cached copies of works (copyrighted or otherwise) online”, which means that websites that stream torrents are legal in the EU. Simply put, caching or transmitting copyrighted works is legal, and if the website doesn’t upload the content to other peers on the BitTorrent network (by leeching), they cannot be held accountable.

It is generally considered that downloading content for personal consumption in the EU is legal, as long as you are not distributing the content. However, that is not usually possible with the Torrent protocol, because you upload while you download, so you can be held accountable for distribution of copyrighted content in the EU.

This has led to an explosion of video streaming websites which use cyber locker services, or even abuse Google’s caching and CDN services to serve video content over HTTP without the need to own expensive servers.

In the US, it is illegal to download or stream copyrighted content you do not have a license to view, no matter whether you upload or not.

Is torrent streaming ethical?

You might wonder why it would be unethical to stream torrents, assuming you’re okay morally with copyright infringement. To put it simply, the torrent network was not made for such a thing, and streaming endangers the existence of the torrent network.

When you download a torrent by requesting rarest pieces first, you’re effectively multiplying their availability by seeding them as soon as you’ve downloaded them. And, by extension, when you continue seeding 2:1 after downloading the file, you’re “giving back” to the community more than what you took, and that behavior is key to the prevalence of torrents.

When you stream a torrent, you’re requesting the pieces in order, which is bad for the health of a torrent, and you stop seeding as soon as the torrent is downloaded, or in some cases as soon as it’s stopped streaming. In both of these cases, you’re not giving back anything, and this is considered very harmful behaviour.

The rise of torrent streaming has the potential to endanger and even destroy the torrent network. Therefore, if you’re morally inclined to do so, it is better to download and then seed the torrents for some time rather than streaming them.

An analysis of the tutorial

I have neglected to provide a full analysis and in-depth review of my own tutorial on torrent streaming with Node.js and WebTorrent. Due to it being my most popular blog post, I am providing a review and analysis of the tutorial here.

Why WebTorrent

I chose WebTorrent by feross as opposed to Peerflix-Server by mafintosh, or Go-Peerflix, or a custom solution, purely because I wanted to maintain a level of portability of the code between the browser and the Node client. I also wanted to later enable the Node client to communicate with WebSocket peers and contribute to the development and the capacity of the WebTorrent network.

Please note, by default WebTorrent doesn’t allow you to connect to WebSocket peers on the Node platform, and doesn’t allow you to connect to traditional BitTorrent peers on the Web platform. The first one can be remedied by using a special build of WebTorrent which can communicate with both types of peers, but in browsers you cannot connect to BitTorrent peers.

Why Node.js

When the tutorial was written, I was not aware of Go-Peerflix. And, despite my best efforts, I’ve not had the time to learn Go yet, so I chose Node.js, as I already have experience with that platform and have used WebTorrent on Node before with good results.

Why Express

I chose Express as the server framework that supports the project, simply because I’m most experienced with it, and it’s easy for beginners to understand the code and modify it. However, because StrongLoop has acquired Express, and might stop further development, I would not choose Express as a future-proof library, and instead would use something like Hapi.

Open CORS policy? WHAAAAA?!

I chose to open the CORS (Cross-Origin Resource Sharing) policy purely for the simplicity it enables when developing applications. If you’re using the code in production you should of course modify the headers to be restricted to your own domain only.
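Restricting the policy usually comes down to only echoing back origins you trust in the Access-Control-Allow-Origin header, instead of answering every request with “*”. The Python sketch below shows that decision in isolation. The function name and the example origins are hypothetical, and in the tutorial’s setup the same logic would live in the server’s middleware.

```python
from typing import Dict, Iterable


def cors_headers(request_origin: str, allowed_origins: Iterable[str]) -> Dict[str, str]:
    """Return CORS response headers for a request, or none if the origin is not trusted."""
    if request_origin in set(allowed_origins):
        return {
            "Access-Control-Allow-Origin": request_origin,
            # Tell caches that the response depends on the Origin header.
            "Vary": "Origin",
        }
    return {}  # no CORS headers: the browser will block cross-origin reads
```

With an allow-list of ["https://example.com"], a request from https://example.com gets the header echoed back, while a request from any other origin gets no CORS headers at all.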

Why such a simple Magnet URI policy?

Again, I chose to just concatenate the InfoHash into a magnet URI with prebuilt trackers, purely for the simplicity of that approach. It’s crude and not the best approach by any means, but it worked for the tutorial. If you were using this in a production environment you should of course write a better system for magnet URI handling.
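A slightly more careful version of that idea would at least validate the InfoHash and URL-encode the tracker addresses. Here is a minimal sketch in Python. The function name and the tracker URL in the usage note are hypothetical, and it assumes a 40-character hexadecimal (v1) InfoHash.

```python
import re
from typing import Iterable, Optional
from urllib.parse import quote


def magnet_uri(info_hash: str, trackers: Iterable[str], name: Optional[str] = None) -> str:
    """Build a magnet URI from a hex InfoHash, tracker URLs and an optional name."""
    if not re.fullmatch(r"[0-9a-fA-F]{40}", info_hash):
        raise ValueError("expected a 40-character hexadecimal InfoHash")
    parts = ["xt=urn:btih:" + info_hash.lower()]
    if name:
        parts.append("dn=" + quote(name, safe=""))
    # Each tracker becomes its own tr= parameter, percent-encoded.
    parts.extend("tr=" + quote(tracker, safe="") for tracker in trackers)
    return "magnet:?" + "&".join(parts)
```

Calling magnet_uri with a valid hash and the made-up tracker udp://tracker.example.org:1337 yields a URI of the form magnet:?xt=urn:btih:<hash>&tr=udp%3A%2F%2Ftracker.example.org%3A1337, and anything that is not a 40-character hex string is rejected before it reaches the torrent client.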

Why isn’t it possible to stream MKV files?

By design, this application doesn’t attempt to transcode the files you’re streaming, but instead just parses the Range header and serves the appropriate range of bytes to the browser. It assumes you’re using a properly formatted MP4 file, for simplicity, and because transcoding would introduce high CPU usage requirements and would make the application very hard to scale.
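The Range handling mentioned above can be sketched as follows. This is a simplified Python illustration rather than the tutorial’s actual code: it only handles a single byte range, the function names are made up, and an unparseable or unsatisfiable header simply yields None.

```python
import re
from typing import Optional, Tuple


def parse_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Parse a single 'bytes=start-end' Range header into inclusive offsets."""
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if not match:
        return None
    start_text, end_text = match.groups()
    if start_text == "" and end_text == "":
        return None
    if start_text == "":
        # Suffix form 'bytes=-N': the last N bytes of the file.
        length = int(end_text)
        if length == 0:
            return None
        start, end = max(file_size - length, 0), file_size - 1
    else:
        start = int(start_text)
        # Open-ended 'bytes=N-' runs to the end; clamp oversized ends.
        end = min(int(end_text), file_size - 1) if end_text else file_size - 1
    if start > end or start >= file_size:
        return None
    return start, end


def content_range(start: int, end: int, file_size: int) -> str:
    """Format the Content-Range header value for a 206 response."""
    return "bytes %d-%d/%d" % (start, end, file_size)
```

For a 1000-byte file, “bytes=500-” maps to bytes 500 through 999, and the matching response would carry a 206 status with Content-Range: bytes 500-999/1000. When the function returns None for a range that cannot be satisfied, a server would typically answer with 416 instead.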

I will publish a new version of the tutorial soon, which will transcode the files to allow you to stream any format to the browser, albeit with Adobe Flash as a requirement.

Why do you need to manually delete the torrent?

For simplicity, I did not try to detect when the connection was closed by the browser and delete the torrent automatically. This also prevents unnecessary delete requests when the client loses connectivity, or when the video is paused or seeking. If you wanted to delete torrents automatically, it’s better to implement that in JavaScript in the browser rather than on the server.