Abstract#
Increase download speed and reduce download stalls in BitTorrent downloads by sending additional data to nodes during the download process using HTTP or FTP servers.
Principle#
Many websites that provide BitTorrent download links also provide HTTP or FTP URLs for the same files. These URLs point to the exact same file. BitTorrent clients that utilize WebSeeding technology can download files from either source (BitTorrent or HTTP/FTP) and assemble the downloaded data blocks into a complete file. The HTTP or FTP server acts as an always-available "seed" that can continuously provide data blocks of the file to accelerate the download.
Advantages#
Every BitTorrent download has a source with open download ports, known as seeds, from which anyone can start downloading.
This new method does not break or modify any content for clients that do not recognize the addition of metadata.
BitTorrent clients that do not support HTTP/FTP seeds can still share data blocks through other peers that support HTTP/FTP.
No modification of metadata is required. Download management tools can typically add multiple HTTP/FTP mirror URLs for a file, and users only need to click on multiple links on a webpage and identify the same file name.
HTTP/FTP servers and their protocols are well-known to everyone.
This new method has been implemented in multiple BitTorrent clients, such as IMFile, GetRight, Mainline, uTorrent, Azureus, and libTorrent.
Only minimal changes are required in the BitTorrent client, and no changes are needed in the Tracker, HTTP, or FTP servers. Scripts are also not required on the HTTP or FTP servers.
Since many popular clients already support this feature (especially the IMFile client), seeding for BitTorrent downloads can be done using existing HTTP/FTP servers owned by companies or individuals.
Metadata Extension#
In the main area of the BitTorrent metadata file, rather than as part of the "info" section, there will be a new key called "url-list". This key will reference one or more URL address lists and contain the URLs that can be used to retrieve the torrent data. If a client cannot use this key, it can safely ignore it.
Example:
d 8:announce27:http:tracker.com/announce
8:url-list26:http:mirror.com/file.exe
4:info…
If the "url-list" URL ends with a slash "/", the client must add the torrent file name to generate the complete URL. This allows the .torrent generator to treat this field equally for both single-file and multi-file torrents.
Multi-File Torrents#
When downloading multi-file torrents, the BitTorrent client typically uses the "name" field in the "info" section of the torrent file as the root directory and uses the "path/file" field to specify the file path and name within that root directory. However, in this example, the "url-list" field of the torrent file is used as the root directory, so the client needs to add both the "name" and "path/file" to that root directory to create the requested URL.
For example,
... 8:url-list22:http://mirror.com/pub/ 4:infod5:filesld6:lengthi949e4:pathl10:Readme.txte e4:name7:michael
In this example, the client needs to set the "name" to "michael" and add the "Readme.txt" file from the "path/file" to the root directory http://mirror.com/pub/ to generate the URL for the download request. The client will concatenate the generated root directory, file path, and file name to create a complete download request URL: http://mirror.com/pub/michael/Readme.txt.
Client Implementation Overview#
First, the client should ignore protocols it does not recognize, as attempting to connect to such protocols may result in errors. For example, a client may only support HTTP and not FTP, or vice versa.
Second, the HTTP and FTP protocols are streaming protocols without the concept of data blocks like BitTorrent. This means that when downloading with HTTP or FTP, there will be many gaps in the data as it cannot be downloaded in blocks like in BitTorrent.
To address this issue, modifications have been made to the "rarest first" block selection method in the implementation to have more gaps in the downloaded file, allowing HTTP and FTP threads to start downloading from these gaps until they fill the entire gap. Byte range requests in HTTP can be used to request individual blocks, but this may be logged by servers and mistaken for a DoS attack against the server.
Client Implementation Considerations#
To better fill the gaps in the file with HTTP and FTP connections, some changes have been made to the standard BitTorrent algorithm.
Definition of Gaps#
Gaps refer to spaces consisting of multiple adjacent blocks that the client does not have. In this example, let's assume there is a bit field "YYnnnnYnnY", where Y represents blocks the client has and n represents blocks the client does not have. In this bit field, there are two gaps, one with a length of 4 and another with a length of 2, consisting of adjacent undownloaded blocks.
Block Selection for Downloading#
The main change is in the selection of blocks for downloading. If the rarity of all missing blocks is similar, it is best to choose a block from a gap with a length of 2 when requesting a block. Within any gap, it is best to start filling from the end (i.e., prioritize selecting the highest numbered block). Therefore, in the given bit field "YYnnnnYnnY", if the rarity of missing blocks is similar, it is best to choose a block from "# pieces YYnnnnY##Y", with the optimal choice being the block "$ YYnnnnYn$Y".
These block selection strategies maximize the use of continuous data downloads from HTTP/FTP connections. This way, HTTP/FTP connections can start downloading from the beginning of a gap and download as much data as possible before connecting to blocks the client already has.
Changing Block Selection Strategy#
Change the block selection strategy from the original "select the rarest first" to "prioritize selecting the farthest and relatively rarest block from completed blocks".
Rarest and Largest Gap Block#
First, find the block with the least frequency among all undownloaded blocks, referred to as PRWBG (Pretty Rare With Biggest Gap), assuming it is at a distance D from another completed block.
For other undownloaded blocks, if they are closer to completed blocks than D, they are considered rarer than PRWBG and marked as "rare-X" (where X is a constant equal to the square root of the number of peers minus 1).
Conversely, if the distance between an undownloaded block and a completed block is greater than D, it is considered potentially easier to obtain than PRWBG and marked as "rare+X" for possible selection as the next downloading block.
If the rarest block currently only has 3 nodes, the usual algorithm would select a block that the other two nodes also have. However, with the modified algorithm, when a block is closer to completing the file than the gap of the current rarest block, only one node having that block is sufficient for selection.
If the gap is larger and the "rarity" of the block is the same as the usual "rarity-1", then that block will be selected (thus, if the gap is larger, a block with 2 or 3 nodes will be selected).
Therefore, in the given situation "YYnnn1Yn2Y", it is better to choose 2 unless 1 is rarer than 2.
Pseudocode logic:
X = sqrt(Peers) - 1;
Gap = 0;
CurGap = 0;
CurRarest = MaxPieces+1;
for (i=0; i<MaxPieces; i++) {
if (IDoNotHavePiece(i)) {
Gap++;
if (PeerHasPiece(i)) {
PieceRareness = NumberOfPeersWithThePiece();
if (PieceRareness<(CurRarest-X) ||
(PieceRareness<=(CurRarest+X) && Gap>CurGap)) {
CurRarest = PieceRareness;
CurGap = Gap;
NextPiece = i;
}
}
} else {
Gap = 0;
}
}
Filling Gaps#
When a file is more than 50% complete, a different method is used to randomly select downloading blocks. (At 50% or more, there should be plenty of blocks that other nodes want to download.)
Every few blocks, it will choose the block closest to completing the file, ignoring rarity. For example, in the bit field "YYnnnnYnnY", it will choose the block #"YYnnnnYn#Y". This helps fill small gaps.
The client can choose whether to perform this step, and if this feature is implemented, another file completion percentage can also be used.
Pseudocode logic:
Gap = 0;
Piece = -1;
CurGap = MaxPieces+1;
for (i=0; i<MaxPieces; i++) {
if (IDoNotHavePiece(i)) {
Gap++;
if (PeerHasPiece(i)) {
Piece = i;
}
} else {----
if (Gap<CurGap && Gap>0 && Piece!=-1) {
CurGap = Gap;
NextPiece = Piece;
}
Gap = 0;
Piece = -1;
}
}
HTTP and FTP Optimization#
No modifications are needed for the HTTP/FTP protocols or servers themselves.
If the client knows that the HTTP/FTP download is part of a BitTorrent download, it is best to start downloading from a random position in the file on the first connection. This way, it is more likely to obtain the first batch of HTTP data blocks to share with BitTorrent nodes.
If there is an ongoing BitTorrent download already when starting an HTTP/FTP connection, the HTTP/FTP should start from the beginning of the largest gap. For the bit field "YYnnnnYnnY", it should start from "#YYnnnnY##Y".
If a data block is successfully downloaded from an HTTP/FTP server but the SHA checksum does not match, the connection must be closed and the URL discarded.
If the client receives a "busy" response, there is no need to abandon the URL of the HTTP or FTP server.
Multi-File Torrents#
When downloading multi-file torrents using HTTP/FTP servers, additional selection algorithms are needed to optimize download efficiency.
For large files, the client can choose BitTorrent blocks to optimize download speed, allowing the entire file to be downloaded from the HTTP/FTP server.
For torrents with small files, multiple HTTP/FTP transfers may be needed to complete a block download. In this case, using BitTorrent for transfers may be more suitable. For example, if there are 100 1KB files and the block size is 32KB, it would require 100 HTTP/FTP transfers to complete the download, but only 4 BitTorrent block requests.
Therefore, giving higher priority to BitTorrent block selection for small files and higher priority to HTTP/FTP for large files can effectively improve download efficiency.
Alternative Client Implementation#
If a client only supports HTTP and not FTP, it can take advantage of the byte range request feature of HTTP but request multiple blocks at once.
Multiple blocks can be treated as a single set and a single byte range request can be sent to the HTTP server. This reduces the number of HTTP connections and can be beneficial for clients.
Blocks can be treated as 10, 50, or 100 blocks. If processed in this way, "Pieces Per Block" can be chosen as MaxPieces/20, so each request is approximately 5% of the file.
Not Recommended Client Implementation#
A client can simply use HTTP byte range requests to request individual pieces. However, some server administrators may not like this implementation as it would generate hundreds or thousands of requests in their logs for a single file. Some may even consider it a denial-of-service attack against their server.
Other Protocol Considerations#
Other protocols may be used for downloading data. Although HTTP/FTP is listed as the protocol for torrent sharing, clients can use any protocol that allows downloading data. For example, HTTPS, FTPS, or SFTP protocols can also be used, although not all clients may support them.
Protocols like RTSP and MMS may also be used. Even Usenet's NNTP protocol can be used for downloading.
Other protocols may have other limitations, such as not allowing downloads from arbitrary positions in a file.
Clients can choose to implement only HTTP or FTP protocols instead of both simultaneously.