返回列表 发帖

BitTorrent 协议规范(BT协议集合)

翻译:小马哥
日期:2004-5-22
BitTorrent 是一种分发文件的协议。它通过URL来识别内容,并且可以无缝的和web进行交互。它基于HTTP协议,它的优势是:如果有多个下载者并发的下载同一个文件,那么,每个下载者也同时为其它下载者上传文件,这样,文件源可以支持大量的用户进行下载,而只带来适当的负载的增长。(译注:因为大量的负载被均衡到整个系统中,所以提供源文件的机器的负载只有少量增长)

一个BT文件分布系统由下列实体组成:
一个普通的web服务器
一个静态的“元信息”文件
一个跟踪(tracker)服务器
终端用户的web浏览器
终端下载者

理想的情况是多个终端用户在下载同一个文件。
要提供文件共享,那么一台主机需要执行以下步骤:
Ø运行一个 tracker服务器(或者,已经有一个tracker服务器在运行了也可以)
Ø运行一个web服务器,例如apache,或者已经有一个web服务器在运行了。
Ø在web服务器上,将文件扩展名.torrent 和[wiki]MIME[/wiki]类型 application/x-bittorrent关联起来(或者已经关联了)
Ø根据 tracker服务器的 URL 和要共享的文件来创建一个“元信息”文件(.torrent)。
Ø将“元信息”文件发布到web服务器上
Ø在某个web页面上,添加一个到“元信息”文件的链接。
Ø运行一个已经拥有完整文件的下载者(被成为’origin’,或者’seed’,种子)

要开始下载文件,那么终端用户执行以下步骤:
Ø安装 BT(或者已经安装)
Ø访问提供 .torrent 文件的web服务器
Ø点击到 .torrent 文件的链接(译注:这时候,bt会弹出一个对话框)
Ø选择要把下载的文件保存到哪里?或者是一次断点续传
Ø等待下载的完成。
Ø结束bt程序的运行(如果不主动结束,那么bt会一直为其它人提供文件上传)

各个部分之间的连通性如下:
网站负责提供一个静态的文件,而把BT辅助程序(客户端)放在客户端机器上。
Trackers从所有下载者处接收信息,并返回给它们一个随机的peers的列表。这种交互是通过HTTP或HTTPS协议来完成的。
下载者周期性的向tracker登记,使得tracker能了解它们的进度;下载者之间通过直接连接进行数据的上传和下载。这种连接使用的是 BitTorrent 对等协议,它基于TCP。
Origin只负责上传,从不下载,因为它已经拥有了完整的文件。Origin是必须的。

元文件和tracker的响应都采用的是一种简单、有效、可扩展的格式,被称为bencoding,它可以包含字符串和整数。由于对不需要的字典关键字可以忽略,所以这种格式具有可扩展性,其它选项以后可以方便的加进来。

Bencoding格式如下:
对于字符串,首先是一个字符串的长度,然后是冒号,后面跟着实际的字符串,例如:4:spam,就是“ spam”
整数编码如下,以 ‘i’ 开始,然后10进制的整数值,最后以’e’结尾。例如,i3e表示3,I-3e表示-3。整数没有大小限制。I-0e是无效的。除了 i0e外,所以以0起始的整数都无效。I0e当然表示0。
列表编码如下,以’l’开始,接下来是列表值的编码(也采用bencoded编码),最后以’e’结束。例如:l4:spam4:eggse 表示 [‘spam’, ‘eggs’]。
字典编码如下,以’d’开始,接下来是可选的keys和它对应的值,最户以’e’结束。例如:d3:cow3:moo4:spam4:eggse,表示{‘cow’:’moo’,’spam’:’eggs’},而d4:spaml1:al:bee 表示 {‘spam’:[‘a’,’b’]}。键值必须是字符串,而且已经排序(并非是按照字母顺序排序,而是根据原始的字符串进行排序)。

元文件是采用bencoded编码的字典,包括以下关键字:

announce tracker的服务器

info 它实际上是一个字典,包括以下关键字:

Name:
一个字符串,在保存文件的时候,作为一个建议值。仅仅是个建议而已,你可以用别的名字保存文件。
Piece length:
为了更好的传输,文件被分隔成等长的片断,除了最后一个片断以外,这个值就是片断的大小。片断大小几乎一直都是2的幂,最常用的是 256k(BT的前一个版本3.2,用的是1M作为默认大小)
Pieces:
一个长度为20的整数倍的字符串。它将再被分隔为20字节长的字符串,每个子串都是相应片断的hash值。

此外,还有一个length或files的关键字,这两个关键字只能出现一个。如果是length,那么表示要下载的仅仅是单个文件,如果是files那么要下载的是一个目录中的多个文件。
如果是单个文件,那么length是该文件的长度。

为了能支持其它关键字,对于多个文件的情况,也把它当作一个文件来看,也就是按照文件出现的顺序,把每个文件的信息连接起来,形成一个字符串。每个文件的信息实际上也是一个字典,包括以下关键字:
Length:文件长度
Path:子目录名称的列表,列表最后一项是文件的实际名称。(不允许出现列表为空的情况)。
Name:在单文件情况下,name是文件的名称,而在多文件情况下,name是目录的名称。

Tracker查询。Trakcer通过HTTP的GET命令的参数来接收信息,而响应给对方(也就是下载者)的是经过bencoded编码的消息。注意,尽管当前的tracker的实现需要一个web服务器,它实际上可以运行的更轻便一些,例如,作为apache的一个模块。
Tracker GET requests have the following keys:

发送给Tracker的GET请求,包含以下关键字:

Info_hash:
元文件中info部分的sha hash,20字节长。这个字符创几乎肯定需要被转义(译注:在URL中,有些字符不能出现,必须通过unicode进行编码)

Peer_id:
下载者的id,一个20字节长的字符串。每个下载者在开始一次新的下载之前,需要随机创建这个id。这个字符串通常也需要被转义。

Ip:
一个可选的参数,给出了peer的ip地址(或者dns名称?)。通常用在origin身上,如果它和tracker在同一个机器上。

Port:
peer所监听的端口。下载者通常在在 6881 端口上监听,如果该端口被占用,那么会一直尝试到 6889,如果都被占用,那么就放弃监听。

Uploaded:
已经上载的数据大小,十进制表示。

Downloaded:
已经下载的数据大小,十进制表示

Left:
该peer还有多少数据没有下载完,十进制表示。注意,这个值不能根据文件长度和已下载数据大小计算出来,因为很可能是断点续传,如果因为检查文件完整性失败而必须重新下载的时候,这也提供了一个机会。

Event:
一个可选的关键字,值是started、compted或者stopped之一(也可以为空,不做处理)。如果不出现该关键字,。在一次下载刚开始的时候,该值被设置为started,在下载完成之后,设置为completed。如果下载者停止了下载,那么该值设置为stopped。

Tracker的响应是用bencoded编码的字典。如果tracker的响应中有一个关键字failure reason,那么它对应的是一个字符串,用来解释查询失败的原因,其它关键字都不再需要了。否则,它必须有两个关键字:Interval:下载者在两次发送请求之间的时间间隔。Peers:一个字典的列表,每个字典包括以下关键字:Peer id,Ip,Port,分别对应peer所选择的id、ip地址或者dns名称、端口号。注意,如果某些事件发生,或者需要更多的peers,那么下载者可能不定期的发送请求,

(downloader 通过 HTTP 的GET 命令来向 tracker 发送查询请求,tracker 响应一个peers 的列表)

如果你想对元信息文件或者tracker查询进行扩展,那么需要同Bram Cohen协调,以确保所有的扩展都是兼容的。

BT对等协议基于TCP,它很有效率,并不需要设置任何socket选项。(译注:BT对等协议指的是peer与peer之间交换信息的协议)
对等的两个连接是对称的,消息在两个方向上同样的传递,数据也可以在任何一个方向上流动。
一旦某个peer下载完了一个片断,并且也检查了它的完整性,那么它就向它所有的peers宣布它拥有了这个片断。
连接的任何一端都包含两比特的状态信息:是否choked,是否感兴趣。Choking是通知对方,没有数据可以发送,除非unchoking发生。Choking的原因以及技术后文解释。

一旦一端状态变为interested,而另一端变为非choking,那么数据传输就开始了。(也就是说,一个peer,如果想从它的某个peer那里得到数据,那么,它首先必须将它两之间的连接设置为 interested,其实就是发一个消息过去,而另一个peer,要检查它是否应该给这个家伙发送数据,如果它对这个家伙是 unchoke,那么就可以给它发数据,否则还是不能给它数据)Interested状态必须一直被设置――任何时候。要用点技巧才能比较好的实现这个目的,但它使得下载者能够立刻知道哪些peers将开始下载。

对等协议由一个握手开始,后面是循环的消息流,每个消息的前面,都有一个数字来表示消息的长度。握手的过程首先是先发送19,然后发送“BitTorrent protocol”。19就是“BitTorrent protocol”的长度。
后续的所有的整数,都采用big-endian 来编码为4个字节
在协议名称之后,是8个保留的字节,这些字节当前都设置为0。
接下来对元文件中的 info 信息,通过 sha1 计算后得到的 hash值,20个字节长。接收消息方,也会对 info 进行一个 hash 运算,如果这两个结果不一样,那么说明对方要的文件,并不是自己所要提供的,所以切断连接。

接下来是20个字节的 peer id。
这就是握手过程

接下来就是以消息长度开始的消息流,这是可选的。长度为0 的消息,用于保持连接的活动状态,被忽略。通常每隔2分钟发送一个这样的消息。

其它类型的消息,都有一个字节长的消息类型,可能的值如下:

‘choke’, ‘unchoe’, ‘interested’, not interested’类型的消息不再含有其它数据了。

‘bitfield’永远也仅仅是第一个被发送的消息。它的数据实际是一个位图,如果downloader已经发送了某个片断,那么对应的位置1,否则置0。Downloaders如果一个片断也没有,可以忽略这个消息。(通过这个消息,能知道什么了?)

‘have’类型的消息,后面的数据是一个简单的数字,它是下载者刚刚下载完并检查过完整性的片断的索引。(由此,可以看到,peer通过这种消息,很快就相互了解了谁都有什么片断)

‘request’类型的消息,后面包含索引、开始位置和长度)长度是2的幂。当前的实现都用的是215 ,而关闭连接的时候,请求一个超过2 17的长度。(这种类型的消息,就是当一个peer希望另一个peer给它提供片断的时候,发出的请求)

‘cancel’类型的消息,它的数据和’request’消息一样。它们通常只在下载趋向完成的时候发送,也就是在‘结束模式“阶段发送。在一次下载接近完成的时候,最后的几个片断需要很长时间才能下载完。为了确保最后几个片断尽快下载完,它向所有的peers发送下载请求。为了保证这不带来可怕的低效,一旦某个片断下载完成,它就其它peers发送’cancel’消息。(意思就是说,我不要这个片断了,你要是准备好了,也不用给我发了,可以想象,如果对方还是把数据发送过来了,那么这边必须忽略这些重复的数据)。

‘piece’类型的消息,后面保护索引号、开始位置和实际的数据。注意,这种类型的消息和 ‘request’消息之间有潜在的联系(译注:因为通常有了request消息之后,才会响应‘piece’消息)。如果choke和unchoke消息发送的过于迅速,或者,传输速度变的很慢,那么可能会读到一些并不是所期望的片断。( 也就是说,有时候读到了一些片断,但这些片断并不是所想要的)
wish.lijt@gmail.comQQ:11242538 call me

这个翻译版本由孤波独立完成
原文见http://bitconjurer.org/BitTorrent/protocol.html
作者Bram Cohen
孤波享有对该翻译版本解释权修改权
非商业引用请注明译者

目前处在Beta测试中,很不完善,请不要转贴或引用
希望高人提出建议和指出错误错误不断完善
尤其请注意后面几段,我都不知道自己翻译的什么

我将在尽量3天内发出正式版本。


BitTorrent[wiki]协议[/wiki]详解

BitTrrent(简称BT,比特洪流)是一个文件分发协议,它通过URL识别内容并且和网络无缝结合。它在[wiki]HTTP[/wiki]平台上的优势在于,同时下在一个文件的下载者在下载的同时不断互相上传数据,使文件源可以在很有限的负载增加的情况下支持大量下载者同时下载。

一个BT式文件分发需要以下实体:

·一个普通网络服务器
·一个静态元信息文件
·一个BT Tracker
·一个“原始”下载者
·网络终端浏览者
·网络终端下载者

这里假设理想情况下一个文件有多个下载者。

架设一个BT服务器步骤如下:

1.开始运行Tracker(已运行的跳过这一步);
2.开始运行普通网络服务器端程序,如Apache,已运行的跳过这一步;
3.在网络服务器上将.torrent文件关联到Mimetype类型application/x-bittorrent(已关联的跳过这一步);
4.用要发布的完整文件和Tracker的URL创建一个元信息文件(.torrent文件);
5.将元信息文件放置在网络服务器上;
6.在网页上发布元信息文件(.torrent文件)链接;
7.原始下载者提供完整的文件(原本)。

通过BT下载步骤如下:

1.安装BT客户端程序(已安装的跳过这一步);
2.上网;
3.点击一个链到.torrent文件的链接;
4.选择本地存储路径,选定需要下载的文件(对有选择下载功能的BT客户端用户);
5.等待下载完成;
6.用户退出下载(之前下载者不停止上传)。

连接状况如下:

·网站正常提供静态文件连接,并且启动客户端上的BT程序;
·Tracker即时接收所有下载者信息,并且给每个下载者一份随机的peer列表。通过HTTP或HTTPS协议实现;
·下载者每隔一段时间连一次Tracher,告知自己的进度,并和那些已经直接连接上的peer进行数据的上传下载。这些连接遵循BitTorrent peer协议,通过[wiki]TCP[/wiki]协议进行通信。
·原始下载者只上传不下载,他拥有整个文件,所以很必要向网络中传输完文件的所有部分。在一些人气很旺的下载中,原始下载者经常可以在较短的时间内退出上传,由其它已经下载到整个文件的下载者继续提供上传。

元信息文件和Tracker的回应信息都以一种简单高效可扩展的格式(Bencoding,B编码)传送。B编码过的信息就是以包含字符串和整型数据的字典和列表的嵌套(像在Python中一样),可扩展性是指可以通过减少字典忽略的关键值来添加新的特性。

B编码规则如下:

·字符串表示为十进制数的既定字符串长度加冒号再跟原字符串。
如4:spam就相当于'spam'。
·整型数据表示成前面加'i'后面加'e'中间是十进制数,如i3e就相当于3,i-3e就是-3。整型数据没有长度限制。i-0e无效,所有以'i0'开头的除了代表0的i0e,其它都无效。
·列表编码为一个'l'开头后面跟它所包含的项目(已经编码过)最后加一个'e',比如l4:spam4:eggse就等于['spam', 'eggs']。
·字典编码为一个'd'开头后面跟一个交替关键值(key)及其对应值的列表最后加一个'e'。
如:d3:cow3:moo4:spam4:eggse相当于{'cow': 'moo', 'spam': 'eggs'}
d4:spaml1:a1:bee相当于{'spam': ['a', 'b']}
关键值必须是处理过的字符串(用原始字符串编码的,而且不是数字字母混合编码的)。

元信息文件就是B编码的有以下关键值的字典:

announce(声明)

Tracker的URL。

info(信息)

此关键值对应一个字典包含以下描述的关键值:

关键值name对应一个字符串,代表默认的下载文件或存成目录的名字。它是纯粹建议性的。

关键值piece length(块长)对应文件分割成的块的字节数。出于传输需要,文件被分割成大小相等的块,除了最后一块通常会小一些。块长一般来说是2的权值,大部分设块长为256K(2的18次幂)。

关键值pieces(块)对应一个字符串,此字符串长度是20的倍数。它可以再分成每20字节一段的多个字符串,分别对应块在索引中的SHA1校验码(hash)。

还有关键值length(长度)和files(文件),它们不能同时出现也不能都不出现。当length出现说明这个元信息文件只是单文件下载,否则说明是多文件的目录结构下载。

单文件情况下,length对应文件长度的字节数。

多文件情况被看作是把许多单文件按文件列表中的顺序连成一个大文件下载,而关键值files就对应文件列表,是一个字典的列表,其中每个字典又包含以下关键值:

length(长度)

文件长度的字节数。

path(路径)

一个包含字符串的列表,字符串就是子目录名,最后一项的字符串是文件名。
(一个长度为零的length表单是错误的。)

在单文件情况下,关键值name是文件名;多文件情况下,它就成了目录名。

Tracker质询是双向的。Tracker通过HTTP GET参数获得信息,然后返回一个B编码后的信息。尽管Tracker需要在服务器端执行,但它运行流畅像Apache的一个模块。

Tracker的GET请求有如下关键值:

info_hash

20字节长的SHA1验证码,来自B编码过的元信息文件中的info值下,是元信息文件的一个支链。这个值是自动转换的。

peer_id

一个20字节长的字符串,是每个用户开始下载时随机生成的ID。这个值也是是自动转换的。

ip

一个可选择的参数给出peer所在的[wiki]IP[/wiki](或[wiki]DNS[/wiki]主机名),一般是和Tracker同机器的原始下载者得到后以便散发文件。

port

监听端口,官方默认的是从6881端口开始试,如果端口被占用则依次向后推一个端口找空闲端口,到6889端口为止。

uploaded

目前总上传量,编码为十进制ASCII码。

downloaded

目前总下载量,编码为十进制ASCII码。

left

未下载的字节数,编码为十进制ASCII码。这个数不是通过文件长度和已下载数算出来的,因为文件可能在被续传,还有一些已经下载的数据不能通过完整性检查必须重新下载。

event

这是个选择性的关键值,选项有started,completed或stopped(或empty,等同于没有运行)。如果没有运行,这个声明会定期间隔一定时间发出。开始下载时发出started值,完成下载时发出completed。当文件完整后再开始,没有completed发出,下载者中止下载时发出stopped。

Tracker的回应也是B编码字典。如果Tracker回应中有关键值failure reason(失败原因),就会对应一个人可以读懂的字符串信息解释质询失败的原因,不需要其它关键值。否则,回应必须有两个关键值:interval(间隔)对应下载者定期发出请求的间隔秒数;peers,peer自选ID,IP地址或DNS主机名的字符串和端口号。记住peers不会完全按照计划的间隔发送请求,假如他们发生一个事件或者想要更多的peers。

如果你想对元信息文件或者Tracker质询进行扩展,请与Bram Cohen进行协调,确保所有扩展都兼容。

BitTorrent peer协议通过TCP协议进行操作。它不用调节任何socket选项就可以流畅运行。

peer之间的连接是对称的。两个方向送出的信息要协调一致,数据可以流入任一方。

peer协议指一个peer从零开始下载,每得到元信息文件索引中所描述的一个块且验证码一致,就向所有peer声明已得到此块。

连接的两个终端有2个状态指标,被阻塞与否,被关注与否,被阻塞(choking)是表明在恢复通畅之前数据不再发出的通知。发生阻塞的原因和技术问题稍后会提到。

数据传输发生在一方关注对方且对方没有阻塞的情况下。关注状态必须一致保持-如果一个没阻塞的peer没有别人需要的数据,别人对他就会失去关注,转而关注那些正在阻塞的peer。完全执行这种条件需要非常慎重,但这样的确可以让下载者知道哪些peer在阻塞消失后可以马上开始下载。

连接会逐渐断开不感兴趣和阻塞的peer。

当数据传输时,下载者要备好多份请求排成队列,以获得较高的TCP传输效率(这叫“管运请求”)。另一方面,不能被写入TCP缓冲区的请求要被立即排入内存,而不是一个应用程序级的网络缓冲,一旦阻塞出现,这些请求全部丢弃。

peer连线协议包括一次握手跟着不断的大小一致且确定的信息流。握手的开始是字符十九(十进制),跟着是字符串'BitTorrentprotocol'。开头的字符是长度固定的,希望其它新协议也能这样以便区分。

此后所有送入协议的整数都编码为4字节大中止端。

在现有的应用中头部数据之后是8个全部预留为0的字节,若果你想通过改变这8个预留字节以扩展协议,请与Bram Cohen协调以保证所有扩展兼容。

然后是来自元信息文件中B编码的info值中长20字节的SHA1验证码(和info_hash向Tracker声明的值相同,但这里是原始值那里是引用)。如果双方的值不同,连接断开。一个例外是下载者想只用一个端口进行多个连接下载,它们会先从接入连接得到一个验证码,然后和列表里面的对照,有相同的就答复。

验证码之后是20字节的peer id,它包含在Tracker回应的peer列表中,在向Tracker的请求中被报告。如果接受方peer id不符合发送方希望,连接断开。

握手完毕。之后是长度固定的交互信息流。零长度信息用来保持连接,被忽略。这种信息一般2分钟发出一次,但是在等待数据期间很容易超时。

所有非保持连接用信息开头的字节给出类型,可能值如下:

·0-阻塞
·1-通畅
·2-关注
·3-不关注
·4-有
·5-比特组
·6-请求
·7-块
·8-取消

“阻塞”、“通畅”、“关注”和“不关注”类信息没有荷载。

“比特组”类信息仅作为首信息发出。它负载一个比特组,下载者有索引的设为1,其它为0。开始下载时没有任何数据的下载者跳过“比特组”信息。首字节高位到低位对应索引0-7,依次类推,第二字节对应8-15,等等。尾部的剩余的比特位设为0。

“已有”类信息负载一个数,即刚下载并核对完验证码的索引数。

“请求”类信息包括包含一个索引,开始和长度。后两者是字节偏移。长度一般是2的权值除非被文件尾截断。现行一般是2的15次幂,并且关闭大于2的17次幂长度的连接。

“取消”类信息负载和“请求”类信息有一样的负载。它通常在下载接近完成即“最后阶段”发出。当下载快要完成时,剩下几个块有都从同一个线程下载的趋向,这样会很慢。为了确保剩余块下载迅速,一旦还没有决定剩余块的下载请求向谁发出,先向所有他正在从对方下载数据的连接者发送要求所有剩余块的请求。为避免低效,每当一个块开始下载就向其他peer发出取消信息。

“块”类信息包含一个索引,开始和块。记住它和“请求”类信息是相关的。当传输速度很慢或“阻塞”“通畅”类信息高频率交替发出或两者同时发生,可能会载到一个不需要的块。

下载者下载块的顺序是随机的,这样适当防止下载者与其他Peers仅有相同的块子集或超集。

阻塞的发生有很多原因。TCP协议的信息拥挤控制在即时向多连接发送信息的过程中表现极差。同时,阻塞的存在使下载者们能够用以牙还牙式的算法来确保稳定的下载速率。

下面描述的阻塞算法是目前基础的配置。重要的是所有新算法不光要在包含全部扩展算法的网络中运行良好,也要在主要包含这个基础算法的网络中运行良好。

一个优秀的阻塞算法有许多标准。它必须封锁一定同时上传的数量以获得良好的TCP表现,还要避免频繁的堵塞和通畅交替,即所谓“纤维化”。它应该用数据交换报答给自己数据的peer。最后,它还应该偶尔尝试一下与未使用过的peer端连接,找出比现有连接好的连接,这叫做尝试性疏通。

现行的阻塞算法避免纤维化的手段是每10秒转换被阻塞的名单。疏通4个自己关注且能从他们身上得到最高下载速率的peer,进行上传和数据交换。有较高上传速率但是不被关注下载者的peer被疏通,一旦这些peer开始被关注,那些上传率最低的peer的就被阻塞。如果下载者有了完整的文件,他用自己的上传率而不是下载率来决定疏通谁的连接。

在尝试性疏通中,任何一次中都有一个peer被疏通不管他的上传率如何(如果被关注,他会成为4个提供下载的peer之一)。被尝试性疏通的这种peer每30秒轮换一次。为了给它们一个上传整一个块的机会,新连接会以轮换中尝试性疏通次数的3倍开始连接。
wish.lijt@gmail.comQQ:11242538 call me

TOP

Bittorrent udp-tracker protocol extension
Contents

introduction
connecting
Client sends packet:
Server replies with packet:
announcing
Client sends packet:
Server replies with packet:
scraping
Client sends packet:
Server replies with packet:
errors
server replies packet:
actions
extensions
authentication
credits
introduction
A tracker with the protocol "udp://" in its URI is supposed to be contacted using this protocol.

This protocol is supported by xbt-tracker.

For additional information and descritptions of the terminology used in this document, see the protocol specification

All values are sent in network byte order (big endian). The sizes are specified with ANSI-C standard types.

If no response to a request is received within 15 seconds, resend the request. If no reply has been received after 60 seconds, stop retrying.

connecting
Client sends packet:size name description
int64_t connection_id Must be initialized to 0x41727101980 in network byte order. This will identify the protocol.
int32_t action 0 for a connection request
int32_t transaction_id Randomized by client.

Server replies with packet:size name description
int32_t action Describes the type of packet, in this case it should be 0, for connect. If 3 (for error) see errors.
int32_t transaction_id Must match the transaction_id sent from the client.
int64_t connection_id A connection id, this is used when further information is exchanged with the tracker, to identify you. This connection id can be reused for multiple requests, but if it's cached for too long, it will not be valid anymore.

announcing
Client sends packet:size name description
int64_t connection_id The connection id acquired from establishing the connection.
int32_t action Action. in this case, 1 for announce. See actions.
int32_t transaction_id Randomized by client.
int8_t[20] info_hash The info-hash of the torrent you want announce yourself in.
int8_t[20] peer_id Your peer id.
int64_t downloaded The number of byte you've downloaded in this session.
int64_t left The number of bytes you have left to download until you're finished.
int64_t uploaded The number of bytes you have uploaded in this session.
int32_t event The event, one of

none = 0
completed = 1
started = 2
stopped = 3

uint32_t ip Your ip address. Set to 0 if you want the tracker to use the sender of this udp packet.
uint32_t key A unique key that is randomized by the client.
int32_t num_want The maximum number of peers you want in the reply. Use -1 for default.
uint16_t port The port you're listening on.
uint16_t extensions See extensions

Server replies with packet:size name description
int32_t action The action this is a reply to. Should in this case be 1 for announce. If 3 (for error) see errors. See actions.
int32_t transaction_id Must match the transaction_id sent in the announce request.
int32_t interval the number of seconds you should wait until reannouncing yourself.
int32_t leechers The number of peers in the swarm that has not finished downloading.
int32_t seeders The number of peers in the swarm that has finished downloading and are seeding.

The rest of the server reply is a variable number of the following structure:

size name description
int32_t ip The ip of a peer in the swarm.
uint16_t port The peer's listen port.

scraping
Client sends packet:size name description
int64_t connection_id The connection id retreived from the establishing of the connection.
int32_t action The action, in this case, 2 for scrape. See actions.
int32_t transaction_id Randomized by client.
int16_t num_info_hashes The number of info-hashes that will follow.
uint16_t extensions See extensions.

The following structure is repeated num_info_hashes times:

size name description
int8_t[20] info_hash The info hash that is to be scraped.

Server replies with packet:size name description
int32_t action The action, should in this case be 2 for scrape. If 3 (for error) see errors.
int32_t transaction_id Must match the sent transaction id.

The rest of the packet contains the following structures once for each info-hash you asked in the scrape request.

size name description
int32_t complete The total number of completed downloads.
int32_t downloaded The current number of connected seeds.
int32_t incomplete The current number of connected leechers.

errors
In case of a tracker error,

server replies packet:size name description
int32_t action The action, in this case 3, for error. See actions.
int32_t transaction_id Must match the transaction_id sent from the client.
int8_t[] error_string The rest of the packet is a string describing the error.

actions
The action fields has the following encoding:

connect = 0
announce = 1
scrape = 2
error = 3 (only in server replies)
extensions
The extensions field is a bitmask. The following bits are assigned:

1 = authentication.
authenticationThe packet will have an authentication part appended to it. It has the following format:

size name description
int8_t username_length The number of characters in the username.
int8_t[] username The username, the number of characters as specified in the previous field.
uint8_t[8] passwd_hash sha1(packet + sha1(password)) The packet in this case means the entire packet except these 8 bytes that are the password hash. These are the 8 first bytes (most significant) from the 20 bytes hash calculated.

credits
Protocol designed by Olaf van der Spek
wish.lijt@gmail.comQQ:11242538 call me

TOP

通常BT客户端每几分钟就要向tracker发送一次请求.对于一些比较大的BT站点,其tracker的压力是可想而知的.降低tracker的压力首先考虑到的当然是采用更低网络开销的udp协议.于是Bittorrent udp-tracker protocol应运而生.
   这个协议很简单.
   下面是实现它的封装类:


// [wiki]UDP[/wiki]TrackerClient.h: interface for the CUDPTrackerClient class.
//
//////////////////////////////////////////////////////////////////////

#if !defined(AFX_UDPTRACKERCLIENT_H__69B6ACC8_8193_4680_81D8_925B1550E92C__INCLUDED_)
#define AFX_UDPTRACKERCLIENT_H__69B6ACC8_8193_4680_81D8_925B1550E92C__INCLUDED_

#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000

#include <winsock2.h>
#pragma comment(lib, "ws2_32.lib")

#ifndef _[wiki]DIS[/wiki]ABLEWARNING4786_4355
#define _DISABLEWARNING4786_4355
#pragma warning( disable : 4786 )
#pragma warning( disable : 4355 )
#endif
#ifndef _ENABLEUSESTL
#define _ENABLEUSESTL
#include <list>
#include <map>
#include <set>
#include <vector>
#include <queue>
#include <string>
using namespace std;
#endif

class CPeerHostInfo
{
public:        
DWORD  IP;//节点IP
WORD  Port;//节点端口
};
class CUDPTrackerClient  
{
public:
CUDPTrackerClient();
virtual ~CUDPTrackerClient();
void CancelSocketOperate();
BOOL Connect(const char * szServer,WORD wPort = 0);
DWORD Announcing(BYTE* pInfoHash,BYTE * pPeerID,
  __int64 idownloaded,__int64 ileft,__int64 iuploaded,
  int  ievent,
  DWORD dwIP,WORD  wPort);

BOOL Disconnect();
public:
SOCKET   m_socket;
DWORD   m_dwIP;
WORD   m_wPort;
__int64   m_iConnection_id;
DWORD   m_dwConnectTick;
string   m_strError;  //如果请求失败,此变量保存错误信息
DWORD m_dwDonePeers; //种子数
DWORD m_dwNumPeers; //当前下载者个数
DWORD m_dwInterval; //查询间隔时间
list<CPeerHostInfo> m_listPeers;
};

#endif // !defined(AFX_UDPTRACKERCLIENT_H__69B6ACC8_8193_4680_81D8_925B1550E92C__INCLUDED_)
// UDPTrackerClient.cpp: implementation of the CUDPTrackerClient class.
//
//////////////////////////////////////////////////////////////////////

#include "stdafx.h"
#include "UDPTrackerClient.h"

#include "DataStream.h"
#ifdef _DEBUG
#undef THIS_FILE
static char THIS_FILE[]=__FILE__;
#define new DEBUG_NEW
#endif

//////////////////////////////////////////////////////////////////////
// Construction/Destruction
//////////////////////////////////////////////////////////////////////
#define RECVBUFSIZE 2048

CUDPTrackerClient::CUDPTrackerClient()
{
m_socket = INVALID_SOCKET;
m_iConnection_id = 0;
m_dwConnectTick = 0;
m_dwIP = 0;
m_wPort = 0;
m_dwDonePeers = 0; //种子数
m_dwNumPeers = 0; //当前下载者个数
m_dwInterval = 0; //查询间隔时间
}

CUDPTrackerClient::~CUDPTrackerClient()
{
Disconnect();

}
void CUDPTrackerClient::CancelSocketOperate()
{
if(m_socket != INVALID_SOCKET)
{
  LINGER lingerStruct;    
  // If we're supposed to abort the connection, set the linger value
  // on the socket to 0.    
  lingerStruct.l_onoff = 1;
  lingerStruct.l_linger = 0;
  setsockopt(m_socket, SOL_SOCKET, SO_LINGER,
  (char *)&lingerStruct, sizeof(lingerStruct) );
}
}
BOOL CUDPTrackerClient:isconnect()
{
m_iConnection_id = 0;
m_dwDonePeers = 0; //种子数
m_dwNumPeers = 0; //当前下载者个数
m_dwInterval = 0; //查询间隔时间
if ( m_socket != INVALID_SOCKET )
{
  m_dwIP = 0;
  m_wPort = 0;
  // Now close the socket handle.  This will do an abortive or
  // graceful close, as requested.     
  shutdown(m_socket,SD_BOTH);
  closesocket(m_socket);
  m_socket = INVALID_SOCKET;
  return TRUE;
}
return FALSE;
}

//szServer连接的主机,可以是下列形式的字符串:
//easeso.com:1000
//easeso.com
//如果wPort不为0,则szServer不应该包含端口信息
BOOL CUDPTrackerClient::Connect(const char * szServer,WORD wPort)
{
m_strError = "";
BOOL bRes = FALSE;
if ( m_socket == INVALID_SOCKET )
{
  //用UDP初始化套接字
  BOOL optval = TRUE;  
  m_socket =socket(AF_INET,SOCK_DGRAM,0);
  if(m_socket == INVALID_SOCKET)
  return FALSE;
  //设置超时时间
  int TimeOut=10000;
  int err = setsockopt (m_socket, SOL_SOCKET,SO_RCVTIMEO,(CHAR *) &TimeOut,sizeof (TimeOut));
}
if(m_dwIP == 0)
{
  CString strServer = szServer;
  CString strHost;
  if(wPort == 0)
  {
  int iNext = strServer.Find(':');
  if(iNext>0)
  {  
   strHost = strServer.Mid(0,iNext);  
   CString strPort = strServer.Mid(iNext+1);  
   m_wPort = (WORD)atoi(strPort);
  }
  else
   strHost = strServer;
  }
  else
  {
  strHost = strServer;
  m_wPort = wPort;
  }
  if(m_wPort == 0)
  m_wPort = 80;  

  //Check if address is an IP or a Domain Name
  int a = strHost[0];
  if (a > 47 && a < 58)
  m_dwIP = inet_addr(strHost);
  else
  {
  struct hostent *pHost;
  pHost = gethostbyname(strHost);
  if(pHost != NULL)
   m_dwIP = *((ULONG*)pHost->h_addr);
  else
   m_dwIP = 0;
  }
}
if((GetTickCount()-m_dwConnectTick)>30000)
{
  m_dwConnectTick = 0;
  m_iConnection_id = 0;
}
if(m_socket != INVALID_SOCKET && m_dwIP && m_wPort && m_iConnection_id ==0)
{
  DWORD dwTransaction_id = GetTickCount();
  SOCKADDR_IN  from;
  int fromlength=sizeof(SOCKADDR);

  char buf[RECVBUFSIZE];
  from.sin_family=AF_INET;
  from.sin_addr.s_addr=m_dwIP;
  from.sin_port=htons(m_wPort);
  CDataStream sendstream(buf,2047);
  sendstream.clear();
  __int64 iCID = 0x41727101980;
  sendstream.writeint64(CNetworkByteOrder::convert(iCID));
  sendstream.writedword(CNetworkByteOrder::convert((int)0));
  sendstream.writedword(dwTransaction_id);
  
  int iRes = 0;
  int iTimes = 6;
  while(iTimes>0&&m_dwIP)
  {
  sendto(m_socket,sendstream.getbuffer(),sendstream.size(),0,(struct sockaddr FAR *)&from,sizeof(from));
  iRes = recvfrom(m_socket,buf,RECVBUFSIZE-1,0,(struct sockaddr FAR *)&from,(int FAR *)&fromlength);
  if(iRes >=0)
   break;
  iTimes--;
  }
  if(iRes>=16)
  {
  CDataStream recvstream(buf,RECVBUFSIZE-1);
  DWORD dwAction = (DWORD)CNetworkByteOrder::convert((int)recvstream.readdword());
  DWORD dwTIDResp= recvstream.readdword();
  if(dwTIDResp == dwTransaction_id)
  {
   if(dwAction == 0)
   {
    m_iConnection_id = recvstream.readint64();  
    //BitComet将回复0x16字节数据,最后6字节是服务器查看到的本地IP和UDP端口
   }
   else if(dwAction == 3)//得到一个错误信息包
   {
    buf[iRes]=0;
    m_strError = recvstream.readstring();
   }
  }
  }
}
if(m_iConnection_id)
  bRes = TRUE;
return bRes;
}
//提交请求
//pInfoHash 20字节的数据缓冲区指针
//pPeerID 20字节的数据缓冲区指针
//ievent参数值:
//none = 0
//completed = 1
//started = 2
//stopped = 3

DWORD CUDPTrackerClient::Announcing(BYTE* pInfoHash,BYTE * pPeerID,
        __int64 idownloaded,__int64 ileft,__int64 iuploaded,
        int  ievent,
        DWORD dwIP,WORD  wPort)
{
m_listPeers.clear();
m_dwNumPeers = 0;
m_dwDonePeers = 0;
m_strError = "";
DWORD dwReturnCode = 0;
if(m_iConnection_id && m_socket != INVALID_SOCKET && m_dwIP & m_wPort)
{
  DWORD dwTransaction_id = GetTickCount();
  //srand(dwTransaction_id);
  //DWORD dwKey = rand();
  DWORD dwKey = 0x3753;
  SOCKADDR_IN  from;
  int fromlength=sizeof(SOCKADDR);
  char buf[RECVBUFSIZE];
  from.sin_family=AF_INET;
  from.sin_addr.s_addr=m_dwIP;
  from.sin_port=htons(m_wPort);
  CDataStream sendstream(buf,RECVBUFSIZE-1);
  sendstream.clear();  
  sendstream.writeint64(m_iConnection_id);
  sendstream.writedword(CNetworkByteOrder::convert((int)1));
  sendstream.writedword(dwTransaction_id);
  sendstream.writedata(pInfoHash,20);
  sendstream.writedata(pPeerID,20);
  sendstream.writeint64(CNetworkByteOrder::convert(idownloaded));
  sendstream.writeint64(CNetworkByteOrder::convert(ileft));
  sendstream.writeint64(CNetworkByteOrder::convert(iuploaded));
  sendstream.writedword(CNetworkByteOrder::convert(ievent));
  sendstream.writedword(dwIP);
  sendstream.writedword(CNetworkByteOrder::convert((int)dwKey));
  sendstream.writedword(CNetworkByteOrder::convert((int)200));
  sendstream.writedword(CNetworkByteOrder::convert(wPort));
  
  int iRes = 0;
  int iTimes = 2;
  while(iTimes>0&&m_dwIP)
  {
  sendto(m_socket,sendstream.getbuffer(),sendstream.size(),0,(struct sockaddr FAR *)&from,sizeof(from));
  iRes = recvfrom(m_socket,buf,RECVBUFSIZE-1,0,(struct sockaddr FAR *)&from,(int FAR *)&fromlength);
  if(iRes >=0)
   break;
  iTimes--;
  }
  if(iRes>=20)
  {  
  CDataStream recvstream(buf,RECVBUFSIZE-1);
  DWORD dwAction = (DWORD)CNetworkByteOrder::convert((int)recvstream.readdword());
  DWORD dwTIDResp= recvstream.readdword();
  if(dwTIDResp == dwTransaction_id)
  {
   if(dwAction == 1)
   {
    m_dwInterval = (DWORD)CNetworkByteOrder::convert((int)recvstream.readdword());
    m_dwNumPeers = (DWORD)CNetworkByteOrder::convert((int)recvstream.readdword());
    m_dwDonePeers = (DWORD)CNetworkByteOrder::convert((int)recvstream.readdword());     
    CPeerHostInfo hi;     
    for(int iCurPos = 20;iCurPos+6<=iRes;iCurPos+=6)
    {
     hi.IP= recvstream.readdword();
     hi.Port = (WORD)CNetworkByteOrder::convert((unsigned short)recvstream.readword());
     m_listPeers.push_back(hi);
    }
    if(m_dwNumPeers>m_listPeers.size())
    {
     iRes = 0;
     iTimes = 6;
     while(iTimes>0&&m_dwIP)
     {
      iRes = recvfrom(m_socket,buf,RECVBUFSIZE-1,0,(struct sockaddr FAR *)&from,(int FAR *)&fromlength);
      if(iRes >=0)
       break;
      iTimes--;
     }
     if(iRes>=6)
     {
      for(iCurPos = 0;iCurPos+6<=iRes;iCurPos+=6)
      {
       hi.IP= recvstream.readdword();
       hi.Port = (DWORD)CNetworkByteOrder::convert((int)recvstream.readword());
       m_listPeers.push_back(hi);
      }
     }     
    }
    m_dwNumPeers = m_listPeers.size();
    dwReturnCode = 200;
   }
   else if(dwAction == 3)//得到一个错误信息包
   {
    buf[iRes]=0;
    m_strError = recvstream.readstring();
    dwReturnCode = 400;
   }
  }
  }
}
//每次都要求重新连接
m_iConnection_id = 0;
return dwReturnCode;
}
// DataStream.h: interface for the CDataStream class.
//
//////////////////////////////////////////////////////////////////////

#if !defined(AFX_DATASTREAM_H__D90A2534_EA73_4BEA_8B7E_87E59A3D1D26__INCLUDED_)
#define AFX_DATASTREAM_H__D90A2534_EA73_4BEA_8B7E_87E59A3D1D26__INCLUDED_

#if _MSC_VER > 1000
#pragma once
#endif // _MSC_VER > 1000
#include <stdio.h>
//数据流操作函数
class CDataStream  
{
public :
CDataStream(char * szBuf,int isize)
{
  m_isize = isize;
  buffer = szBuf;
  current = buffer;
}
~CDataStream()
{
}
   void clear()
{
  current = buffer;
  current[0]=0;
}
//此函数不动态增加内存,一次打印的数据长度不应该超过缓冲区的三分之一,否则可能导致失败
bool printf(const char * format,...)
{
  if(current)
  {
  if(current - buffer > (m_isize*2)/3)
   return false;
  va_list argPtr ;
  va_start( argPtr, format ) ;
  int count = vsprintf( current, format, argPtr ) ;
  va_end( argPtr );
  current += count ;
  return true;
  }
  return false;
}
//此函数拷贝字符串
bool strcpy(const char * szStr)
{
  if(current&&szStr)
  {
  int ilen = lstrlen(szStr);
  if((m_isize-(current - buffer)) < (ilen +2))
   return false;
  memcpy(current,szStr,ilen+1);
  current += ilen;
  return true;
  }
  return false;
}
char * getcurrentpos()
{
  return current;
}
void move(int ilen)//当前指针向后移动ilen
{
  current += ilen;
}
void reset()
{
  current = buffer;
}
BYTE readbyte()
{
  current ++;
  return *(current-1);
}
void writebyte(BYTE btValue)
{
  *current = btValue;
  current ++;  
}
WORD readword()
{
  current +=2;
  return *((WORD*)(current-2));
}
void writeword(WORD wValue)
{  
  *((WORD*)current) = wValue;
  current +=2;
}
DWORD readdword()
{
  current +=4;
  return *((DWORD*)(current-4));
}
void writedword(DWORD dwValue)
{
  *((DWORD*)current) = dwValue;
  current +=4;
}
__int64 readint64()
{
  current +=8;
  return *((__int64*)(current-8));
}
void writeint64(__int64 iValue)
{
  *((__int64*)current) = iValue;
  current +=8;
}
BYTE * readdata(DWORD dwLen)
{
  current +=dwLen;
  return (BYTE*)(current-dwLen);
}
void writedata(BYTE * pData,DWORD dwLen)
{
  memcpy(current,pData,dwLen);  
  current +=dwLen;
}
char * readstring()
{
  char * szRes = current;
  int ilen = lstrlen(current);
  current +=(ilen+1);
  return szRes;
}
int size()
{
  return (int)(current-buffer);
}
const char * getbuffer(){return buffer;}
private :
char* buffer;
   char* current;
int m_isize;
};
class CNetworkByteOrder
{
public:
static unsigned short int convert(unsigned short int iValue)
{
  unsigned short int iData;
  ((BYTE*)&iData)[0] = ((BYTE*)&iValue)[1];
  ((BYTE*)&iData)[1] = ((BYTE*)&iValue)[0];
  return iData;
}
static int convert(int iValue)
{
  int iData;
  ((BYTE*)&iData)[0] = ((BYTE*)&iValue)[3];
  ((BYTE*)&iData)[1] = ((BYTE*)&iValue)[2];
  ((BYTE*)&iData)[2] = ((BYTE*)&iValue)[1];
  ((BYTE*)&iData)[3] = ((BYTE*)&iValue)[0];
  return iData;
}
static __int64 convert(__int64 iValue)
{
  __int64 iData;
  ((BYTE*)&iData)[0] = ((BYTE*)&iValue)[7];
  ((BYTE*)&iData)[1] = ((BYTE*)&iValue)[6];
  ((BYTE*)&iData)[2] = ((BYTE*)&iValue)[5];
  ((BYTE*)&iData)[3] = ((BYTE*)&iValue)[4];
  ((BYTE*)&iData)[4] = ((BYTE*)&iValue)[3];
  ((BYTE*)&iData)[5] = ((BYTE*)&iValue)[2];
  ((BYTE*)&iData)[6] = ((BYTE*)&iValue)[1];
  ((BYTE*)&iData)[7] = ((BYTE*)&iValue)[0];
  return iData;
}

};
#endif // !defined(AFX_DATASTREAM_H__D90A2534_EA73_4BEA_8B7E_87E59A3D1D26__INCLUDED_)
wish.lijt@gmail.comQQ:11242538 call me

TOP

BitTorrent is a protocol for distributing files. It identifies content by URL and is designed to integrate seamlessly with the web. Its advantage over plain HTTP is that when multiple downloads of the same file happen concurrently, the downloaders upload to each other, making it possible for the file source to support very large numbers of downloaders with only a modest increase in its load.

A BitTorrent file distribution consists of these entities:

An ordinary web server
A static 'metainfo' file
A BitTorrent tracker
An 'original' downloader
The end user web browsers
The end user downloaders
There are ideally many end users for a single file.

To start serving, a host goes through the following steps:

Start running a tracker (or, more likely, have one running already).
Start running an ordinary web server, such as apache, or have one already.
Associate the extension .torrent with mimetype application/x-bittorrent on their web server (or have done so already).
Generate a metainfo (.torrent) file using the complete file to be served and the URL of the tracker.
Put the metainfo file on the web server.
Link to the metainfo (.torrent) file from some other web page.
Start a downloader which already has the complete file (the 'origin').
To start downloading, a user does the following:

Install BitTorrent (or have done so already).
Surf the web.
Click on a link to a .torrent file.
Select where to save the file locally, or select a partial download to resume.
Wait for download to complete.
Tell downloader to exit (it keeps uploading until this happens).
The connectivity is as follows:

The web site is serving up static files as normal, but kicking off the BitTorrent helper app on the clients.
The tracker is receiving information from all downloaders and giving them random lists of peers. This is done over HTTP or HTTPS.
Downloaders are periodically checking in with the tracker to keep it informed of their progress, and are uploading to and downloading from each other via direct connections. These connections use the BitTorrent peer protocol, which operates over TCP.
The origin is uploading but not downloading at all, since it has the entire file. The origin is necessary to get the entire file into the network. Often for popular downloads the origin can be taken down after a while since several downloads may have completed and been left running indefinitely.
Metainfo file and tracker responses are both sent in a simple, efficient, and extensible format called bencoding (pronounced 'bee encoding'). Bencoded messages are nested dictionaries and lists (as in Python), which can contain strings and integers. Extensibility is supported by ignoring unexpected dictionary keys, so additional optional ones can be added later.

Bencoding is done as follows:

Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to 'spam'.
Integers are represented by an 'i' followed by the number in base 10 followed by an 'e'. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e, are invalid, other than i0e, which of course corresponds to 0.
Lists are encoded as an 'l' followed by their elements (also bencoded) followed by an 'e'. For example l4:spam4:eggse corresponds to ['spam', 'eggs'].
Dictionaries are encoded as a 'd' followed by a list of alternating keys and their corresponding values followed by an 'e'. For example, d3:cow3:moo4:spam4:eggse corresponds to {'cow': 'moo', 'spam': 'eggs'} and d4:spaml1:a1:bee corresponds to {'spam': ['a', 'b']} . Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).
Metainfo files are bencoded dictionaries with the following keys:

announce
The URL of the tracker.

info
This maps to a dictionary, with keys described below.

The name key maps to a string which is the suggested name to save the file (or directory) as. It is purely advisory.

piece length maps to the number of bytes in each piece the file is split into. For the purposes of transfer, files are split into fixed-size pieces which are all the same length except for possibly the last one which may be truncated. Piece length is almost always a power of two, most commonly 218 = 256 K (BitTorrent prior to version 3.2 uses 220 = 1 M as default).

pieces maps to a string whose length is a multiple of 20. It is to be subdivided into strings of length 20, each of which is the SHA1 hash of the piece at the corresponding index.

There is also a key length or a key files, but not both or neither. If length is present then the download represents a single file, otherwise it represents a set of files which go in a directory structure.

In the single file case, length maps to the length of the file in bytes.

For the purposes of the other keys, the multi-file case is treated as only having a single file by concatenating the files in the order they appear in the files list. The files list is the value files maps to, and is a list of dictionaries containing the following keys:

length
The length of the file, in bytes.
path
A list of strings corresponding to subdirectory names, the last of which is the actual file name (a zero length list is an error case).
In the single file case, the name key is the name of a file, in the muliple file case, it's the name of a directory.

Tracker queries are two way. The tracker receives information via HTTP GET parameters and returns a bencoded message. Note that although the current tracker implementation has its own web server, the tracker could run very nicely as, for example, an apache module.

Tracker GET requests have the following keys:

info_hash
The 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. Note that this is a substring of the metainfo file. This value will almost certainly have to be escaped.

peer_id
A string of length 20 which this downloader uses as its id. Each downloader generates its own id at random at the start of a new download. This value will also almost certainly have to be escaped.

ip
An optional parameter giving the IP (or dns name) which this peer is at. Generally used for the origin if it's on the same machine as the tracker.

port
The port number this peer is listening on. Common behavior is for a downloader to try to listen on port 6881 and if that port is taken try 6882, then 6883, etc. and give up after 6889.

uploaded
The total amount uploaded so far, encoded in base ten ascii.

downloaded
The total amount downloaded so far, encoded in base ten ascii.

left
The number of bytes this peer still has to download, encoded in base ten ascii. Note that this can't be computed from downloaded and the file length since it might be a resume, and there's a chance that some of the downloaded data failed an integrity check and had to be re-downloaded.

event
This is an optional key which maps to started, completed, or stopped (or empty, which is the same as not being present). If not present, this is one of the announcements done at regular intervals. An announcement using started is sent when a download first begins, and one using completed is sent when the download is complete. No completed is sent if the file was complete when started. Downloaders send an announcement using 'stopped' when they cease downloading.

Tracker responses are bencoded dictionaries. If a tracker response has a key failure reason, then that maps to a human readable string which explains why the query failed, and no other keys are required. Otherwise, it must have two keys: interval, which maps to the number of seconds the downloader should wait between regular rerequests, and peers. peers maps to a list of dictionaries corresponding to peers, each of which contains the keys peer id, ip, and port, which map to the peer's self-selected ID, IP address or dns name as a string, and port number, respectively. Note that downloaders may rerequest on nonscheduled times if an event happens or they need more peers.

If you want to make any extensions to metainfo files or tracker queries, please coordinate with Bram Cohen to make sure that all extensions are done compatibly.

BitTorrent's peer protocol operates over TCP. It performs efficiently without setting any socket options.

Peer connections are symmetrical. Messages sent in both directions look the same, and data can flow in either direction.

The peer protocol refers to pieces of the file by index as described in the metainfo file, starting at zero. When a peer finishes downloading a piece and checks that the hash matches, it announces that it has that piece to all of its peers.

Connections contain two bits of state on either end: choked or not, and interested or not. Choking is a notification that no data will be sent until unchoking happens. The reasoning and common techniques behind choking are explained later in this document.

Data transfer takes place whenever one side is interested and the other side is not choking. Interest state must be kept up to date at all times - whenever a downloader doesn't have something they currently would ask a peer for in unchoked, they must express lack of interest, despite being choked. Implementing this properly is tricky, but makes it possible for downloaders to know which peers will start downloading immediately if unchoked.

Connections start out choked and not interested.

When data is being transferred, downloaders should keep several piece requests queued up at once in order to get good TCP performance (this is called 'pipelining'.) On the other side, requests which can't be written out to the TCP buffer immediately should be queued up in memory rather than kept in an application-level network buffer, so they can all be thrown out when a choke happens.

The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The handshake starts with character ninteen (decimal) followed by the string 'BitTorrent protocol'. The leading character is a length prefix, put there in the hope that other new protocols may do the same and thus be trivially distinguishable from each other.

All later integers sent in the protocol are encoded as four bytes big-endian.

After the fixed headers come eight reserved bytes, which are all zero in all current implementations. If you wish to extend the protocol using these bytes, please coordinate with Bram Cohen to make sure all extensions are done compatibly.

Next comes the 20 byte sha1 hash of the bencoded form of the info value from the metainfo file. (This is the same value which is announced as info_hash to the tracker, only here it's raw instead of quoted here). If both sides don't send the same value, they sever the connection. The one possible exception is if a downloader wants to do multiple downloads over a single port, they may wait for incoming connections to give a download hash first, and respond with the same one if it's in their list.

After the download hash comes the 20-byte peer id which is reported in tracker requests and contained in peer lists in tracker responses. If the receiving side's peer id doesn't match the one the initiating side expects, it severs the connection.

That's it for handshaking, next comes an alternating stream of length prefixes and messages. Messages of length zero are keepalives, and ignored. Keepalives are generally sent once every two minutes, but note that timeouts can be done much more quickly when data is expected.

All non-keepalive messages start with a single byte which gives their type. The possible values are:

0 - choke
1 - unchoke
2 - interested
3 - not interested
4 - have
5 - bitfield
6 - request
7 - piece
8 - cancel
'choke', 'unchoke', 'interested', and 'not interested' have no payload.

'bitfield' is only ever sent as the first message. Its payload is a bitfield with each index that downloader has sent set to one and the rest set to zero. Downloaders which don't have anything yet may skip the 'bitfield' message. The first byte of the bitfield corresponds to indices 0 - 7 from high bit to low bit, respectively. The next one 8-15, etc. Spare bits at the end are set to zero.

The 'have' message's payload is a single number, the index which that downloader just completed and checked the hash of.

'request' messages contain an index, begin, and length. The last two are byte offsets. Length is generally a power of two unless it gets truncated by the end of the file. All current implementations use 215, and close connections which request an amount greater than 217.

'cancel' messages have the same payload as request messages. They are generally only sent towards the end of a download, during what's called 'endgame mode'. When a download is almost complete, there's a tendency for the last few pieces to all be downloaded off a single hosed modem line, taking a very long time. To make sure the last few pieces come in quickly, once requests for all pieces a given downloader doesn't have yet are currently pending, it sends requests for everything to everyone it's downloading from. To keep this from becoming horribly inefficient, it sends cancels to everyone else every time a piece arrives.

'piece' messages contain an index, begin, and piece. Note that they are correlated with request messages implicitly. It's possible for an unexpected piece to arrive if choke and unchoke messages are sent in quick succession and/or transfer is going very slowly.

Downloaders generally download pieces in random order, which does a reasonably good job of keeping them from having a strict subset or superset of the pieces of any of their peers.

Choking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.

The choking algorithm described below is the currently deployed one. It is very important that all new algorithms work well both in a network consisting entirely of themselves and in a network consisting mostly of this one.

There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to peers who let it download. Finally, it should try out unused connections once in a while to find out if they might be better than the currently used ones, known as optimistic unchoking.

The currently deployed choking algorithm avoids fibrillation by only changing who's choked once every ten seconds. It does reciprocation and number of uploads capping by unchoking the four peers which it has the best download rates from and are interested. Peers which have a better upload rate but aren't interested get unchoked and if they become interested the worst uploader gets choked. If a downloader has a complete file, it uses its upload rate rather than its download rate to decide who to unchoke.

For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of it's upload rate (if interested, it counts as one of the four allowed downloaders.) Which peer is optimistically unchoked rotates every 30 seconds. To give them a decent chance of getting a complete piece to upload, new connections are three times as likely to start as the current optimistic unchoke as anywhere else in the rotation.
wish.lijt@gmail.comQQ:11242538 call me

TOP

BT是如何采用激励机制来达到健壮性的

Bram Cohen
bram@bitconjurer.org
2003年5月22日
翻译:小马哥
日期:2004-6-1
修改:扬帆
日期:2005-5-9

概要
BitTorrent 文件发布系统采用针锋相对(tit_for_tat)的方法来达到帕累托有效,与当前已知的协作技术相比,它具有更高的活力。本文将解释BitTorrent 的用途,以及是怎样用经济学的方法来达到这个目标的。

1、BitTorrent 用来做什么?
当通过[wiki]HTTP协议[/wiki]来下载一个文件的时候,所有的上载开销都在主机上。而使用 BitTorrent,当多个人同时下载同一个文件的时候,他们之间也相互为对方提供文件的部分片断的下载。这样,就把上载的开销分摊到每个下载者那里,也就可以在理论上支持无限多个下载者来下载同一个文件。
研究人员以前也在寻找一种达到这种效果的可实用的技术[3]。这种技术原来并没有在大的范围内运用过,因为逻辑和的问题非常棘手。如果仅仅计算哪些 peers 拥有文件的哪些片断以及这些片断应该被发送给谁,那么很难只产生比较小的系统开销。Peers之间的连接很少会超过几个小时,通常是几分钟而已。最后,有一个普遍的问题,就是公平性。
我们将解释BitTorrent 是如何很好的解决这些问题的。

1.1、BitTorrent接口
BitTorrent 的接口可能是最简单的。用户点击希望下载的文件的超级链接,然后会弹出一个标准的“保存到”对话框。此后,出现一个下载进度的窗口,在这个窗口中,除了显示下载速率外,还显示一个上载速率。BT在使用上非常简单,使得BT能广泛的被运用。

1.2、部署
决定采用BitTorrent的原因是因为有一些文件需要发布。而下载者使用 BitTorrent,是因为这是他们获取所需要的文件的唯一途径。下载者经常一完成下载,就停止为别人上载,虽然说,在BT 客户端完成下载之后,继续为别人提供一段时间的上载是一种礼貌的行为。标准的实现是让客户端一直保持上载,除非窗口被关闭。
在一个典型的部署中,未完成的下载者
一台主机负责提供原始的文件,下载者通过BT来下载这个文件。下载者在下载的同时,为其它人提供上载,最后,离开这个系统。

2、技术框架

2.1发布内容
为了部署 BT,首先将一个扩展名为 .torrent 的文件放在一个普通的web服务器上。.torrent文件包含了要共享的文件的信息,包括文件名、大小、文件的散列信息和一个指向tracker的url。Tracker负责帮助下载者能够获取其它下载者的信息。Tracker和下载者之间使用一种很简单的基于HTTP的协议进行交互,下载者告诉tracker自己要下载的文件、自己使用的端口以及类似的信息,tracker告诉下载者其它下载同样文件的下载者的联系信息。下载者利用这些信息相互之间建立连接。一个被成为“种子”的下载者,必须首先被启动,它知道完整的文件信息。对tracker和web服务器的带宽需求很低,而种子必须至少发送原始文件的一份完整拷贝。

译注:
P2P的核心思想就是没有服务器的概念,任何一个下载者既是client,又是server。
下载者从别人那里取文件的时候,称为下载,而为别人提供文件的时候,称为上载(传)。
为了完成一次部署,至少需要一个 tracker和一个seed。所谓tracker,是一个服务器,负责帮助peers之间相互建立连接。而seed,通常是第一个向tracker注册,然后它就开始进入循环,等待为别人提供文件,也就是说,第一个seed只负责上传文件。一旦有一个peer向tracker注册后,就可以取得seed的信息,从而与seed建立连接。从seed处读取文件。由于原始的文件,只有seed拥有,所有说,seed至少要上传原始文件的一份完整拷贝。如果又有一个peer加入进来,那么它可以同时和seed和前一个peer建立连接,然后从这两者处获取文件。

2.2对等发布
所有和文件下载相关的逻辑问题,通过 peers之间的交互来解决。一些关于下载和上传的速率的信息被发送给tracker,tracker搜集这些信息用于统计。Tracker的职责被严格限定为“帮助peers相互发现对方”。
尽管tracker是peers之间相互发现的唯一途径,也是peers之间相互协作的唯一地点,标准的tracker算法返回一个随机的 peers的列表。随机图具有非常强大的特性,许多的peer选择算法最终产生了一个幂律图,幂律图能以少量的搅拌来获得分片。注意,peers之间的连接都是双向传输的。
为了跟踪每个peers都拥有什么,BT将文件切割为固定大小的片(典型的大小是256k)。每个下载者必须通知其它peers,它拥有哪些片。为了验证文件的完整性,对每个片断都通过SHA1算法计算出它的hash信息,并保存在torrent文件中。Peers只有在检查了片断的完整性之后,才会通知其它peers它拥有这个片断。删除代码是一种被建议使用的帮助文件分布的技术,但是这种更简单的方法(既分片)也是可用的。
Peers不断的从它能连接到的peers那里下载文件片断。当然,它不能从没有跟它建立连接的peers那里下载任何东西。即使是建立了连接的peers,有的也并不包含它想要的片断,或者还不允许它去下载。关于不允许其它peers从它那里下载文件片断的策略,被称为 阻塞choking,后文将进行讨论。其它关于文件分布的方法通常都要用到麻烦的树结构,而且树叶的上载能力并没有被利用起来。简单的让 peers 宣布它拥有什么会导致不到 10 % 的带宽开支,却可以可靠的使用所有的上载能力。

2.3流水作业
构架在TCP之上的应用层协议,例如BT,很重要的一点是应该同时发送多个请求,以避免在两个片断发送之间的延迟,因为那样会严重影响传输速率。为了达到这种目的,BT将每个片断又进一步分为子片断,每个子片断的大小一般是16k,同时,它一直保持几个请求(通常是5个)被流水的同时发送。流水作业所选择的data(应该是指的同时发送的请求数目,也就是5个request)的依据是能使得大多数连接变得饱和。
译注:也就是说,每次发送5个请求,然后过一段时间,又发送5个请求。流水作业在HTTP 协议1.1版本中被广泛运用。

2.4片断选择
选择一个好的顺序来下载片断,对提高性能非常重要。一个差的片断选择算法可能导致所有的片断都处于下载中,或者另一种情况,没有任何片断被上载给其它 peers。

2.4.1严格的优先级
片断选择的第一个策略是:一旦请求了某个片断的子片断,那么该片断剩下的子片断优先被请求。这样,可以尽可能快的获得一个完整的片断

2.4.2最少的优先
对一个下载者来说,在选择下一个被下载的片断时,通常选择的是它的peers们所拥有的最少的那个片断,也就是所谓的“最少优先”。这种技术,确保了每个下载者都拥有它的peers们最希望得到的那些片断,从而一旦有需要,上载就可以开始。这也确保了那些越普通的片断越放在最后下载,从而减少了这样一种可能性,即某个peer当前正提供上载,而随后却没有任何的被别人感兴趣的片断了。

译注:
也就说说,每个peer都优先选择整个系统中最少的那些片断去下载,而那些在系统中相对较多的片断,放在后面下载,这样,整个系统就趋向于一种更优的状态。如果不用这种算法,大家都去下载最多的那些片断,那么这些片断就会在系统中分布的越来越多,而那些在系统中相对较少的片断仍然很少,最后,某些 peer 就不再拥有其它 peer 感兴趣的片断了,那么系统的参与者越来越少,整个系统的性能就下降。
在BT系统中,充分考虑了经济学的概念,处处从整个系统的性能出发,参与者越多,系统越优化。

信息理论显示除非种子上传了文件的所有片断,否则没有任何下载者可以完成所有文件的下载。如果在一个部署中,只有一个种子,而且种子的上载能力比它的大多数下载者都要差,那么,不同的下载者从种子那里下载不同的片断,性能就会变得比较好,因为,重复的下载浪费了种子获取更多信息的机会。“最少优先”使得下载者只从种子处下载新的片断(也就是整个系统中其它peer都没有的片断),因为,下载者能够看到其它peers那里已经有了种子已经上传的片断。

在某些部署中,原始的种子由于某些原因最终关闭,只好由剩下的这些下载者们来负责上传。这样显然会带来一个风险:某些片断任何一个下载者都不拥有。“最少优先”也很好的处理了这种情况。通过尽快的复制最少的片断,减少了这种由于当前的peers停止上载后带来的风险。

2.4.3随机的第一个片断
“最少优先”的一个例外是在下载刚开始的时候。此时,下载者没有任何片断可供上传,所以,需要尽快的获取一个完整的片断。而最少的片断,通常只有某一个peer拥有,所以,它可能比多个peers都拥有的那些片断下载的要慢。因此,第一个片断是随机选择的,直到第一个片断下载完成,才切换到“最少优先”的策略。

2.4.4最后阶段模式
有时候,从一个速率很慢的peer那里请求一个片断。在下载的中间阶段,这不是什么问题,但是却可能潜在的延迟下载的完成。为了防止这种情况,在最后阶段,peer向它的所有的peers们都发送某片断的子片断的请求,一旦某些子片断到了,那么就会向其它peer发送cancel 消息,取消对这些子片断的请求,以避免带宽的浪费。实际上,用这种方法并没有浪费多少带宽,而文件的结束部分也一直下载的非常快。

3 阻塞(choking)算法
BT并不集中分配资源。每个peer自己有责任来尽可能的提高它的下载速率。Peers从它可以连接的peers处下载文件,并根据对方提供的下载速率给予同等的上传回报(你敬我一尺,我敬你一丈)。对于合作者,提供上传服务,对于不合作的,就阻塞对方。所以说,阻塞是一种临时的拒绝上传策略,虽然上传停止了,但是下载仍然继续。在阻塞停止的时候,连接并不需要重新建立。
阻塞算法并不属于BT对等协议(指peers 之间交互的协议)的技术部分,但是对提高性能是必要的。一个好的阻塞算法应该利用所有可用的资源,为所有下载者提供一致可靠的下载速率,并适当惩罚那些只下载而不上传的peers。

3.1帕累托有效
帕累托有效是指资源配置已达到这样一种境地,即任何重新改变资源配置的方式,都不可能使一部分人在没有其他人受损的情况下受益。这一资源配置的状态,被称为“帕累托最优”(Pareto optimum)状态,或称为“帕累托有效”(Pareto efficient)
在计算机领域,寻求帕累托有效是一种本地优化算法
BitTorrent的阻塞算法,用一种针锋相对的方式来试图达到帕累托最优。(原文不太好翻译,我简化了)。Peers对那些向他提供上传服务的peers给予同样的回报,目的是希望在任何时候都有若干个连接正在进行着双向传输。

3.2 BitTorrent的阻塞算法
从技术层面上说,BT的每个peer一直与固定数量的其它 peers 保持疏通(通常是4个),所以问题就变成了哪些peers应该保持疏通?这种方法使得TCP的拥塞控制性能能够可靠的饱和上传容量。(也就是说,尽量让整个系统的上传能力达到最大)。
严格的根据当前的下载速率来决定哪些peers应该保持疏通。令人惊讶的是,计算当前下载速率是个大难题。当前的实现实质上是一个每隔20秒的轮询。而原来的算法是对一个长时间的网络传输进行总计,但这种方法很差劲,因为由于资源可用或者不可用,带宽会变化的很快。
为了避免因为频繁的阻塞和疏通 peers造成的资源浪费,BT每隔10秒计算一次哪个peer需要被阻塞,然后将这种状态保持到下一个10秒。10秒已经足够使得TCP来调整它的传输性能到最大。

3.3、optimistic unchoking
如果只是简单的为提供最好的下载速率的peers们提供上载,那么就没有办法来发现那些空闲的连接是否比当前正使用的连接更好。为了解决这个问题,在任何时候,每个peer都拥有一个称为“optimistic unchoking”的连接,这个连接总是保持疏通状态,而不管它的下载速率是怎样。每隔30秒,重新计算一次哪个连接应该是“optimistic unchoking”。30秒足以让上载能力达到最大,下载能力也相应的达到最大。这种和针锋相对类似的思想非常的伟大。“optimistic unchoking”非常和谐的与“囚徒困境”合作。

3.4、反对歧视
某些情况下,一个peer可能被它所有的peers都阻塞了,这种情况下,它将会保持较低的下载速率直到通过“optimistic unchoking”找到更好peers。为了减轻这种问题,如果一段时间过后,从某个peer那里一个片断也没有得到,那么这个peer认为自己被对方“怠慢”了,于是不再为对方提供上传,除非对方是“optimistic unchoking”。这种情况频繁发生,会导致多于一个的并发的“optimistic unchoking”。

3.5仅仅上传
一旦某个peer完成了下载,它不能再通过下载速率(因为下载速率已经为0了)来决定为哪些 peers 提供上载了。目前采用的解决办法是,优先选择那些从它这里得到更好的上载速率的peers。这样的理由是可以尽可能的利用上载带宽。

4、真实世界的体验
BitTorrent不仅仅早已经实现,而且早已经被广泛的使用,它为许多并发的下载者提供成百兆的文件下载。已知的最大的部署中,同时有超过1000个的下载者。当前的瓶颈(实际还没有达到)看来是tracker的带宽。它(trakcer的带宽)大概占用了带宽总量的千分之一,一些小的协议扩展可能会使它降到万分之一。

参考资料:

[1] E. Adar and B. A. Huberman. Free riding on gnutella. First Monday, 5(10), 2000.
[2] A.-L. Barab´asi. Linked: The New Science of Networks.Perseus Publishing, 2002.
[3] M. Castro, P. Druschel, A.-M. Kermarrec, A. Nandi, A. Rowstron, and A. Singh. Splitstream: High-bandwidth content distribution in cooperative environments. In Proceedings of IPTPS03, Berkeley, USA, Feb. 2003.
[4] P. Maymounkov and D. Mazieres. Kademlia: A peer-to-peer information system based on the xormetric. In Proceedings of IPTPS02, Cambridge, USA, Mar. 2002.
wish.lijt@gmail.comQQ:11242538 call me

TOP

BT种子文件使用了一种叫bencoding的编码方法来保存数据。
bencoding现有四种类型的数据:srings(字符串),integers(整数),lists(列表),dictionaries(字典)
编码规则如下:
strings(字符串)编码为:<字符串长度>:<字符串>
例如: 4:test 表示为字符串"test"
4:例子 表示为字符串“例子”
字符串长度单位为字节
没开始或结束标记

integers(整数)编码为:i<整数>e
开始标记i,结束标记为e
例如: i1234e 表示为整数1234
i-1234e 表示为整数-1234
整数没有大小限制
i0e 表示为整数0
i-0e 为非法
以0开头的为非法如: i01234e 为非法

lists(列表)编码为:l<bencoding编码类型>e
开始标记为l,结束标记为e
列表里可以包含任何bencoding编码类型,包括整数,字符串,列表,字典。
例如: l4:test5abcdee 表示为二个字符串["test","abcde"]

dictionaries(字典)编码为d<bencoding字符串><bencoding编码类型>e
开始标记为d,结束标记为e
关键字必须为bencoding字符串
值可以为任何bencoding编码类型
例如: d3:agei20ee 表示为{"age"=20}
d4:path3:C:\8:filename8:test.txte 表示为{"path"="C:\","filename"="test.txt"}

具体文件结构如下:
全部内容必须都为bencoding编码类型。
整个文件为一个字典结构,包含如下关键字
announce:tracker服务器的URL(字符串)
announce-list(可选):备用tracker服务器列表(列表)
creation date(可选):种子创建的时间,Unix标准时间格式,从1970 1月1日 00:00:00到创建时间的秒数(整数)
comment(可选):备注(字符串)
created by(可选):创建人或创建程序的信息(字符串)
info:一个字典结构,包含文件的主要信息,为分二种情况:单文件结构或多文件结构
单文件结构如下:
         length:文件长度,单位字节(整数)
         md5sum(可选):长32个字符的文件的MD5校验和,BT不使用这个值,只是为了兼容一些程序所保留!(字符串)
         name:文件名(字符串)
         piece length:每个块的大小,单位字节(整数)
         pieces:每个块的20个字节的SHA1 Hash的值(二进制格式)
多文件结构如下:
         files:一个字典结构
                length:文件长度,单位字节(整数)
                md5sum(可选):同单文件结构中相同
                path:文件的路径和名字,是一个列表结构,如\test\test.txt 列表为l4:test8test.txte
         name:最上层的目录名字(字符串)
         piece length:同单文件结构中相同
         pieces:同单文件结构中相同
实例:
用记事本打开一个.torrent可以看来类似如下内容
d8:announce35:http://www.manfen.net:7802/announce13:creation datei1076675108e4:infod6:lengthi17799e4:name62:MICROSOFT.WINDOWS.2000.AND.NT4.SOURCE.CODE-SCENELEADER.torrent12:piece lengthi32768e6:pieces20:?W ?躐?緕排T酆ee

很容易看出
announce=http://www.manfen.net:7802/announce
creation date=1076675108秒(02/13/04 20:25:08)
文件名=MICROSOFT.WINDOWS.2000.AND.NT4.SOURCE.CODE-SCENELEADER.torrent
文件大小=17799字节
文件块大小=32768字节
wish.lijt@gmail.comQQ:11242538 call me

TOP

IdentificationBitTorrent is a peer-to-peer file sharing protocol designed by Bram Cohen. Visit his pages at http://www.bitconjurer.org. BitTorrent is designed to facilitate file transfers among multiple peers across unreliable networks.

PurposeThe purpose of this specification is to document version 1.0 of the BitTorrent protocol specification in detail. Bram's protocol specification page http://www.bitconjurer.org/BitTorrent/protocol.html outlines the protocol in somewhat general terms, and lacks behaviorial detail in some areas. The hope is that this document will become a formal specification, written in clear, unambiguous terms, which can be used as a basis for discussion and implementation in the future.

This document is intended to be maintained and used by the BitTorrent development community. Everyone is invited to contribute to this document, with the understanding that the content here is intended to represent the current protocol, which is already deployed in a number of client implementations.

This is not the place to suggest feature requests. For that, please go to the mailing list.

ScopeThis document applies to the first version (i.e. version 1.0) of the BitTorrent protocol specification. Currently, this applies to the torrent file structure, peer wire protocol, and the Tracker HTTP protocol specifications. As newer revisions of each protocol are defined, they should be specified on their own separate pages, not here.

Related Documentshttp://www.bitconjurer.org/BitTorrent/protocol.html - The official protocol specification.
BitTorrentWishList - A wish list for developers and end users alike.
BitTorrentTrackerExtensions - Describes the various extensions of the Tracker protocol that are in use.

ConventionsIn this document, a number of conventions are used in an attempt to present information in a concise and unambiguous fashion.

peer v/s client: In this document, a peer is any BitTorrent client participating in a download. The client is also a peer, however it is the BitTorrent client that is running on the local machine. Reader of this specification may choose to think of themselves as the client which connects to numerous peers.
piece v/s block: In this document, a piece refers to a portion of the downloaded data that is described in the metainfo file, which can be verified by a SHA1 hash. A block is a portion of data that a client may request from a peer. Two or more blocks make up a whole piece, which may then be verified.
defacto standard: Large blocks of text in italics indicates a practice so common in various client implementations of BitTorrent that it is considered a defacto standard.
In order help others to find recent changes that have been made to this document, please fill out the change log (last section). This should contain a brief (i.e. one-line) entry for each major change that you've made to the document.

bencodingBencoding is a way to specify and organize data in a terse format. It currently supports the following types: strings, integers, lists, and dictionaries.

stringsStrings are encoded as follows: <string length>:<string data>
Note that there is no constant beginning delimiter, and no specific ending delimiter.
Example: 4:spam represents the string "spam"

integersIntegers are encoded as follows: i<integer>e
The initial i and trailing e are beginning and ending delimiters.
Example i3e represents the integer "3"

listsLists are encoded as follows: l<bencoded type>e
The initial l and trailing e are beginning and ending delimiters.

Lists may contain any bencoded type, including integers, strings, dictionaries, and other lists.

Example: l4:spam4:eggse represents the list of two strings: ["spam", "eggs"]

dictionariesDictionaries are encoded as follows: d<bencoded string><bencoded element>e
The initial d and trailing e are the beginning and ending delimiters.

Note that the keys must be bencoded strings. The values may be any bencoded type, including integers, strings, lists, and other dictionaries. Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

Example: d3:cow3:moo4:spam4:eggse represents the dictionary { "cow" => "moo", "spam" => "eggs" }
Example: d4:spaml1:a1:bee represents the dictionary { "spam" => ["a", "b"] }

Metainfo File StructureAll data in a metainfo file is bencoded. The specification for bencoding is defined above.

The content of a metainfo file (the file ending in ".torrent") is a bencoded dictionary, containing the keys listed below. Keys not marked 'optional' are required fields:

info: a dictionary that describes the file(s) of the torrent. There are two possible forms: one for the case of a 'single-file' torrent with no directory structure, and one for the case of a 'multi-file' torrent, which can contain subdirectory trees.

For the case of the single-file mode, the info dictionary contains the following structure
length: length of the file in bytes (integer)
md5sum: (optional) a 32-character hexadecimal string corresponding to the MD5 sum of the file. This is not used by BitTorrent at all, but it is included by some programs for greater compatibility.
name: the filename of the file (string)
piece length: number of bytes in each piece (integer)
pieces: string consisting of the concatenation of all 20-byte SHA1 hash values, one per piece (raw binary encoded)
For the case of the multi-file mode, the info dictionary contains the following structure
files: a list of dictionaries, one for each file. Each dictionary in this list contains the following keys:

length: length of the file in bytes (integer)
md5sum: (optional) a 32-character hexadecimal string corresponding to the MD5 sum of the file. This is not used by BitTorrent at all, but it is included by some programs for greater compatibility.
path: a list containing one or more string elements that together represent the path and filename. Each element in the list corresponds to either a directory name or (in the case of the final element) the filename. For example, a the file "dir1/dir2/file.ext" would consist of three string elements: "dir1", "dir2", and "file.ext".
name: the name of the top-most directory in the structure -- the directory which contains all of the files listed in the above files list. (string)
piece length: number of bytes in each piece (integer)
pieces: string consisting of the concatenation of all 20-byte SHA1 hash values, one per piece (raw binary encoded)
announce: The announce URL of the tracker (string)
announce-list: (optional) this is an extention to the official specification, which is also backwards compatible. This key is used to implement lists of backup trackers. The full specification can be found at http://home.elp.rr.com/tur/multitracker-spec.txt
creation date: (optional) the creation time of the torrent, in standard Unix epoch format (integer seconds since 1-Jan-1970 00:00:00 UTC)
comment: (optional) free-form textual comments of the author (string)
created by: (optional) name and version of the program used to create the .torrent (string)
Notes
The piece length specifies the nominal piece size, and is usually a power of 2. The piece size is typically chosen based on the total amount of file data in the torrent, constrained by the fact that too small a piece size will result in a large .torrent metadata file, and piece sizes too large cause inefficiency. The general rule of thumb seems to be to pick the smallest piece size that results in a .torrent file no greater than approx. 50 - 75 kB. The most common sizes are 256 kB, 512 kB, and 1 MB. Every piece is of equal length except for the final piece, which is irregular. The number of pieces is thus determined by 'ceil( total length / piece size )'. For the purposes of piece boundaries in the multi-file case, consider the file data as one long continuous stream, composed of the concatenation of each file in the order listed in the files list. The number of pieces and their boundaries are then determined in the same manner as the case of a single file. Pieces may overlap file boundaries.
Each piece has a corresponding SHA1 hash of the data contained within that piece. These hashes are concatenated to form the pieces value in the above info dictionary. Note that this is not a list but rather a single string. The length of the string must be a multiple of 20 bytes.
Tracker HTTP ProtocolThe tracker is an HTTP service which responds to HTTP GET requests. The requests include metrics from clients that help the tracker keep overall statistics about the torrent. The response includes a peer list that helps the client participate in the torrent. The base URL consists of the "announce URL" as defined in the metadata (.torrent) file. The parameters are then added to this URL, using standard CGI methods (i.e. a '?' after the announce URL, followed by 'param=value' sequences separated by '&')

Note that all binary data in the URL (particularly info_hash and peer_id) must be properly escaped. This means any byte not in the set 0-9, a-z, A-Z, and $-_.+!*'(), must be encoded using the "%nn" format, where nn is the hexadecimal value of the byte. (See RFC1738 for details.)

The parameters used in the client->tracker GET request are as follows:

info_hash: 20-byte SHA1 hash of the value of the info key from the Metainfo file. Note that the value will be a bencoded dictionary, given the definition of the info key above.
peer_id: 20-byte string used as a unique ID for the client, generated by the client at startup. This is allowed to be any value, and may be binary data. There are currently no guidelines for generating this peer ID. However, one may rightly presume that it must at least be unique for your local machine, thus should probably incorporate things like process ID and perhaps a timestamp recorded at startup. See peer_id below for common client encodings of this field.
port: The port number that the client is listening on. Ports reserved for BitTorrent are typically 6881-6889. Clients may choose to give up if it cannot establish a port within this range.
uploaded: The total amount uploaded (since the client sent the 'started' event to the tracker) in base ten ASCII.
downloaded: The total amount downloaded (since the client sent the 'started' event to the tracker) in base ten ASCII.
left: The number of bytes this client still has to download, encoded in base ten ascii.
event: If specified, must be one of started, completed, or stopped. If not specified, then this request is one performed at regular intervals.

started: The first request to the tracker must include the event key with the started value.
stopped: Must be sent to the tracker if the client is shutting down gracefully.
completed: Must be sent to the tracker when the download completes. However, must not be sent if the download was already 100% complete when the client started. Presumably, this is to allow the tracker to increment the "completed downloads" metric based soley on this event.
ip: Optional. The true IP address of the client machine, in dotted quad format or rfc3513 defined hexed [wiki]IPv6[/wiki] address. Notes: In general this parameter is not necessary as the address of the client can be determined from the IP address from which the HTTP request came. The parameter is only needed in the case where the IP address that the request came in on is not the IP address of the client. This happens if the client is communicating to the tracker through a proxy (or a transparent web proxy/cache.) It also is necessary when both the client and the tracker are on the same local side of a NAT gateway. The reason for this is that otherwise the tracker would give out the internal (RFC1918) address of the client, which is not routeable. Therefore the client must explicitly state its (external, routeable) IP address to be given out to external peers. Various trackers treat this parameter differently. Some only honor it only if the IP address that the request came in on is in RFC1918 space. Others honor it unconditionally, while others ignore it completely. In case of IPv6 address (e.g.: 2001:db8:1:2::100) it indicates only that client can communicate via IPv6.
numwant: Optional. Number of peers that the client would like to receive from the tracker. This value is permitted to be zero. If omitted, typically defaults to 50 peers.
The tracker responds with "text/plain" document consisting of a bencoded dictionary with the following keys:

failure reason: If present, then no other keys may be present. The value is a human-readable error message as to why the request failed (string).
interval: Interval in seconds that the client should wait between sending regular requests to the tracker
complete: number of peers with the entire file, i.e. seeders (integer)
incomplete: number of non-seeder peers, aka "leechers" (integer)
peers: The value is a list of dictionaries, each with the following keys:

peer id: peer's self-selected ID, as described above for the tracker request (string)
ip: peer's IP address (either IPv6 or IPv4) or DNS name (string)
port: peer's port number (integer)
As mentioned above, the list of peers is length 50 by default. If there are fewer peers in the torrent, then the list will be smaller. Otherwise, the tracker randomly selects peers to include in the response. The tracker may choose to implement a more intelligent mechanism for peer selection when responding to a request. For instance, reporting seeds to other seeders could be avoided.

Clients may send a request to the tracker more often than the specified interval, if an event occurs (i.e. stopped or completed) or if the client needs to learn about more peers. However, it is considered bad practice to "hammer" on a tracker to get multiple peers. If a client wants a large peer list in the response, then it should specify the numwant parameter.

Tracker 'scrape' ConventionBy convention most trackers support another form of request, which queries the state of a given torrent (or all torrents) that the tracker is managing. This is referred to as the "scrape page" because it automates the otherwise tedious process of "screen scraping" the tracker's stats page.

The scrape URL is also a HTTP GET method, similar to the one described above. However the base URL is different. To derive the scrape URL use the following steps: Begin with the announce URL. Find the last '/' in it. If the text immediately following that '/' isn't 'announce' it will be taken as a sign that that tracker doesn't support the scrape convention. If it does, substitute 'scrape' for 'announce' to find the scrape page.

Examples: (announce URL -> scrape URL)

  http://spam.com/announce          -> http://spam.com/scrape
  http://spam.com/x/announce        -> http://spam.com/x/scrape
  http://spam.com/announce.php      -> http://spam.com/scrape.php
  http://spam.com/a                 -> (scrape not supported)
  http://spam.com/announce?x=2%0644 -> http://spam.com/scrape?x=2%0644
  http://spam.com/announce?x=2/4    -> (scrape not supported)
  http://spam.com/x%064announce     -> (scrape not supported)
Note especially that entity unquoting is not to be done. This standard is documented by Bram in the BitTorrent development list archive: http://groups.yahoo.com/group/BitTorrent/message/3275

The scrape URL may be supplimented by the optional parameter info_hash, a 20-byte value as described above. This restricts the tracker's report to that particular torrent. Otherwise stats for all torrents that the tracker is managing are returned. Software authors are strongly encouraged to use the info_hash parameter when at all possible, to reduce the load and bandwidth of the tracker.

The response of this HTTP GET method is a "text/plain" document consisting of a bencoded dictionary, containing the following keys
files: a dictionary containing one key/value pair for each torrent for which there are stats. If info_hash was supplied and was valid, this dictionary will contain a single key/value. Each key consists of a 20-byte binary info_hash value. The value of that key is yet another nested dictionary containing the following:

complete: number of peers with the entire file, i.e. seeders (integer)
downloaded: total number of times the tracker has registered a completion ("event=complete", i.e. a client finished downloading the torrent)
incomplete: number of non-seeder peers, aka "leechers" (integer)
name: (optional) the torrent's internal name, as specified by the "name" file in the info section of the .torrent file
Note that this response has three levels of dictionary nesting. Here's an example:

d5:filesd20:....................d8:completei5e10:downloadedi50e10:incompletei10eeee

Where .................... is the 20 byte info_hash and there are 5 seeders, 10 leechers, and 50 complete downloads.

Peer wire protocol (TCP)OverviewThe peer protocol facilitates the exchange of pieces as described in the metainfo file.

Note here that the original specification also used the term "piece" when describing the peer protocol, but as a different term than "piece" in the metainfo file. For that reason, the term "block" will be used in this specification to describe the data that is exchanged between peers over the wire.

A client must maintain state information for each connection that it has with a remote peer:

choked: Whether or not the remote peer has choked this client. When a peer chokes the client, it is a notification that no requests will be answered until the client is unchoked. The client should not attempt to send requests for blocks, and it should consider all pending (unanswered) requests to be discarded by the remote peer.
interested: Whether or not the remote peer is interested in something this client has to offer. This is a notification that the remote peer will begin requesting blocks when the client unchokes them.
Note that this also implies that the client will also need to keep track of whether or not it is interested in the remote peer, and if it has the remote peer choked or unchoked. So, the real list looks something like this:

am_choking: this client is choking the peer
am_interested: this client is interested in the peer
peer_choking: peer is choking this client
peer_interested: peer is interested in this client
Client connections start out as "choked" and "not interested". In other words:

am_choking = 1
am_interested = 0
peer_choking = 1
peer_interested = 0
A block is downloaded by the client when the client is interested in a peer, and that peer is not choking the client. A block is uploaded by a client when the client is not choking a peer, and that peer is interested in the client.

It is important for the client to keep its peers informed as to whether or not it is interested in them. This state information should be kept up-to-date with each peer even when the client is choked. This will allow peers to know if the client will begin downloading when it is unchoked (and vice-versa).

Data TypesUnless specified otherwise, all integers in the peer wire protocol are encoded as four byte big-endian values. This includes the length prefix on all messages that come after the handshake.

Message flowThe peer wire protocol consists of an initial handshake. After that, peers communicate via an exchange of length-prefixed messages. The length-prefix is an integer as described above.

HandshakeThe handshake is a required message and must be the first message transmitted by the client.

handshake: <pstrlen><pstr><reserved><info_hash><peer_id>

pstrlen: string length of <pstr>, as a single raw byte
pstr: string identifier of the protocol
reserved: eight (8) reserved bytes. Each bit in these bytes can be used to change the behavior of the protocol. An email from Bram suggests that trailing bits should be used first, so that leading bits may be used to change the meaning of trailing bits.
info_hash: 20-byte SHA1 hash of the info key in the metainfo file. This is the same info_hash that is transmitted in tracker requests.
peer_id: 20-byte string used as a unique ID for the client. This is the same peer_id that is transmitted in tracker requests.
In version 1.0 of the BitTorrent protocol, pstrlen=19, and pstr="BitTorrent protocol".

The initiator of a connection is expected to transmit their handshake immediately. The recipient may wait for the initiator's handshake, if it is capable of serving multiple torrents simultaneously (torrents are uniquely identified by their info_hash). However, the recipient must respond as soon as it sees the info_hash part of the handshake. The tracker's NAT-checking feature does not send the peer_id field of the handshake.

If a client receives a handshake with an info_hash that it is not currently serving, then the client must drop the connection.

If the initiator of the connection receives a handshake in which the peer_id does not match the expected peer_id, then the initiator is expected to drop the connection. Note that the initiator presumably received the peer information from the tracker, which includes the peer_id that was registered by the peer. The peer_id from the tracker and in the handshake are expected to match.


peer_idThere are mainly two conventions how to encode client and client version information into the peer_id, Azureus-style and Shadow's-style.

Azureus-style uses the following encoding: '-', two characters for client id, four ascii digits for version number, '-', followed by random numbers.

For example: '-AZ2060-'...

known clients that uses this encoding style are:

'AZ' - Azureus
'BB' - BitBuddy
'CT' - CTorrent
'MT' - MoonlightTorrent
'LT' - libtorrent
'BX' - Bittorrent X
'TS' - Torrentstorm
'TN' - TorrentDotNET
'SS' - SwarmScope
'XT' - XanTorrent
Shadow's style uses the following encoding: one ascii alphanumeric for client identification, three ascii digits for version number, '----', followed by random numbers.

For example: 'S587----'...

known clients that uses this encoding style are:

'S' - Shadow's client
'U' - UPnP NAT Bit Torrent
'T' - BitTornado
'A' - ABC
Bram's client now uses this style... 'M3-4-2--'.

BitComet does something different still. Its peer_id consists of four ASCII characters 'exbc', followed by a null byte, followed by a single ASCII numeric digit, followed by random characters. The digit seems to denote the version of the software, though it appears to have no connection with the real version number. The digit is incremented with each new BitComet release.

Many clients are using all random numbers or 12 zeroes followed by random numbers (like older versions of Bram's client).

MessagesAll of the remaining messages in the protocol take the form of <length prefix><message ID><payload>. The length prefix is a four byte big-endian value. The message ID is a single decimal character. The payload is message dependent.

keep-alive: <len=0000>The keep-alive message is a message with zero bytes, specified with the length prefix set to zero. There is no message ID and no payload.

choke: <len=0001><id=0>The choke message is fixed-length and has no payload.

unchoke: <len=0001><id=1>The unchoke message is fixed-length and has no payload.

interested: <len=0001><id=2>The interested message is fixed-length and has no payload.

not interested: <len=0001><id=3>The not interested message is fixed-length and has no payload.

have: <len=0005><id=4><piece index>The have message is fixed length. The payload is the zero-based index of a piece that has been successfully downloaded.

bitfield: <len=0001+X><id=5><bitfield>The bitfield message may only be sent immediately after the handshaking sequence is completed, and before any other messages are sent. It is optional, and need not be sent if a client has no pieces.

The bitfield message is variable length, where X is the length of the bitfield. The payload is a bitfield representing the pieces that have been successfully downloaded. The high bit in the first byte corresponds to piece index 0. Bits that are cleared indicated a missing piece, and set bits indicate a valid and available piece. Spare bits at the end are set to zero.

A bitfield of the wrong length is considered an error. Clients should drop the connection if they receive bitfields that are not of the correct size, or if the bitfield has any of the spare bits set.

request: <len=0013><id=6><index><begin><length>The request message is fixed length, and is used to request a block. The payload contains the following information
index: integer specifying the zero-based piece index
begin: integer specifying the zero-based byte offset within the piece
length: integer specifying the requested length. This value must not exceed 2^17 bytes, typical values are 2^15 bytes.
The observant reader will note that a block is typically smaller than a piece (which is commonly >= 2^18 bytes). A client should close the connection if it receives a request for more than 2^17 bytes.

piece: <len=0009+X><id=7><index><begin><block>The piece message is variable length, where X is the length of the block. The payload contains the following information
index: integer specifying the zero-based piece index
begin: integer specifying the zero-based byte offset within the piece
block: block of data, which is a subset of the piece specified by index.
cancel: <len=0013><id=8><index><begin><length>The cancel message is fixed length, and is used to cancel block requests. The payload is identical to that of the "request" message. It is typically used during "End Game" (see the Algorithms section below).

AlgorithmsSuper Seeding(This was not part of the original specification)

The super-seed feature in S-5.5 and on is a new seeding algorithm designed to help a torrent initiator with limited bandwidth "pump up" a large torrent, reducing the amount of data it needs to upload in order to spawn new seeds in the torrent.

When a seeding client enters "super-seed mode", it will not act as a standard seed, but masquerades as a normal client with no data. As clients connect, it will then inform them that it received a piece -- a piece that was never sent, or if all pieces were already sent, is very rare. This will induce the client to attempt to download only that piece.

When the client has finished downloading the piece, the seed will not inform it of any other pieces until it has seen the piece it had sent previously present on at least one other client. Until then, the client will not have access to any of the other pieces of the seed, and therefore will not waste the seed's bandwidth.

This method has resulted in much higher seeding efficiencies, by both inducing peers into taking only the rarest data, reducing the amount of redundant data sent, and limiting the amount of data sent to peers which do not contribute to the swarm. Prior to this, a seed might have to upload 150% to 200% of the total size of a torrent before other clients became seeds. However, a large torrent seeded with a single client running in super-seed mode was able to do so after only uploading 105% of the data. This is 150-200% more efficient than when using a standard seed.

Super-seed mode is NOT recommended for general use. While it does assist in the wider distribution of rare data, because it limits the selection of pieces a client can downlad, it also limits the ability of those clients to download data for pieces they have already partially retrieved. Therefore, super-seed mode is only recommended for initial seeding servers.

Why not rename it to e.g. "Initial Seeding Mode" or "Releaser Mode" then?

Piece downloading strategyClients may choose to download pieces in random order.

A better strategy is to download pieces in rarest first order. The client can determine this by keeping the initial bitfield from each peer, and updating it with every have message. Then, the client can download the pieces that appear least frequently in these peer bitfields.

End GameWhen a download is almost complete, there's a tendency for the last few blocks to trickle in slowly. To speed this up, the client sends requests for all of its missing blocks to all of its peers. To keep this from becoming horribly inefficient, the client also sends a cancel to everyone else every time a block arrives.

There is no documented thresholds, recommended percentages, or block counts that could be used as a guide or Recommended Best Practice here.

Choking and Optimistic UnchokingChoking is done for several reasons. TCP congestion control behaves very poorly when sending over many connections at once. Also, choking lets each peer use a tit-for-tat-ish algorithm to ensure that they get a consistent download rate.

The choking algorithm described below is the currently deployed one. It is very important that all new algorithms work well both in a network consisting entirely of themselves and in a network consisting mostly of this one.

There are several criteria a good choking algorithm should meet. It should cap the number of simultaneous uploads for good TCP performance. It should avoid choking and unchoking quickly, known as 'fibrillation'. It should reciprocate to peers who let it download. Finally, it should try out unused connections once in a while to find out if they might be better than the currently used ones, known as optimistic unchoking.

The currently deployed choking algorithm avoids fibrillation by only changing choked peers once every ten seconds.

Reciprocation and number of uploads capping is managed by unchoking the four peers which have the best upload rate and are interested. This maximizes the client's download rate. These four peers are referred to as downloaders, because they are interested in downloading from the client.

Peers which have a better upload rate (as compared to the downloaders) but aren't interested get unchoked. If they become interested, the downloader with the worst upload rate gets choked. If a client has a complete file, it uses its upload rate rather than its download rate to decide which peers to unchoke.

For optimistic unchoking, at any one time there is a single peer which is unchoked regardless of it's upload rate (if interested, it counts as one of the four allowed downloaders). Which peer is optimistically unchoked rotates every 30 seconds. Newly connected peers are three times as likely to start as the current optimistic unchoke as anywhere else in the rotation. This gives them a decent chance of getting a complete piece to upload.

Anti-snubbingOccasionally a BitTorrent peer will be choked by all peers which it was formerly downloading from. In such cases it will usually continue to get poor download rates until the optimistic unchoke finds better peers. To mitigate this problem, when over a minute goes by without getting a single piece from a particular peer, BitTorrent assumes it is "snubbed" by that peer and doesn't upload to it except as an optimistic unchoke. This frequently results in more than one concurrent optimistic unchoke, (an exception to the exactly one optimistic unchoke rule mentioned above), which causes download rates to recover much more quickly when they falter.

Change LogPut your changes below this line, so that the most recent changes appear first. The change log should be purged from time to time. Please preserve the last month's worth of change logs.

frozen at wideopenwest dot com - 2004-10-22

Cleared up possible confusion regarding the "uploaded" and "downloaded" parameters sent to the tracker. The text originally did not indicate whether the values in these parameters reset on each new session with a tracker.
dictionary with has the => dictionary with the
SpookyET - 2004-10-02

Added "complete" and "incomplete" keys to the announce.
SpookyET - 2004-09-28

Changed "piece-section" to "block", which is a more popular term.
WhitSlack (mwhitlock@whitsoftdev.com) -- 2004-09-20

Fixed a typo: numwanted => numwant in paragraph about announce interval.
Added BitComet peer_id encoding style.
Added reference in Tracker HTTP Protocol to peer_id description in Peer wire protocol.
Concur with deltab on breaking this page up.
deltab - 2004-09-06

This page is rather large, so I'd like to see it split up into sections with their own pages; possibly bencoding, metadata file format, tracker/client protocol, peer/peer protocol, peer IDs, and algorithms/strategies. But it would be a big change, so I've not just gone ahead and done it. Any comments, objections?
Daniel Wang (support@btvampire.com) - 2004-08-09

Added BitBuddy client ID.
SpookyET - 2004-06-15

Fixed the sorting of keys in dictionaries specification.
SpookyET - 2004-06-04

Added anti-snubbing specification
Stormblade (stormblade@torrentstorm.com) - 2004-06-02

Added new client ID for Brams client.
SpookyET - 2004-05-29

Removed "Stealth Seeding", because it was an incomplete specification to "super-seeding".
Added the full super-seeding specification.
webmaster@andreas-diem.at - 2004-05-08

Added BitTornado client ID.
Added UPnP NAT Bit Torrent Url
dan_f@blueyonder.co.uk - 2004-05-07

Added new XanTorrent client ID.
agthorr@barsoom.org - 2004-04-06

Described the NAT-checker's short handshake.
Added ?SwarmScope client ID.
agthorr@barsoom.org - 2004-03-23

Fixed description of pstrlen, which previously seemed to suggest that the value is encoded in ASCII. It's raw binary, not decimal.
amand@alrj.org - 2004-03-11

Set length of "have" message to 5 instead of 2, since the piece index is an int, coded on 4 bytes.
flabdablet@yahoo.com - 2004-03-02

s/delimeter/delimiter/g
Stormblade (stormblade@torrentstorm.com) - 2004-01-27

Added Torrentstorm client ID.
Bram - 2004-01-10

Got rid of compression section, because it was unofficial, and a bad idea
removed link to specification two point zero page, since it was nothing of the sort
steveninweston@hotmail.com - 2004-01-01

Added "Extensions" and "Compression"
dpmott@sep.com - 2003-12-30

Added "Related Documents" and "Change Log" section.
Updated "Conventions" section to mention the change log.
Added link to BitTorrentSpecificationTwoPointZero
wish.lijt@gmail.comQQ:11242538 call me

TOP

Tracker服务器源码分析之一:总述

作者:小马哥


日期:2004-5-29





   tracker服务器是BT下载中必须的角色。一个BT client 在下载开始以及下载进行的过程中,要不停的与 tracker 服务器进行通信,以报告自己的信息,并获取其它下载client的信息。这种通信是通过 HTTP 协议进行的,又被称为 tracker  HTTP 协议,它的过程是这样的:


   client 向 tracker 发一个HTTP 的GET请求,并把它自己的信息放在GET的参数中;这个请求的大致意思是:我是xxx(一个唯一的id),我想下载yyy文件,我的ip是aaa,我用的端口是bbb。。。


   tracker 对所有下载者的信息进行维护,当它收到一个请求后,首先把对方的信息记录下来(如果已经记录在案,那么就检查是否需要更新),然后将一部分(并非全部,根据设置的参数已经下载者的请求)参与下载同一个文件(一个tracker服务器可能同时维护多个文件的下载)的下载者的信息返回给对方。


   Client在收到tracker的响应后,就能获取其它下载者的信息,那么它就可以根据这些信息,与其它下载者建立连接,从它们那里下载文件片断。





关于client和tracker之间通信协议的细节,在“BT协议规范”中已经给出,这里不再重复。下面我们具体分析 tracker服务器的实现细节。





从哪里开始?


   要建立一个 tracker服务器,只要运行 bttrack.py 程序就行了,它最少需要一个参数,就是 –dfile,这个参数指定了保存下载信息的文件。Bttrack.py 调用 track.py 中的 track()函数。因此,我们跟踪到 track.py 中去看track() 函数。





Track.py:track()


   这个函数首先对命令行的参数进行检查;然后将这些参数保存到 config 字典中。在BT中所有的工具程序,都有类似的处理方式。





接下来的代码:


   r = RawServer(Event(), config['timeout_check_interval'], config['socket_timeout'])


   t = Tracker(config, r)


   r.bind(config['port'], config['bind'], True)


   r.listen_forever(HTTPHandler(t.get, config['min_time_between_log_flushes']))


   t.save_dfile()





首先是创建一个 RawServer 对象,这是一个服务器对象,它将实现一个网络服务器的一些细节封装起来。不仅tracker服务器用到了 RawServer,我们以后还可以看到,由于每个 client端也需要给其它 client 提供下载服务,因此也同时是一个服务器,client的实现中,也用到了RawServer,这样,RawServer的代码得到了重用。关于 RawServer的详细实现,在后面的小节中进行分析。


接着是创建一个 Tracker对象。


然后让RawServer绑定在指定的端口上(通过命令行传递进来)。


最后,调用 RawServer::listen_forever() 函数,使得服务器投入运行。


最后,在服务器因某些原因结束运行以后,调用 Tracker::save_dfile() 保存下载信息。这样,一旦服务器再次投入运行,可以恢复当前的状态。








其它信息:


1、  BT源码的分布:


把BT的源码展开之后,可以看到有一些python程序,还有一些说明文件等等,此外还有一个BitTorrent目录。这些 python程序,实际是一些小工具,比如制作 metafile的btmakemetafile.py、运行tracker服务器的bttrack.py、运行BT client端的 btdownloadheadless.py 等等。而这些程序中,用到的一些 python 类的实现,都放在子目录 BitTorrent 下面。我们的分析工作,通常是从工具程序入手,比如 bttrack.py,而随着分析的展开,则重点是看 BitTorrenet子目录下的代码。


BT作者 Bram Cohen 在谈到如何开发可维护的代码的一篇文章中(http://www.advogato.org/article/258.html),其中提到的一条就是开发一些小工具以简化工作,我想BT的这种源码结构,也正是作者思想的一种体现吧。





2、  我们看到,python和我们以前接触的 c/c++ 不一样的第一个地方就是它的函数在定义的时候,不用指定参数类型。既然这样,那么,在调用函数的时候,你可以传递任意类型的参数进来。例如这样的函数:


def foo(arg):


   print type(arg)


   


   你可以这样来调用:


   a = 100


   b = “hello world”


   foo(a)


   foo(b)





   输出结果是:


   


   





   这是因为,第一次调用 foo()的时候,传递的是一个整数类型,而第二次调用的时候,传递的是一个字符串类型。





这种参数具有动态类型的特性,是 c/c++等传统的语言是所不具备的。这也是 python 被称为动态语言的一个原因吧。C++的高级特性模板,虽然也使得参数类型可以动态化,但使用起来,远没有python这么简单方便。
wish.lijt@gmail.comQQ:11242538 call me

TOP

Tracker服务器源码分析之二:RawServer类
作者:小马哥


日期:2004-5-30





这篇文章,我们来分析 RawServer 以及一些相关的类。RawServer 类的实现代码,在 BitTorrent 子目录的RawServer.py 中





RawServer 这个类的作用是实现一个网络服务器。关于网络编程的知识,《unix网络编程:卷1》是最经典的书籍,你如果对这块不了解,建议抽时间看看这本书。RawServer 实现的是一种事件多路复用、非阻塞的网络模型。它使用的是 poll() (而不是我们常见的select(),关于 poll和select的比较,也在《unix网络编程:卷1》中有介绍)函数,处理过程大致是这样的:


首先创建一个监听 socket,然后将这个 socket 加入 poll 的事件源;


随后进入服务处理循环,即:





调用 poll() 函数,这个函数会阻塞,直到网络上有某些事件发生或者超时才返回给调用者;


在 poll()返回之后,先检查一下是否有没有处理的任务,如果有,那么先完成这些任务。然后根据事件类型进行处理。


如果是连接请求(监听 socket上的POLLIN事件)到来,它 accept这个请求,如果 accept 成功,那么就和一个 client建立了连接,于是将 accept() 新创建的 socket 加入 poll 的事件源;


如果在已经建立的连接上(连接socket上的POLLIN事件),有数据可读,那么将数据从 client 端读过来,做进一步处理;


如果已经建立的连接已经准备好(连接socket上的POLLOUT事件),可以发送数据,则检查是否有数据需要发送,如果有,那么发送数据给 client 端。





(所以,tracker是一个单进程的服务器,并没有用到线程。)





Bram Cohen 认为软件的可维护性非常重要,使代码易于维护的重要一条就是设计可重用的类,RawServer 在设计的时候,充分考虑到了可重用性,集中表现在两个地方:


1、  将网络 I/O 和数据分析处理分离。


网络服务器的事件多路复用、网络I/O 部分通常是固定不变的,而数据在读取之后,进行分析处理的过程则是可变的。RawServer 将可变的数据处理工作,交给另外一个抽象的类 Handler (实际上并没有这么一个类)来处理。比如,在 tracker 服务器的实现中,具体使用的就是 HTTPHandler 类,而在 以后将要分析的 BT client 实现代码中,用到的具体的Handler 是 Encoder 类。





2、  采用任务队列来抽象出任务处理的过程。


RawServer维护了一个任务队列 unscheduled_tasks(实际是一个二元组的list,二元组的第一项是一个函数,第二项是超时时间)。在初始化的时候,首先向这个队列中加入一个任务:scan_for_timeouts(),这样,每隔一段时间,服务器就会去检查一下是否有连接超时。如果有其它





RawServer的成员函数中,对外暴露的有:





u       __init__:(初始化函数)





u       add_task():


   在任务列表中增加一项任务(一个任务是一个函数以及一个指定的超时时间的组合)





u       bind():


   首先创建一个socket,然后设置socket的属性: SO_REUSEADDR和IP_TOS,,这两个属性的具体含义请参考《unix网络编程:卷1》,另外还将 socket 设置为非阻塞的。相对于阻塞的 socket来说,非阻塞的 socket 在网络 I/O 性能上要提高许多,但是与此同时,编程的复杂度也要提高一些。象 tracker这种可能同时要处理成千上万个并发连接的服务器,只能采用非阻塞的socket。


   然后将该 socket和指定ip已经端口绑定;


   最后把这个socket 加入 poll的事件源。





u       start_connection():


   对外主动建立一个连接,这个函数在处理NAT穿越的时候用到了,我们后面分析到 NAT穿越的时候,再具体讲解。





u       listen_forever():


   这个函数的功能就是实现了我在前面描述的网络服务器的处理过程。我们看到,它唯一的参数是handler,handler的作用就是封装了对数据的具体处理。


listen_forever()把对网络事件的处理过程,交给了 handle_events()。





其它函数,包括handle_events(),都是内部函数(也就是外部不会直接来调用这些函数)。Python没有c++那样 public、protected、private 这样的保护机制,python类的内部函数命名的惯例是以下划线开始,例如 RawServer 中的 _close_dead()等。





u       handle_events():


事件处理过程,主要是根据三种不同的网络事件分别处理,一是连接事件,二是读事件、三是写事件。





if sock == self.server.fileno()





这段代码判断发生事件的socket是否是监听 socket,如果是,那么说明是连接事件。


连接事件的处理:


通过 accept 来接受连接,并将新建立的 socket 设置为非阻塞。


判断当前连接数是否已经达到了最大值(为了限制并发连接的数目,在初始化 RawServer的时候,需要指定最大连接数目),如果已经达到最大值,那么关闭这个新建的连接。


否则,根据新的 socket 创建一个 SingleSocket 对象,(SingleSocket 封装了对 socket的操作。)将这个对象加入内部的列表single_sockets中,以备后用。


将这个新 socket加入 poll 的事件源


最后,调用 Handler 的external_connection_made() 函数,关于这个函数,在后面分析 HTTPHandler 时再讨论。





if (event & POLLIN) != 0:


这段代码判断是否是读事件


读事件的处理:


首先刷新一下连接的最后更新时间 (last_hit)。


然后读取数据;


如果什么也没读到,那么说明连接被关闭了(在网络编程中,如果一个连接正常的被关闭,那么,也会触发读事件,只不过什么也读不到)


否则,调用 Handler的 data_came_in() 函数来处理读到的数据。





if (event & POLLOUT) != 0 and s.socket is not None and not s.is_flushed():


这段代码判断是否是写事件,而且确实有数据需要发送。在一个连接可以写的时候,就会发生写事件。


写事件的处理:


实际代码是在 SingleSocket的 try_write()函数中。


在一个非阻塞的连接上发送指定大小的数据,很可能在一次发送过程中,数据没有被完全发送出去(只发送了一部分)就返回了,所以,每次 write之后,必须判断是否完全发送了数据。如果没有发送完,那么下次有读事件的时候,还得回来继续发送未完得数据。这也是这个函数叫做 try_write 的原因吧。


try_write() 在最后,要重新设置 poll 的事件源。如果数据全部发送完毕了,那么只需要监听读事件(POLLIN)否则,既要监听读事件,也要监听写事件(POLLOUT),这样,一旦连接变的可写,可以继续将剩下的数据发送出去。





u       scan_for_timeouts():


任务处理函数,它首先把自身加入未处理任务队列中,这样,经过一段时间,可以保证这个函数再次被调用,从而达到周期性调用的效果。


它检查每个连接是否超过指定时间没有被刷新,如果是,则该连接可能已经僵死,那么它关闭这个连接。





u       pop_unscheduled():


从任务列表中弹出一个未处理的任务。


  


与 RawServer 配合使用的是 SingleSocket 类,这是一个辅助类,主要目的是封装对 socket的处理吧。包括数据的发送,都交给它来处理了。这个类比较简单,大家可以自己去看,我就不罗嗦了。








以上是对 RasServer 的具体实现的一个分析,可能读者看的还是晕晕糊糊,没办法,还是必须自己去看源代码,然后在遇到问题的时候,回头再来看这篇文章,才会有帮助。如果不亲自看源码,终究是纸上谈兵。





我们再来小结一下。


RawServer 封装了网络服务器的实现细节,它实现了一种事件多路处理、非阻塞的网络模型。它主要负责建立新的连接,从网络读取和发送数据,而对读到的数据的具体处理工作,交给 Handler 类来处理,从而把网络I/O和数据处理分离开来,使得 RawServer可以重用。Handler 类是在调用 listen_forever() 的时候,由调用者传递进来的,具体到 tracker服务器,就是HTTPHandler。有了 RawServer,tracker 就可以作为一个网络服务器运行了。


下一节,我们开始分析具体实现 tracker HTTP 协议处理的 HTTPHandler类和Tracker类。
wish.lijt@gmail.comQQ:11242538 call me

TOP

返回列表