Protocol specification

From Bitmessage Wiki
Revision as of 19:56, 19 August 2013 by ISibboI (talk | contribs) (→‎Encrypted payload: Added some information to the Encrypted payload)

Common standards

Hashes

SHA-512 hashes are used in most places; RIPEMD-160 is also used when creating an address.

A double-round of SHA-512 is used for the Proof Of Work. Example of double-SHA-512 encoding of string "hello":

hello
9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 (first round of sha-512)
0592a10584ffabf96539f3d780d776828c67da1ab5b169e9e8aed838aaecc9ed36d49ff1423c55f019e050c66c6324f53588be88894fef4dcffdb74b98e2b200 (second round of sha-512)
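
The two rounds can be reproduced with Python's standard hashlib module. This is a minimal sketch; it assumes the second round hashes the raw 64-byte digest of the first round rather than its hexadecimal text, and the helper name double_sha512 is made up for illustration.

```python
import hashlib


def double_sha512(data: bytes) -> bytes:
    """Hash data with SHA-512, then hash the raw 64-byte digest again."""
    first_round = hashlib.sha512(data).digest()
    return hashlib.sha512(first_round).digest()


# Hex strings comparable to the two lines listed above.
first_round_hex = hashlib.sha512(b"hello").hexdigest()
second_round_hex = double_sha512(b"hello").hex()
```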

For Bitmessage addresses (RIPEMD-160) this would give:

hello
9b71d224bd62f3785d96d46ad3ea3d73319bfbc2890caadae2dff72519673ca72323c3d99ba5c11d7c7acc6e14b8c5da0c4663475c2e5c3adef46f73bcdec043 (first round is sha-512)
79a324faeebcbf9849f310545ed531556882487e (with ripemd-160)

Common structures

All integers are encoded in big-endian byte order. (This is different from Bitcoin, which uses little-endian.)

Message structure

Field Size Description Data type Comments
4 magic uint32_t Magic value indicating message origin network, and used to seek to next message when stream state is unknown
12 command char[12] ASCII string identifying the packet content, NULL padded (non-NULL padding results in packet rejected)
4 length uint32_t Length of payload in number of bytes
4 checksum uint32_t First 4 bytes of sha512(payload)
? payload uchar[] The actual data, a message or an object

Known magic values:

Magic value Sent over wire as
0xE9BEB4D9 E9 BE B4 D9
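
As an illustration of the header layout, the sketch below packs a complete message with Python's struct module. The function name build_message is hypothetical; the field order, sizes, NULL padding and checksum rule are taken from the table above.

```python
import hashlib
import struct

MAGIC = 0xE9BEB4D9  # the known magic value listed above


def build_message(command: str, payload: bytes = b"") -> bytes:
    """Prepend the 24-byte header (magic, command, length, checksum)."""
    name = command.encode("ascii")
    if len(name) > 12:
        raise ValueError("command must fit in 12 bytes")
    checksum = hashlib.sha512(payload).digest()[:4]
    # '>' selects big endian; '12s' right-pads the command with NULL bytes.
    header = struct.pack(">I12sI4s", MAGIC, name, len(payload), checksum)
    return header + payload


# A verack message has no payload, so it is only the header.
verack_packet = build_message("verack")
```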

Variable length integer

Integers can be encoded in one of several sizes, depending on the represented value, to save space. Variable length integers always precede an array/vector of a type of data that may vary in length.

Value Storage length Format
< 0xfd 1 uint8_t
<= 0xffff 3 0xfd followed by the value as uint16_t
<= 0xffffffff 5 0xfe followed by the value as uint32_t
<= 0xffffffffffffffff 9 0xff followed by the value as uint64_t
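
The table above can be sketched as a pair of helpers. The names encode_varint and decode_varint are illustrative only; the multi-byte forms are packed big endian, following the rule stated under "Common structures".

```python
import struct


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer using the shortest form in the table."""
    if value < 0:
        raise ValueError("var_int cannot be negative")
    if value < 0xFD:
        return struct.pack(">B", value)
    if value <= 0xFFFF:
        return b"\xfd" + struct.pack(">H", value)
    if value <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack(">I", value)
    if value <= 0xFFFFFFFFFFFFFFFF:
        return b"\xff" + struct.pack(">Q", value)
    raise ValueError("var_int does not fit in 64 bits")


def decode_varint(data: bytes, offset: int = 0):
    """Return (value, number_of_bytes_consumed) for the var_int at offset."""
    first = data[offset]
    if first < 0xFD:
        return first, 1
    fmt, size = {0xFD: (">H", 2), 0xFE: (">I", 4), 0xFF: (">Q", 8)}[first]
    (value,) = struct.unpack_from(fmt, data, offset + 1)
    return value, 1 + size
```

Returning the number of bytes consumed lets a caller walk through a buffer that holds several var_ints in a row.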

Variable length string

A variable length string is stored as a variable length integer followed by the string itself.

Field Size Description Data type Comments
1+ length var_int Length of the string
? string char[] The string itself (can be empty)

Variable length list of integers

n integers can be stored using n+1 variable length integers where the first var_int equals n.

Field Size Description Data type Comments
1+ count var_int Number of var_ints below
1+ var_int The first value stored
1+ var_int The second value stored...
1+ var_int etc...
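
Both composite structures (var_str and the list of var_ints) reduce to a var_int count followed by the items. A possible sketch, with hypothetical helper names and a compact var_int encoder repeated so that the block stands alone:

```python
import struct


def encode_varint(value: int) -> bytes:
    # Compact var_int encoder following the "Variable length integer" table.
    if value < 0xFD:
        return bytes([value])
    if value <= 0xFFFF:
        return b"\xfd" + struct.pack(">H", value)
    if value <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack(">I", value)
    return b"\xff" + struct.pack(">Q", value)


def encode_varstr(text: bytes) -> bytes:
    """var_str: the length as a var_int, then the raw bytes (may be empty)."""
    return encode_varint(len(text)) + text


def encode_varint_list(values) -> bytes:
    """var_int_list: the count n as a var_int, then n var_ints."""
    values = list(values)
    return encode_varint(len(values)) + b"".join(encode_varint(v) for v in values)
```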

Network address

When a network address is needed somewhere, this structure is used. This protocol and structure support IPv6, but note that the original client currently only supports IPv4 networking. Network addresses are not prefixed with a timestamp or stream in the version message.

Field Size Description Data type Comments
4 (or 8) time uint32 the Time. Protocol version 1 clients use 4 byte time while protocol version 2 clients use 8 byte time.
4 stream uint32 Stream number for this node
8 services uint64_t same service(s) listed in version
16 IPv6/4 char[16] IPv6 address. The original client only supports IPv4 and only reads the last 4 bytes to get the IPv4 address. However, the IPv4 address is written into the message as a 16 byte IPv4-mapped IPv6 address (12 bytes 00 00 00 00 00 00 00 00 00 00 FF FF, followed by the 4 bytes of the IPv4 address).
2 port uint16_t port number
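
The IPv4-mapped form can be sketched as follows. The function names and the example port are illustrative assumptions; the second helper assumes the 8 byte time of protocol version 2 clients.

```python
import socket
import struct

# 12 bytes: ten 0x00 bytes followed by FF FF, as described above.
IPV4_MAPPED_PREFIX = bytes(10) + b"\xff\xff"


def pack_net_addr(ipv4: str, port: int, services: int = 1) -> bytes:
    """26-byte form used in the version message (no time, no stream)."""
    ip16 = IPV4_MAPPED_PREFIX + socket.inet_aton(ipv4)
    return struct.pack(">Q16sH", services, ip16, port)


def pack_net_addr_full(time_: int, stream: int, ipv4: str, port: int,
                       services: int = 1) -> bytes:
    """38-byte form: 8-byte time and 4-byte stream, then the 26-byte form."""
    return struct.pack(">QI", time_, stream) + pack_net_addr(ipv4, port, services)


example_addr = pack_net_addr("127.0.0.1", 8444)
```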

Inventory Vectors

Inventory vectors are used for notifying other nodes about objects they have or data which is being requested. Two rounds of SHA-512 are used, resulting in a 64 byte hash. Only the first 32 bytes are used; the remaining 32 bytes are ignored.

Inventory vectors consist of the following data format:

Field Size Description Data type Comments
32 hash char[32] Hash of the object
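
A short sketch of the truncated double hash follows. It assumes the input is the serialized object as sent on the wire, which the text above does not spell out, and the function name is hypothetical.

```python
import hashlib


def inventory_hash(object_data: bytes) -> bytes:
    """First 32 bytes of SHA-512(SHA-512(object_data))."""
    first_round = hashlib.sha512(object_data).digest()
    return hashlib.sha512(first_round).digest()[:32]
```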

Encrypted payload

Bitmessage uses ECIES to encrypt its messages. For more information, see: Encryption

Field Size Description Data type Comments
16 IV uchar[] Initialization Vector used for AES-256-CBC
2 curve type uint16_t Elliptic Curve type 0x02CA (714)
2 X length uint16_t Length of X component of public key R
X length X uchar[] X component of public key R
2 Y length uint16_t Length of Y component of public key R
Y length Y uchar[] Y component of public key R
? encrypted uchar[] Cipher text
32 MAC uchar[] HMACSHA256 Message Authentication Code

Unencrypted Message Data

Field Size Description Data type Comments
1+ msg_version var_int Message format version
1+ address_version var_int Sender's address version number. This is needed in order to calculate the sender's address to show in the UI, and also to allow for forwards compatible changes to the public-key data included below.
1+ stream var_int Sender's stream number
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node with this pubkey included in this msg message (the sender's pubkey).
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 320 is the network minimum so any lower values will be automatically raised to 320. This field is new and is only included when the address_version >= 3.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 14000 is the network minimum so any lower values will be automatically raised to 14000. This field is new and is only included when the address_version >= 3.
20 destination ripe uchar[] The ripe hash of the public key of the receiver of the message
1+ encoding var_int Message Encoding type
1+ message_length var_int Message Length
message_length message uchar[] The message.
1+ ack_length var_int Length of the acknowledgement data
ack_length ack_data uchar[] The acknowledgement data to be transmitted. This takes the form of a Bitmessage protocol message, like another msg message. The POW therein must already be completed.
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The ECDSA signature which covers everything from the msg_version to the ack_data.

Message Encodings

Value Name Description
0 IGNORE Any data with this number may be ignored. The sending node might simply be sharing its public key with you.
1 TRIVIAL UTF-8. No 'Subject' or 'Body' sections. Useful for simple strings of data, like URIs or magnet links.
2 SIMPLE UTF-8. Uses 'Subject' and 'Body' sections. No MIME is used.

messageToTransmit = 'Subject:' + subject + '\n' + 'Body:' + message
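
A small sketch of building and splitting a SIMPLE (encoding 2) message. The function names are hypothetical, and the decoder assumes the subject itself contains no "\nBody:" sequence, since it splits on the first occurrence.

```python
def encode_simple(subject: str, body: str) -> bytes:
    """Build the UTF-8 bytes for encoding type 2 (SIMPLE)."""
    return ("Subject:" + subject + "\n" + "Body:" + body).encode("utf-8")


def decode_simple(data: bytes):
    """Split a SIMPLE message back into (subject, body)."""
    text = data.decode("utf-8")
    header, separator, body = text.partition("\nBody:")
    if not separator or not header.startswith("Subject:"):
        raise ValueError("not a SIMPLE-encoded message")
    return header[len("Subject:"):], body
```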

Further values for the message encodings can be decided upon by the community. Any MIME or MIME-like encoding format, if used, should take advantage of Bitmessage's 8-bit bytes.

Pubkey bitfield features

Bit Name Description
0 undefined The most significant bit at the beginning of the structure. Undefined
1 undefined The next most significant bit. Undefined
... ... ...
30 include_destination Receiving node expects that the RIPE hash encoded in their address precedes the encrypted message data of msg messages bound for them.
31 does_ack If true, the receiving node does send acknowledgements (rather than dropping them).
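
Because bit 0 is the most significant bit, bit n of the 32-bit field corresponds to the integer mask 1 << (31 - n). The sketch below applies that rule; the constant and function names are made up for illustration.

```python
import struct

# Bit numbering starts at the most significant bit of the uint32_t.
INCLUDE_DESTINATION = 1 << (31 - 30)  # bit 30
DOES_ACK = 1 << (31 - 31)             # bit 31, the least significant bit


def pack_behavior(does_ack: bool, include_destination: bool = False) -> bytes:
    """Return the 4-byte big-endian behavior bitfield."""
    flags = 0
    if does_ack:
        flags |= DOES_ACK
    if include_destination:
        flags |= INCLUDE_DESTINATION
    return struct.pack(">I", flags)


def has_flag(bitfield: bytes, mask: int) -> bool:
    """Check whether the given mask is set in a 4-byte bitfield."""
    (flags,) = struct.unpack(">I", bitfield)
    return bool(flags & mask)
```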

Message types

version

When a node creates an outgoing connection, it will immediately advertise its version. The remote node will respond with its version. No further communication is possible until both peers have exchanged their versions.

Payload:

Field Size Description Data type Comments
4 version int32_t Identifies protocol version being used by the node
8 services uint64_t bitfield of features to be enabled for this connection
8 timestamp int64_t standard UNIX timestamp in seconds
26 addr_recv net_addr The network address of the node receiving this message (not including the time or stream number)
26 addr_from net_addr The network address of the node emitting this message (not including the time or stream number and the ip itself is ignored by the receiver)
8 nonce uint64_t Random nonce used to detect connections to self.
1+ user_agent var_str User Agent (0x00 if string is 0 bytes long)
1+ stream numbers var_int_list The stream numbers that the emitting node is interested in.

A "verack" packet shall be sent if the version packet was accepted. Once you have sent and received a verack message with the remote node, send an addr message advertising up to 1000 peers of which you are aware, and one or more inv messages advertising all of the valid objects of which you are aware.

The following services are currently assigned:

Value Name Description
1 NODE_NETWORK This is a normal network node.
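
Putting the fields of the version payload together might look like the sketch below. The function names, the user agent string and the addresses are placeholders, and using the same services value for both net_addr entries is an assumption made for brevity.

```python
import os
import socket
import struct
import time


def encode_varint(value: int) -> bytes:
    # Same rules as the "Variable length integer" table.
    if value < 0xFD:
        return bytes([value])
    if value <= 0xFFFF:
        return b"\xfd" + struct.pack(">H", value)
    if value <= 0xFFFFFFFF:
        return b"\xfe" + struct.pack(">I", value)
    return b"\xff" + struct.pack(">Q", value)


def net_addr(ipv4: str, port: int, services: int) -> bytes:
    # 26-byte net_addr without time or stream, IPv4-mapped into 16 bytes.
    ip16 = bytes(10) + b"\xff\xff" + socket.inet_aton(ipv4)
    return struct.pack(">Q16sH", services, ip16, port)


def build_version_payload(version, remote, local, user_agent, streams,
                          services=1, timestamp=None, nonce=None) -> bytes:
    """remote and local are (ipv4, port) tuples; user_agent is bytes."""
    timestamp = int(time.time()) if timestamp is None else timestamp
    nonce = os.urandom(8) if nonce is None else nonce
    payload = struct.pack(">iQq", version, services, timestamp)
    payload += net_addr(remote[0], remote[1], services)   # addr_recv
    payload += net_addr(local[0], local[1], services)     # addr_from
    payload += nonce                                      # detects self-connections
    payload += encode_varint(len(user_agent)) + user_agent
    payload += encode_varint(len(streams))
    payload += b"".join(encode_varint(s) for s in streams)
    return payload
```

The fixed part of the payload is 80 bytes (4 + 8 + 8 + 26 + 26 + 8); the user agent and stream list follow it.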

verack

The verack message is sent in reply to version. This message consists of only a message header with the command string "verack".

addr

Provides information on known nodes of the network. Non-advertised nodes should typically be forgotten after 3 hours.

Payload:

Field Size Description Data type Comments
1+ count var_int Number of address entries (max: 1000)
34x? addr_list net_addr Address of other nodes on the network.

inv

Allows a node to advertise its knowledge of one or more objects. Payload (maximum payload length: 50000 items):

Field Size Description Data type Comments
? count var_int Number of inventory entries
32x? inventory inv_vect[] Inventory vectors

getdata

getdata is used in response to an inv message to retrieve the content of a specific object after filtering known elements.

Payload (maximum payload length: 50000 entries):

Field Size Description Data type Comments
? count var_int Number of inventory entries
32x? inventory inv_vect[] Inventory vectors

Object types

Objects are a subset of network messages. They are shared throughout a stream. A client should advertise objects that are not older than 2.5 days. An object is valid only if its Proof Of Work has been done.

getpubkey

When a node has the hash of a public key (from an address) but not the public key itself, it must send out a request for the public key.

Field Size Description Data type Comments
8 POW nonce uint64_t Random nonce used for the Proof Of Work
4 (or 8) time uint32_t The time that this message was generated and broadcast. We are transitioning to 8 byte time.
1+ address version var_int The address' version
1+ stream number var_int The address' stream number
20 pub key hash uchar[] The ripemd hash of the public key

pubkey

A version 2 public key. This is still in use and supported by current clients but new v2 addresses are not generated by clients.

Field Size Description Data type Comments
8 POW nonce uint64_t Random nonce used for the Proof Of Work
4 (or 8) time uint32_t The time that this message was generated and broadcast. We are transitioning to 8 byte time.
1+ address version var_int The address' version which is set to 2.
1+ stream number var_int The address' stream number
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )

A version 3 pubkey:

Field Size Description Data type Comments
8 POW nonce uint64_t Random nonce used for the Proof Of Work
4 (or 8) time uint32_t The time that this message was generated and broadcast. We are transitioning to 8 byte time.
1+ address version var_int The address' version which is set to 3.
1+ stream number var_int The address' stream number
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 320 is the network minimum so any lower values will be automatically raised to 320.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 14000 is the network minimum so any lower values will be automatically raised to 14000.
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The ECDSA signature which covers everything from the time to the extra_bytes.

msg

Used for person-to-person messages.

Field Size Description Data type Comments
8 POW nonce uint64_t Random nonce used for the Proof Of Work
4 (or 8) time uint32_t The time that this message was generated and broadcast. We are transitioning to 8 byte time.
1+ streamNumber var_int The stream number of the destination address.
? encrypted uchar[] Encrypted data. See Encrypted payload. See also Unencrypted Message Data Format

broadcast

Version 1 broadcast messages are sent in-the-clear. Version 2 are encrypted. Users who are subscribed to the sending address will see the message appear in their inbox.

Version 1 broadcast format:

Field Size Description Data type Comments
8 POW nonce uint64_t The Proof Of Work nonce
4 (or 8) time uint32_t The time that the message was broadcast. We are transitioning to 8 byte time.
1+ broadcast version var_int The version number of this broadcast protocol message which is equal to 1 in this case.
1+ address version var_int The sender's address version
1+ stream number var_int The sender's stream number
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the owner of this pubkey.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
20 address hash uchar[] The sender's address hash. This is included so that nodes can more cheaply detect whether this is a broadcast message for which they are listening, although it must be verified with the public key above.
1+ encoding var_int The encoding type of the message
1+ messageLength var_int The message length in bytes
messageLength message uchar[] The message
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The signature which covers everything from the broadcast version down through the message.

Version 2 broadcasts:

Field Size Description Data type Comments
8 POW nonce uint64_t The Proof Of Work nonce
4 (or 8) time uint32_t The time that the message was broadcast. We are transitioning to 8 byte time.
1+ broadcast version var_int The version number of this broadcast protocol message which is equal to 2 in this case.
1+ stream number var_int The sender's stream number
? encrypted uchar[] Encrypted broadcast data. See Encrypted payload. See also Unencrypted Broadcast Data Format

Unencrypted data format:

Field Size Description Data type Comments
1+ broadcast version var_int The version number of this broadcast protocol message which is equal to 2 in this case. This is included here so that it can be signed.
1+ address version var_int The sender's address version
1+ stream number var_int The sender's stream number
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the owner of this pubkey.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 320 is the network minimum so any lower values will be automatically raised to 320. This field is new and is only included when the address_version >= 3.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 14000 is the network minimum so any lower values will be automatically raised to 14000. This field is new and is only included when the address_version >= 3.
1+ encoding var_int The encoding type of the message
1+ messageLength var_int The message length in bytes
messageLength message uchar[] The message
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The signature which covers everything from the broadcast version down through the message.

Encrypted payload

Bitmessage uses the Elliptic Curve Integrated Encryption Scheme (ECIES)[1] to encrypt the payload of the Message and Broadcast objects.

The scheme uses Elliptic Curve Diffie-Hellman (ECDH)[2] to generate a shared secret, from which the encryption parameters for the Advanced Encryption Standard with a 256-bit key in Cipher-Block Chaining mode (AES-256-CBC)[3] are derived. The data is padded to a 16 byte boundary in accordance with PKCS7[4] before it is encrypted. This means that the data is padded with N bytes of value N.

The Key Derivation Function (KDF)[5] used to generate the key material for AES is SHA512[6]. The Message Authentication Code (MAC) scheme used is HMACSHA256[7].

Format

Field Size Description Data type Comments
16 IV uchar[] Initialization Vector used for AES-256-CBC
2 curve type uint16_t Elliptic Curve type 0x02CA (714)
2 X length uint16_t Length of X component of public key R
X length X uchar[] X component of public key R
2 Y length uint16_t Length of Y component of public key R
Y length Y uchar[] Y component of public key R
? encrypted uchar[] Cipher text
32 MAC uchar[] HMACSHA256 Message Authentication Code

In order to reconstitute a usable (65 byte) public key (starting with 0x04), the X and Y components need to be expanded by prepending them with 0x00 bytes until the individual component lengths are 32 bytes.
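
A possible way to split the structure into its fields and rebuild the 65 byte key is sketched below. The function names and the dictionary keys are illustrative, and the parser assumes the MAC is always the final 32 bytes.

```python
import struct


def parse_encrypted_payload(data: bytes) -> dict:
    """Split an encrypted payload into the fields of the Format table."""
    iv = data[:16]
    curve, x_len = struct.unpack_from(">HH", data, 16)
    pos = 20
    x = data[pos:pos + x_len]
    pos += x_len
    (y_len,) = struct.unpack_from(">H", data, pos)
    pos += 2
    y = data[pos:pos + y_len]
    pos += y_len
    if len(data) < pos + 32:
        raise ValueError("payload too short to contain a MAC")
    return {"iv": iv, "curve": curve, "x": x, "y": y,
            "ciphertext": data[pos:-32], "mac": data[-32:]}


def uncompressed_public_key(x: bytes, y: bytes) -> bytes:
    """Left-pad X and Y with 0x00 to 32 bytes each and prepend 0x04."""
    return b"\x04" + x.rjust(32, b"\x00") + y.rjust(32, b"\x00")
```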


Encryption

  1. The destination public key is called K.
  2. Generate 16 random bytes using a secure random number generator. Call them IV.
  3. Generate a new random EC key pair with private key called r and public key called R.
  4. Do an EC point multiply with public key K and private key r. This gives you public key P.
  5. Use the X component of public key P and calculate the SHA512 hash H.
  6. The first 32 bytes of H are called key_e and the last 32 bytes are called key_m.
  7. Pad the input text to a multiple of 16 bytes, in accordance with PKCS7.
  8. Encrypt the data with AES-256-CBC, using IV as initialization vector, key_e as encryption key and the padded input text as payload. Call the output cipher text.
  9. Calculate a 32 byte MAC with HMACSHA256, using key_m as the key and cipher text as data. Call the output MAC.

The resulting data is: IV + R + cipher text + MAC
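
Steps 5 to 7 and step 9 need only hashing and padding, so they can be sketched with the Python standard library. The EC point multiplication (step 4) and AES-256-CBC (step 8) require a third-party cryptography library and are left out here; the function names are hypothetical.

```python
import hashlib
import hmac


def derive_keys(shared_x: bytes):
    """Steps 5-6: SHA512 of the X component of P, split into key_e and key_m."""
    h = hashlib.sha512(shared_x).digest()
    return h[:32], h[32:]


def pkcs7_pad(data: bytes, block_size: int = 16) -> bytes:
    """Step 7: append N bytes of value N, where N is between 1 and block_size."""
    n = block_size - len(data) % block_size
    return data + bytes([n]) * n


def compute_mac(key_m: bytes, cipher_text: bytes) -> bytes:
    """Step 9: HMACSHA256 over the cipher text, keyed with key_m."""
    return hmac.new(key_m, cipher_text, hashlib.sha256).digest()
```

Input that is already a multiple of 16 bytes still receives a full block of padding, so the padding can always be removed unambiguously.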

Decryption

  1. The private key used to decrypt is called k.
  2. Do an EC point multiply with private key k and public key R. This gives you public key P.
  3. Use the X component of public key P and calculate the SHA512 hash H.
  4. The first 32 bytes of H are called key_e and the last 32 bytes are called key_m.
  5. Calculate MAC' with HMACSHA256, using key_m as the key and cipher text as data.
  6. Compare MAC with MAC'. If not equal, decryption will fail.
  7. Decrypt the cipher text with AES-256-CBC, using IV as initialization vector, key_e as decryption key and the cipher text as payload. The output is the padded input text; remove the PKCS7 padding to recover the original input text.
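
The MAC check in steps 5 and 6 and the removal of the padding after step 7 can be sketched with the standard library alone. The helper names are hypothetical, and the constant-time comparison is a suggestion rather than something the steps above require.

```python
import hashlib
import hmac


def mac_is_valid(key_m: bytes, cipher_text: bytes, mac: bytes) -> bool:
    """Recompute MAC' and compare it with the received MAC."""
    expected = hmac.new(key_m, cipher_text, hashlib.sha256).digest()
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(expected, mac)


def pkcs7_unpad(padded: bytes, block_size: int = 16) -> bytes:
    """Strip the N trailing bytes of value N from decrypted data."""
    if not padded or len(padded) % block_size:
        raise ValueError("invalid padded length")
    n = padded[-1]
    if n < 1 or n > block_size or padded[-n:] != bytes([n]) * n:
        raise ValueError("invalid PKCS7 padding")
    return padded[:-n]
```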

Example

Public key K:

Data Comments
04 09 d4 e5  c0 ab 3d 25
fe 04 8c 64  c9 da 1a 24
2c 7f 19 41  7e 95 17 cd
26 69 50 d7  2c 75 57 13
58 5c 61 78  e9 7f e0 92
fc 89 7c 9a  1f 17 20 d5
77 0a e8 ea  ad 2f a8 fc
bd 08 e9 32  4a 5d de 18
57
Public key, 0x04 prefix, then 32 bytes X and 32 bytes Y.


Initialization Vector IV:

Data Comments
bd db 7c 28  29 b0 80 38
75 30 84 a2  f3 99 16 81
16 bytes generated with a secure random number generator.

Randomly generated key pair with private key r and public key R:

Data Comments
5b e6 fa cd  94 1b 76 e9
d3 ea d0 30  29 fb db 6b
6e 08 09 29  3f 7f b1 97
d0 c5 1f 84  e9 6b 8b a4
Private key r
04 02 93 21  3d cf 13 88
b6 1c 2a e5  cf 80 fe e6
ff ff c0 49  a2 f9 fe 73
65 fe 38 67  81 3c a8 12
92 df 94 68  6c 6a fb 56
5a c6 14 9b  15 3d 61 b3
b2 87 ee 2c  7f 99 7c 14
23 87 96 c1  2b 43 a3 86
5a
Public key R

Derived public key P (point multiply r with K):

Data Comments
04 0d b8 e3  ad 8c 0c d7
3f a2 b3 46  71 b7 b2 47
72 9b 10 11  41 57 9d 19
9e 0d c0 bd  02 4e ae fd
89 ca c8 f5  28 dc 90 b6
68 11 ab ac  51 7d 74 97
be 52 92 93  12 29 be 0b
74 3e 05 03  f4 43 c3 d2
96
Public key P
0d b8 e3 ad  8c 0c d7 3f
a2 b3 46 71  b7 b2 47 72
9b 10 11 41  57 9d 19 9e
0d c0 bd 02  4e ae fd 89
X component of public key P

SHA512 of public key P X component (H):

Data Comments
17 05 43 82  82 67 86 71
05 26 3d 48  28 ef ff 82
d9 d5 9c bf  08 74 3b 69
6b cc 5d 69  fa 18 97 b4
First 32 bytes of H called key_e
f8 3f 1e 9c  c5 d6 b8 44
8d 39 dc 6a  9d 5f 5b 7f
46 0e 4a 78  e9 28 6e e8
d9 1c e1 66  0a 53 ea cd
Last 32 bytes of H called key_m

Padded input:

Data Comments
54 68 65 20  71 75 69 63
6b 20 62 72  6f 77 6e 20
66 6f 78 20  6a 75 6d 70
73 20 6f 76  65 72 20 74
68 65 20 6c  61 7a 79 20
64 6f 67 2e  04 04 04 04
The quick brown fox jumps over the lazy dog.0x04,0x04,0x04,0x04

Cipher text:

Data Comments
64 20 3d 5b  24 68 8e 25
47 bb a3 45  fa 13 9a 5a
1d 96 22 20  d4 d4 8a 0c
f3 b1 57 2c  0d 95 b6 16
43 a6 f9 a0  d7 5a f7 ea
cc 1b d9 57  14 7b f7 23
3 blocks of 16 bytes of encrypted data.

MAC:

Data Comments
4c 08 ac 6c  93 c7 37 7b
ac 5a 2e 87  3d d3 51 1b
12 7a ff 6d  0d 16 38 cd
ae 49 89 c4  d2 fe 7d e1
HMACSHA256 with key_m as the key and cipher text as input.

Resulting encrypted data:

Data Comments
bd db 7c 28  29 b0 80 38
75 30 84 a2  f3 99 16 81
IV
02 ca
Curve Type
00 20
X Length
02 93 21 3d  cf 13 88 b6
1c 2a e5 cf  80 fe e6 ff
ff c0 49 a2  f9 fe 73 65
fe 38 67 81  3c a8 12 92
X
00 20
Y Length
df 94 68 6c  6a fb 56 5a
c6 14 9b 15  3d 61 b3 b2
87 ee 2c 7f  99 7c 14 23
87 96 c1 2b  43 a3 86 5a
Y
64 20 3d 5b  24 68 8e 25
47 bb a3 45  fa 13 9a 5a
1d 96 22 20  d4 d4 8a 0c
f3 b1 57 2c  0d 95 b6 16
43 a6 f9 a0  d7 5a f7 ea
cc 1b d9 57  14 7b f7 23
Cipher text
4c 08 ac 6c  93 c7 37 7b
ac 5a 2e 87  3d d3 51 1b
12 7a ff 6d  0d 16 38 cd
ae 49 89 c4  d2 fe 7d e1
MAC


Private key k:

Data Comments
02 ba 27 44  e6 5c cd 7b
19 54 b0 a3  3b 80 d7 5e
16 ca b4 7f  2b 33 1f f0
b6 d1 84 b7  19 83 da 85
Private key k used to decrypt the above encrypted data.