Difference between revisions of "Scalability through Prefix Filtering"

From Bitmessage Wiki
Jump to navigation Jump to search
(Added some more content)
(Added some more content)
Line 168: Line 168:
  
 
===Upper Stream Desertion===  
 
===Upper Stream Desertion===  
If the object traffic in the Bitmessage network continues to grow, eventually a point will be reached when no single node can process all of it. At this point, any nodes that started in stream 1 will have to move to stream 2 or 3, or a even lower stream. This will mean that there will be no nodes that XXX
+
If the object traffic in the Bitmessage network continues to grow, eventually a point will be reached where no single node can process all of it. At this point, any nodes that started in stream 1 will have to move to stream 2 or 3, or an even lower stream. This will mean that there will be no nodes in stream 1. This process of 'stream desertion' will continue as long as the object traffic continues to grow. However since both nodes and addresses can move to different streams, this should not present a problem for the functioning of the system.
  
===XXX===  
+
===Node prefix length examples===  
XXX
+
* A prefix length value of 0 would mean that the node will deal with 100% of the total network object traffic.
 +
* A prefix length value of 5 would mean that the node will deal with 12.5% of the total network object traffic.
 +
* A prefix length value of 64 would mean that the node will only deal with objects with a prefix nonce that exactly matches the nodes's stream number (which will be the same as its prefix value, since all 64 bits are used).
  
 
===XXX===  
 
===XXX===  

Revision as of 16:25, 2 February 2015

Introduction

This page describes a proposal for a way to make Bitmessage scalable.

NOTE: This proposal is not yet complete, as some aspects of proposed system are not yet resolved. Suggestions and contributions are welcome.

Summary of the proposal

  • Each Bitmessage address has a 'prefix' and a 'prefix length'. These values determine the balance between anonymity and efficiency that the owner of the address will have when retrieving messages from the network.
  • Each node in the Bitmessage network has a 'prefix' and a 'prefix length'. These values determine what part and what proportion of the total network traffic the node will handle.
  • Groups of nodes form overlapping 'streams' based on their prefix values.
  • As the network grows or shrinks, nodes move to a higher or lower stream in order to handle a greater or smaller proportion of the network object traffic.
  • As the network grows or shrinks, addresses move to a higher or lower stream in order to maintain the balance between anonymity and efficiency desired by the address's owner.
  • Nodes maintain long-lived connections to nodes in their own stream and 'nearby' streams that are 'nearby' in the stream hierarchy, but also temporarily connect directly to nodes in any stream when necessary.
  • Each Bitmessage object has a prefix nonce. This value determines the object's destination stream.
  • Objects propagate in their destination stream and lower streams of the same branch.
  • Nodes process objects in their own stream and lower streams of the same branch.


Reasoning behind the proposal

There are three basic possible approaches to making Bitmessage scalable (credit to Dokument for this summary):

Nothing (or everyone gets everything)

  • Everyone user gets every message.
  • Massive bandwidth and disk space and processing usage, eventually becomes completely unsustainable.
  • Most private.

Streams

  • Take the above and split it into pieces.
  • Still potential for lots of bandwidth/disk space/processing usage.
  • There are problems with binding addresses to one stream, and there are problems with not binding addresses to streams. Both sets affect privacy.

Scaling without streams

  • The same as the first method except only some messages are saved (once the network grows beyond a certain point).
  • Requires two part messages.
  • Requires a lot of thought and processes to effectively hide receiving a message.

This proposal outlines a method for implementing streams that avoids or reduces many of the difficulties with previous stream proposals.

Proposed changes

Prefixes

XXX

Rather than having 1 stream number to start with and gradually increasing the number of streams as the network grows, this system makes 18 quintillion 'streams' (prefix values) available immediately.

XXX

Address

The prefix value of each Bitmessage address is the first 64 bits of the ripe hash encoded within the address. The prefix of an address never changes.

The prefix length value of each address is contained within the address's pubkey, as below. The prefix length value of an address determines how many bits of the address's prefix should be used. Increasing the prefix length of an address moves that address to a lower stream. Increasing the prefix length of an address moves it to a higher stream. This can be accomplished by publishing a new pubkey which contains the updated prefix length value.

If non-hashed addresses are added to the Bitmessage protocol, their prefix length value will have to be encoded within the address string itself, as non-hashed addresses do not have pubkeys. This would prevent non-hashed addresses from moving between streams. Users of non-hashed addresses would have create new addresses instead. This would undoubtedly be less convenient, but is consistent with the overall profile of non-hashed addresses - gaining security and anonymity at the expense of convenience and efficiency.

Object

Under this proposal, Bitmessage objects would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#object.

Field Size Description Data type Comments
8 nonce uint64_t

Random nonce used for the Proof Of Work

8 expiresTime uint64_t

The "end of life" time of this object (be aware, in version 2 of the protocol this was the generation time). Objects shall be shared with peers until its end-of-life time has been reached. The node should store the inventory vector of that object for some extra period of time to avoid reloading it from another node with a small time delay. The time may be no further than 28 days + 3 hours in the future.

4 objectType uint32_t

Four values are currently defined: 0-"getpubkey", 1-"pubkey", 2-"msg", 3-"broadcast". All other values are reserved. Nodes should relay objects even if they use an undefined object type.

1+ version var_int The object's version. Note that msg objects won't contain a version until Sun, 16 Nov 2014 22:00:00 GMT.
8 prefixNonce uint64_t The object's prefix nonce. This determines which streams the object will propagate in.
? objectPayload uchar[]

This field varies depending on the object type; see below.


Pubkey

Under this proposal, the encrypted part of a pubkey would be composed as follows. This can be compared to the current specification found at https://bitmessage.org/wiki/Protocol_specification#pubkey.

Field Size Description Data type Comments
4 behavior bitfield uint32_t A bitfield of optional behaviors and features that can be expected from the node receiving the message.
1 prefix_length uint8_t The number of bits from the address's prefix value that should be used. Must be in the range 0-64.
64 public signing key uchar[] The ECC public key used for signing (uncompressed format; normally prepended with \x04 )
64 public encryption key uchar[] The ECC public key used for encryption (uncompressed format; normally prepended with \x04 )
1+ nonce_trials_per_byte var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is the average number of nonce trials a node will have to perform to meet the Proof of Work requirement. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ extra_bytes var_int Used to calculate the difficulty target of messages accepted by this node. The higher this value, the more difficult the Proof of Work must be before this individual will accept the message. This number is added to the data length to make sending small messages more difficult. 1000 is the network minimum so any lower values will be automatically raised to 1000.
1+ sig_length var_int Length of the signature
sig_length signature uchar[] The ECDSA signature which covers everything from the object header starting with the time, then appended with the decrypted data down to the extra_bytes. This was changed in protocol v3.



XXX

Proposed Stream Structure

Prefix Filter Streams Hierarchy.png

XXX

Node Connections

Node Connections Diagram.png


How do objects travel through the network / How do nodes connect to each other?

Node addresses are shared very widely (e.g. up to 100 node addresses per segment). It should be possible to detect and prevent the spread of fake node addresses through relatively simple testing methods.

If a node needs to send data to another segment, it needs to connect to a few nodes in that segment or an encompassing segment

Nodes should maintain many constant connections to nodes in their own segment and nearby segments (both higher and lower). Nodes of higher capacity should generally maintain more constant connections.


Examples

Creating a Bitmessage address and pubkey

  • Client with address creates the pubkey object
  • Client sets the first n bits of pubkey prefix nonce to match the prefix value of the address's stream
  • Client sets the remaining bits of the pubkey prefix nonce randomly
  • Client sends the pubkey to nodes in the address's stream or a lower stream of the same branch
  • The pubkey propagates through its address's stream and all lower streams

Retrieving an address's pubkey

  • Each address will have a 64 bit prefix value which is derivable from its address alone. Therefore a client with the address has all the information it needs to request the the pubkey with 100% efficiency or 100% anonymity.
  • Every pubkey is guaranteed to be covered by many nodes, even if its owner has specified the most precise possible stream value. Therefore requests can be made to those nodes of whatever precision the requesting client desires.
  • Getpubkeys should be sent to the destination address's stream
  • The retrieving node should then repeatedly query nodes in that stream for the pubkey in question

Sending a message

When a client sends a message to an address, it does the following:

  • Sets the first n bits of msg prefix nonce to match the identifying prefix bits from the destination address's pubkey
  • Sets the remaining bits of the msg prefix nonce randomly
  • Sends the msg to nodes which handle the stream matching the destination address’s identifying prefix bits OR a higher steam

Retrieving messages from the network

When a client wishes to retrieve objects from the network, it does the following:

  • XXX

Broadcasts

Broadcasts are sent to every node in their destination address's stream (like msgs) If you want to subscribe to broadcasts from an address:

  • Extract the address's prefix value from the address
  • Retrieve the address's pubkey (see above)
  • Extract the prefix length value from the address's pubkey and use it to calculate the address's current stream number
  • Periodically make a request to one or more nodes in the address's stream for any broadcast objects which are tagged as being from the address you are subscribed to

Notes

Upper Stream Desertion

If the object traffic in the Bitmessage network continues to grow, eventually a point will be reached where no single node can process all of it. At this point, any nodes that started in stream 1 will have to move to stream 2 or 3, or an even lower stream. This will mean that there will be no nodes in stream 1. This process of 'stream desertion' will continue as long as the object traffic continues to grow. However since both nodes and addresses can move to different streams, this should not present a problem for the functioning of the system.

Node prefix length examples

  • A prefix length value of 0 would mean that the node will deal with 100% of the total network object traffic.
  • A prefix length value of 5 would mean that the node will deal with 12.5% of the total network object traffic.
  • A prefix length value of 64 would mean that the node will only deal with objects with a prefix nonce that exactly matches the nodes's stream number (which will be the same as its prefix value, since all 64 bits are used).

XXX

XXX


Unresolved Questions

Rules for nodes moving between streams

As the overall size of the network changes, nodes will need to adjust the proportion of the network traffic that they handle. This will require moving between streams. How should this be done?

Rules for addresses moving between streams

As the overall size of the network changes, addresses will need to move between streams in order to preserve the balance between anonymity and efficiency that their owner has selected. How should this be done?