Link

Relaynet Core

  • Id: RS-000.
  • Status: Working draft.
  • Type: Implementation.
  • Issue tracking label: spec-core.

Abstract

This document describes the core elements of the Relaynet protocol suite, whose purpose is to make distributed systems tolerant to potentially large network latencies through the use of asynchronous messaging.

Table of contents

  1. Introduction
  2. Concepts
  3. Addressing
  4. Messaging Protocols
    1. Service Messaging Protocol
    2. Endpoint Messaging Protocol
      1. Parcel
    3. Gateway Messaging Protocol
      1. Parcel Collection Acknowledgement (PCA)
      2. Parcel Delivery Deauthorization (PDD)
      3. Cargo
      4. Cargo Collection Authorization (CCA)
  5. Message Transport Bindings
    1. Public Address Resolution
    2. Gateway Synchronization Binding
      1. Parcel collection handshake
    3. Parcel Delivery Binding
    4. Cargo Relay Binding
  6. Clock Drift Tolerance
  7. Open Questions

Introduction

Distributed systems are typically integrated using some form of Remote Procedure Call (RPC), a seemingly simple and familiar pattern that resembles local function calls in programming. Systems communicating over HTTP, such as REST or gRPC APIs, employ this pattern.

RPCs work well in a reliable network – One with a low round-trip time (RTT) and an adequate throughput. But the component making the call becomes more complicated with the need to implement retries, timeouts and circuit breakers in order to cope with an unusually high RTT or an unusually low throughput. And with an extremely high RRT and/or an extremely low throughput, RPCs do not work at all.

In contrast to RPCs, distributed systems using asynchronous messaging are implemented without any assumption that the data will reach its destination immediately or that a response will be returned. Instead, they communicate through brokers that receive, route and deliver the data.

Decoupling the two nodes in the connection makes it possible to transport the data using alternative methods when the network unavailable. For example, in places where sneakernets are used to consume foreign content, people could also use it to connect their applications to the Internet if those applications were tolerant to large RTTs.

Relaynet makes it easy to build distributed systems using asynchronous messaging in such a way that data can be securely transported on alternative media (e.g., sneakernets) when a conventional computer network is unavailable. The result is a delay-tolerant, overlay network with onion routing.

Asynchronous messaging also happens to be a better integration style, for reasons that Hohpe and Woolf eloquently summarize in Enterprise Integration Patterns (page 54):

Asynchronous messaging is fundamentally a pragmatic reaction to the problems of distributed systems. Sending a message does not require both systems to be up and ready at the same time. Furthermore, thinking about the communication in an asynchronous manner forces developers to recognize that working with a remote application is slower, which encourages design of components with high cohesion (lots of work locally) and low adhesion (selective work remotely).

Concepts

The following diagram illustrates the various components of the network and how they interact with each other:

  • A service is a collection of applications that communicate amongst themselves. If the sender or recipient is a server, the service is centralised; otherwise, the service is decentralised.
  • Applications exchange messages amongst themselves, and because they can’t communicate directly, they each use an endpoint as a broker.
  • A (service) message is serialized in the format determined by the service and does not have to be encrypted or signed.
  • An endpoint receives a message from its application and converts it into a parcel for the target application’s endpoint, and because they still can’t communicate directly, they each use a gateway as a broker. When an endpoint receives a parcel from the gateway, it has to decrypt the message and pass it to its application.
  • A parcel encapsulates exactly one service message, which is encrypted with the target endpoint’s certificate and signed with the origin endpoint’s key.
  • A gateway receives parcels from endpoints and puts them into cargo for another gateway, using a courier as a broker. When a gateway receives cargo from a courier, it decrypts the cargo and delivers the encapsulated parcels to their corresponding target endpoints.
    • A private gateway is a specific type of gateway that runs on a end-user device and serves the endpoints on that device.
    • A public gateway is a specific type of gateway that allows the endpoints behind its private gateways to reach another network (typically the Internet).
  • A cargo encapsulates one or more parcels, and it is encrypted with the target gateway’s certificate and signed with the origin gateway’s key.
  • A courier is the individual, organization or technology that transports the cargo between gateways when they can’t reach each other via the Internet. For example, it could be a sneakernet operated by volunteers or a scatternet operated by users themselves.

For example, if Twitter supported Relaynet, Twitter would be the service, the Twitter mobile apps would be applications and the Twitter API would be another application. The endpoints in the mobile apps could simply be Java (Android) or Swift (iOS) libraries, whilst the endpoint in the Twitter API could be a new API endpoint (e.g., https://api.twitter.com/relaynet).

Relaynet can also be described in terms of the OSI model as shown in the diagram below – With same-layer and adjacent-layer interactions defined by messaging protocols and message transport bindings, respectively.

Note that defining same-layer interactions at the application and relay layers is outside the scope of the protocol suite. Relaynet only prescribes the interactions with their adjacent layers. Each service has full control over its applications (see service messaging protocol), and each courier has full control over its relay layer.

Addressing

This document only defines point-to-point message delivery.

Each endpoint and gateway in Relaynet MUST have a unique, opaque address known as private address. It MAY also have a unique internet address known as public address if the node can be reached through an internet. A node is public if it has a public address, otherwise it is private.

The private address of a node MUST equal to the digest of its public key, computed as "0" || sha256(publicKey), where the 0 (zero) prefix denotes the version of the address format defined in this document, || denotes the concatenation of two strings, publicKey is the DER encoding of the SubjectPublicKeyInfo structure from RFC 5280 and sha256() outputs the SHA-256 digest in hexadecimal. For example, 0b5bb9d8014a0f9b1d61e21e796d78dccdf1352f23cd32812f4850b878ae4944c is a valid private address.

A public address MUST be a valid Uniform Resource Identifier (URI) without port, path, query or fragment components. Its host name MUST be a domain name, not an IP address. Refer to public address resolution to learn how these domain names should be used.

Messaging Protocols

These protocols establish the corresponding channels for applications, endpoints and gateways. Building on the OSI model mapping above, these protocols define the same-layer interactions.

Endpoints and gateways MUST comply with the Relaynet PKI profile, which specifies the use of certificates in these protocols. The Internet PKI profile does not apply to messaging protocols.

Endpoint and gateway channels transmit two types of messages:

  • Payload-carrying messages, which MUST be serialized with the Relaynet Abstract Message Format (RAMF). Nodes and couriers MUST enforce the post-deserialization validation described in the RAMF specification on every message they receive.
  • Control messages, which allow the two nodes to coordinate their work. Such messages are serialized with ASN.1/DER.

Each message format MUST begin with a 10-octet-long format signature that declares the type of the message:

  1. Prefix (8 octets): “Relaynet” in ASCII (hexadecimal sequence: 52 65 6c 61 79 6e 65 74).
  2. Concrete message type (1 octet).
  3. Concrete message format version (1 octet).

Service Messaging Protocol

This protocol establishes the channel between two applications in a service. The service provider has full control over this protocol, including the types of messages that its applications exchange (their contents, serialization format, etc).

Applications MAY provision Parcel Delivery Authorizations (PDAs) from their corresponding endpoints. PDAs MUST be encapsulated in service messages; for example, an application sends a message to another application in order to subscribe to updates, the authorizing application could attach the PDA to the message.

Endpoint Messaging Protocol

This protocol establishes the bidirectional channel between two endpoints. The only type of message that this specification defines at this level is the parcel.

Parcel

A parcel encapsulates a service message and is serialized with RAMF, using the octet 0x50 (“P” in ASCII) as its concrete message type.

In order to make parcels fit in cargo messages, a parcel MUST NOT span more than 8322037 octets.

The parcel payload MUST be encrypted. The corresponding plaintext MUST contain the service message and its media type, and it MUST be at least 65 KiB below the overall limit for a parcel (i.e., the payload plaintext MUST NOT be longer than 8256501 octets). The plaintext MUST be serialized with the following binary sequence (little-endian):

  1. An 8-bit unsigned integer (1 octet) representing the length of the service message type.
  2. A UTF-8 encoded string representing the type of the service message. For example, application/x-protobuf; messageType="twitter.Tweet".
  3. A 23-bit unsigned integer (3 octets) representing the length of the service message.
  4. The service message serialized in the format dictated by the service.

Gateway Messaging Protocol

This protocol establishes the channel between two gateways, and its primary purpose is to facilitate the delivery of messages from the endpoint messaging protocol over the underlying network that is available at any point in time (e.g., the Internet, a sneakernet).

The two gateways MUST maintain a single session using the Channel Session Protocol, and all keys used to encrypt payloads in this channel MUST be derived from that session.

In addition to relaying messages from the endpoint messaging protocol, this protocol supports the following messages.

Parcel Collection Acknowledgement (PCA)

A Parcel Collection Acknowledgement (PCA) is a control message used to signal to the peer gateway that the specified parcel(s) has been received and safely stored; its concrete message type is the octet 0x51. The gateway that sent the original parcels SHOULD permanently delete such parcels at that point.

The payload plaintext MUST be serialized with ASN.1/DER using the following schema:

ParcelCollectionAcknowledgement ::= SEQUENCE {
    senderEndpointPrivateAddress  VisibleString,
    recipientEndpointAddress  VisibleString,
    parcelId  VisibleString
}

Where senderEndpointPrivateAddress represents the private address of the endpoint sending the parcel, recipientEndpointAddress is the target endpoint’s public/private address and parcelId represents the RAMF message id of said parcel.

Parcel Delivery Deauthorization (PDD)

A Parcel Delivery Deauthorization (PDD) revokes one or more PDAs issued by a specific endpoint. It MAY be requested by the endpoint itself or its gateway. Regardless of which node initiated the PDD, the message sent to the peer gateway MUST have the following structure:

  • The private address for the endpoint whose PDAs should be revoked.
  • Serial numbers of the PDAs to revoke. It may be empty to revoke all the PDAs issued by the endpoint.
  • Expiry date of the deauthorization. If revoking all PDAs from the endpoint, this MUST be the expiry date of the endpoint certificate. If revoking specific PDAs, this MUST be the expiry date of the PDA with the latest expiry date.

The message MUST be serialized as the DER representation of the ParcelDeliveryDeauthorization ASN.1 type defined below:

ParcelDeliveryDeauthorization ::= SEQUENCE
{
  recipientAddress  VisibleString (SIZE(0..1024)),
  pdaSerialNumbers  SET OF INTEGER,
  expiryDateUtc     DATE-TIME
}

Gateways MUST enforce PDDs for as long as they are active. Public gateways MAY additionally cache PDDs until they expire in order to refuse future parcels whose PDA has been revoked.

Cargo

Its sole purpose is to encapsulate one or more messages from the gateway channel (e.g., parcels) when the gateways are unable to communicate securely directly and have to resort to a potentially untrusted courier.

Cargoes MUST be serialized with RAMF, using the octet 0x43 (“C” in ASCII) as its concrete message type. Couriers and gateways MUST enforce the post-deserialization validation described on the RAMF specification.

The payload ciphertext MUST be encrypted. The corresponding plaintext MUST encapsulate zero or more messages (e.g., parcels), and it MUST be serialized as the DER representation of the CargoMessageSet ASN.1 type defined below:

CargoMessageSet ::= SEQUENCE OF Message
Message ::= OCTET STRING

Where each Message is the binary serialization of each message contained in the cargo. Implementations SHOULD encapsulate messages into as few cargoes as possible.

Note that as the encrypted payload of a RAMF message, the CargoMessageSet serialization cannot be greater than 8322048 octets. Consequently, each Message MUST NOT span more than 8322037 octets long to account for the encoding of the type and length prefix in DER.

A cargo MUST NOT be encapsulated in another cargo.

Cargo Collection Authorization (CCA)

A Cargo Collection Authorization (CCA) is a RAMF-serialized message whereby Gateway A (the sender) allows a courier to collect cargo on its behalf from Gateway B (the recipient). Its concrete message type MUST be the octet 0x44. CCAs MUST NOT be encapsulated in cargo to enable the courier to process and send the CCA to target gateway.

Every CCA MUST contain an encrypted Cargo Collection Request (CCR) as its payload. The CCR MUST be serialized with ASN.1/DER using the following schema:

CargoCollectionRequest ::= SEQUENCE {
   cargoDeliveryAuthorization  OCTET STRING
}

Where cargoDeliveryAuthorization is the DER-encoded, Relaynet PKI certificate for the Cargo Delivery Authorization (CDA) issued by Gateway A to Gateway B.

Message Transport Bindings

A message transport binding, or simply binding, defines the adjacent-layer interactions in Relaynet. Its objective is to facilitate the exchange of messages in endpoint and gateway messaging protocols (e.g., parcels, cargoes, acknowledgements). This document describes the requirements applicable to all bindings, but does not define any concrete binding.

Bindings will typically leverage Layer 7 protocols, such as HTTP or purpose-built ones, but they can also use an Inter-Process Communication (IPC) mechanism provided by the host system.

Communication MUST be encrypted when the two nodes are on different computers, otherwise it is optional. Communication is deemed to happen on the same computer when either the loopback network interface (i.e., addresses in the range 127.0.0.0/8) or IPC is used. When encryption is used, it MUST be provided by Transport Layer Security (TLS) version 1.3 (RFC 5246) or newer, or an equivalent technology in non-TCP connections (e.g., DTLS). When using TLS, Server Name Identification per RFC 6066 MUST be supported by clients and it MAY be used by servers.

For performance reasons, nodes SHOULD use Unix domain sockets or any other IPC mechanism instead of the loopback network interface when they are on the same computer.

For availability and performance reasons, the node sending messages SHOULD limit the number of messages pending acknowledgements to five. Consequently, the node on the receiving end MUST hold at least five incoming messaging in its processing queue at any point in time. The receiving end MAY close the connection when this limit is exceeded.

For privacy and censorship-circumvention reasons, public addresses using DNS records SHOULD be resolved using DNS-over-HTTPS or DNS-over-TLS/DTLS, using a DNS resolver trusted by the implementer. Implementations MAY allow advanced users to set the DNS resolver. Additionally, when the connection is done over TLS 1.3 or newer, the Encrypted Server Name Identification extension SHOULD be used.

The node delivering a message MUST NOT remove it until the peer has acknowledged its receipt. The acknowledgement MUST be sent after the message is safely stored. For example, if the message is being saved to a local disk, its receipt MUST be acknowledged after calling fdatasync (on Unix-like systems) or FlushFileBuffers (on Windows).

Bindings MAY extend this specification, but they MUST NOT override it.

Public Address Resolution

The host name in a public address MUST be an SRV record, where the service name and OSI Layer 4 protocol is determined by the respective binding.

The target host in the SRV record MUST be used exclusively in the client-server connection. For example, if the public node example.com resolves to the host foo.example.com, then foo.example.com should be specified in the TLS Server Name Identification (SNI) value and the HTTP Host request header (assuming that the binding uses the TLS and HTTP protocols).

This specification forbids the use the original domain name in the public address in order to allow the same public gateway instance to share a single public IP address across its binding implementations. However, to prevent hijacking through DNS spoofing, clients MUST use DNSSEC and refuse results whose DNSSEC validation fails. A client MAY delegate DNSSEC validation to a DNS-over-HTTPS or DNS-over-TLS/DTLS resolver.

Gateway Synchronization Binding

This is a protocol that establishes a Gateway Synchronization Connection (GSC) between a private node (a private endpoint or a private gateway) and its gateway, with the primary purpose of exchanging parcels bidirectionally. The private node and its gateway MUST act as the client and server, respectively.

Before the nodes can exchange parcels, the private node MUST register with its gateway. If the gateway behind the server accepts the registration, it MUST issue a Relaynet PKI certificate to the private node. The private node MUST use its certificate to sign requests to exchange parcels with its gateway.

If the server corresponds to a private gateway, it SHOULD listen on port 276 if it has the appropriate permission to do so or port 13276 if it does not. Alternatively, if using Unix domain sockets, the client SHOULD NOT initiate a connection if the socket is owned by a user other than the administrator (root in Unix-like systems).

If the server corresponds to a public gateway, it MUST use rgsc as the service string in the SRV record.

In addition to sending parcels to each other, the client MAY also register Parcel Delivery Deauthorizations (PDD) with the server.

Parcel collection handshake

As soon as the connection is established, a handshake MUST be performed for the gateway to authenticate the private node(s). The client will be challenged to sign a nonce with the key of each private node it claims to own, as shown in the following sequence diagram.

The connection MUST be closed if the handshake fails.

Parcel Delivery Binding

This is a protocol that establishes a Parcel Delivery Connection (PDC), whose sole purpose is to allow gateways and public endpoints to deliver parcels to public nodes. In this case, the sender and the recipient will act as client and server, respectively.

If the recipient is a public gateway, it MUST override any previously queued parcel with the same id. The sending endpoint MAY use this to replace stale messages in the same relay – For example, an application sending a message to replace the user’s email address could use this to discard any previous message to replace this value.

The server MUST NOT require client authentication, but it MAY still refuse to serve suspicious, ill-behaved or abusive clients.

If the client is a private or public gateway, it SHOULD include the Internet address of the corresponding public gateway if that gateway is able to collect parcels for the endpoint that sent the parcel.

The client MUST close the underlying connection as soon as all parcels have been delivered.

Servers MUST use rpdc as the service string for the SRV record corresponding to their PDC server(s).

PDCs MUST always use TLS (or equivalent in non-TCP connections).

Cargo Relay Binding

This is a protocol that establishes a Cargo Relay Connection (CRC) between a gateway and a courier with the primary purpose of exchanging cargo bidirectionally. Consequently, this is only applicable when the underlying network is a sneakernet.

The action of transmitting a cargo over a CRC is called hop, and the action of transmitting a cargo from its origin gateway to its target gateway is relay. There are two hops in the relay of a cargo: One from its origin gateway to a courier and another from the courier to its target gateway. A courier MAY also act as a gateway to exchange cargo with another courier via a CRC, in which case the number of hops will increase accordingly.

To establish a CRC between a private gateway and a courier, the operator of the courier MUST first make the courier available for incoming connections from such gateways. Then the private gateway MUST initiate the connection subject to the approval of the operator of the device running the gateway. In this scenario, the gateway and the courier will play the roles of client and server, respectively.

On the other hand, to establish a CRC between a courier and a public gateway, the courier MUST initiate a CRC connection with each gateway for which any cargoes and/or CCAs are bound. In this scenario, the courier and the gateway will play the roles of client and server, respectively.

When a CRC is established, the following process should be done sequentially:

  1. Any cargo in the courier bound for the current gateway MUST be delivered, and the gateway MUST acknowledge the receipt of each cargo as soon as it is safely stored.

    How this step is initiated will depend on the type of node acting as the client:

    • When the client is a private gateway, it MUST initiate this step by sending one or more CCAs to the courier and the courier MUST then return all the cargo it holds for each gateway, if any. The courier MUST persist those CCAs so it can send them to their respective target gateways when collecting cargo from them in the future.
    • When the client is a courier, it MUST simply deliver the cargo bound for the current gateway, if any.

    The current gateway MUST ignore cargo whose sender’s certificate was not issued by its own self-issued certificate.

  2. No further cargoes SHOULD be exchanged for at least 5 seconds to allow sufficient time for the gateway to (a) deliver the parcels contained in the cargo to their corresponding endpoints and (b) collect any new parcels that those endpoints might have automatically produced in response to the parcels they received.

    The underlying connection (e.g., a TCP connection) MAY be closed during this time, in which case a new connection will have to be created when resuming this process.

    This step SHOULD be skipped when the courier did not deliver any cargo to the gateway in the previous step.

  3. The gateway MUST send to the courier any cargo it wants the courier to relay, and the courier MUST acknowledge the receipt of each cargo as soon as it is safely stored.

    How this step is initiated will depend on the type of node acting as the client:

    • When the client is a courier, it MUST initiate this step by sending one or more CCAs to the public gateway and the public gateway MUST then return all the cargo it holds for the gateway of each CCA, if any. Additionally, the public gateway MUST sign the cargo with the Cargo Delivery Authorization contained in the respective CCA.
    • When the client is a private gateway, it MUST simply deliver the cargo bound for its peer gateway, if any.

    The client MUST close the underlying connection at the end of this step.

When a client sends a CCA, the server MUST notify the client when it is done sending cargo for that CCA, even if no cargo was sent. The client SHOULD resend a CCA one last time when the server does not finish processing it within 5 seconds since the CCA was sent or the last cargo was received, whichever happened last. Couriers MUST NOT reuse CCAs when collecting cargo from a public gateway, so each CCA SHOULD be discarded as soon as the gateway confirms it completed processing it.

Cargoes SHOULD be redelivered one last time when they are not acknowledged within 5 seconds since their delivery.

The following diagram illustrates the binding between a private gateway and a courier:

And the following diagram illustrates the binding between a courier and a public gateway:

Gateways SHOULD defer the encapsulation of parcels and other messages into cargo until they are about to send it to the courier as that would allow them to exclude expired messages and send as few cargoes as possible. They MAY, however, set a creation date in the past to prevent an eavesdropper from tracking the time when the CRC took place.

Gateways MUST NOT delete parcels as a consequence of encapsulating them in cargo sent to couriers as that would be incompatible with the requirements of the Parcel Delivery Binding (there is no guarantee that the courier will be able to deliver those parcels before they expire).

Note that couriers are not assigned Relaynet PKI certificates, but per the requirements for bindings in general, TLS certificates or equivalent must be used when the connection spans different computers. Consequently, the node acting as server MUST provide a valid certificate which MAY NOT be issued by a Certificate Authority trusted by the client when the server is a courier: Couriers are unlikely to get certificates issued by widely trusted authorities because they are not Internet hosts, but this is deemed to be acceptable from a security standpoint because the purpose of TLS (or equivalent) in this case is to provide confidentiality from eavesdroppers, not to authenticate the server.

Public gateways MUST use rcrc as the service string for the SRV record corresponding to their CRC server(s).

CRC servers in couriers SHOULD listen on port 21473, which stands for “too late”.

Clock Drift Tolerance

Devices disconnected from the Internet will not typically have the ability to keep their clocks in synchronization with the current time. This situation, commonly known as clock drift, will be particularly bad in old devices that have never been connected to the Internet, and it could be exacerbated with a recent daylight saving time switch.

For this reason, Relaynet implementations SHOULD be tolerant to clock drifts of up two hours in all channel- or binding-level communications where at least one of the peers is a private node. For instance, this tolerance applies to the validity period of RAMF messages and Relaynet PKI certificates, as well as the validity of TLS certificates of couriers in a CRC.

For the avoidance of doubt, this recommendation does not apply to the CRC between a courier and a public gateway because neither peer is a private node. Consequently, in this scenario, couriers are still required to refuse any TLS certificate that is not valid at the time the connection takes places, but it is still recommended that the public gateway be tolerant to a potential clock drift in the channels with private gateways.

Open Questions

  • This specification only defines how to make Relaynet work on sneakernets. Maybe all the definitions specific to sneakernets should be moved to a separate spec so the core spec is agnostic of the relay layer? Using the Internet as the relay layer is already in a separate spec (RS-017).