The Dither Project

What is it?

Dither is a project with the ultimate goal of creating a decentralized and privacy-respecting Internet. It is a repository of libraries, tools, ideas, and applications that allow people to communicate privately, distribute data, manage accounts, and much more. It is currently being developed by @Zyansheep and (for the time being) hosted via GitHub under the libdither organization.

The aim for Dither is to replace existing centralized applications with decentralized alternatives that are unified through their use of a singular, modular protocol.

See the application document for outlines of various applications that could be built using Dither.

Core Design Tenets

It seems helpful for projects to have guidelines to aid design and collaboration. These are the ones I've chosen for now; as with everything, they are subject to change.

Dither should be useful

  • Dither is a project. Projects should be useful.
  • This one probably doesn't even need to be said, but while thinking about design it is always important to keep in mind that you eventually want to create something useful.

Dither should be modular and modelable

  • Once you build something, it often gets harder and harder to add features to it unless it is clear how the parts of the system connect and you can separate your concerns. To ensure extensibility and comprehensibility for future Dither developers, modularity and modelability are key.

Dither should be interoperable

  • The goal of Dither is to replace existing services and standards. This is very hard to do (see: xkcd #927). To avoid this fate and make the transition as easy as possible, Dither should do whatever is necessary to make the experience the same as, or better than, existing platforms and services.
    • A real-world example of this working is PipeWire, which acts as a single program unifying nearly all existing audio APIs on Linux (while remaining backwards compatible with programs that use those other APIs).
  • See Dither Interoperability for more details.

Dither should rely on itself

  • This tenet is simply a reminder of the end-goal of Dither: to replace the centralized internet. So long as the other tenets are satisfied, this is the ultimate goal. (Because who doesn't want to reinvent the wheel!)


Following the first tenet of Dither, in the future the lines between these layers will blur and everything will be a module. However, the design of current operating systems doesn't easily allow for shared code and data, which for now requires a more formal structure. In the future this layered structure will be replaced with a more flexible system.

Core Process

The Core Dither Process is the part that deals with all operating-system-facing operations, such as data storage and establishing peer-to-peer connections with other computers, in addition to managing all Dither services and the connections between them.

This "Core Process" provides a few core APIs that only certain services running in the "Service Swarm" are allowed to use for security purposes. The idea behind a "Core Process" is to create a sandboxed environment for services to run in a safe manner.

Transport Layer

The transport layer establishes peer-to-peer connections with other computers running Dither. Currently, in the dither-sim program, this is implemented via a simple TCP stream. In the future this layer will be implemented using existing libraries such as libp2p transports, Pluggable Transports, something else, or some amalgamation of all three. The idea behind this layer is to provide as many methods of communication as feasibly possible.

Service Swarm

The service layer provides all functionality related to routing, encryption, data storage, user management, and everything else. Each of these services is split into a separate module, each of which runs its own process and communicates with other services through inter-process communication.

All these processes are managed as child processes under one "main process". The main process contains the Transport Layer implementation, the routing protocol API, and APIs for managing the child services as well as the inter-process communication between them.

User Interface

The application layer contains services, just like the service layer, that are registered under the main process. These registered services' APIs can be used by other applications.

The application layer also refers to the application's core API, which is used by the interface layer. This "core application API" can be built into the application's executable, or it can run as a service under the main process and be used by multiple interfaces. This is left up to the application developers. Applications with multiple interfaces should prefer to register their application APIs under the main process.

Existing planned applications may be found here.

Interface Layer

The final layer is the interface layer. This just refers to standalone applications that provide some kind of interface to the user using the services running under Dither.

These interfaces can be implemented however the developer chooses, but it is recommended that they follow Dither's application design philosophy for some level of standardization.

Other Links

Inspirations for Dither

As with any creative endeavor, Dither takes inspiration from many other projects. This is a list of the parts of Dither that have been inspired by other projects.


This document outlines the major structure of Dither.

System Manager

At the core of Dither is the system manager. This will be written in Rust and should only be run once on any given user account. The system manager provides a sandbox for Dither protocols and is built in a modular fashion to support any kind of setup or platform Dither might run on.

Core Services

In order to sandbox Dither services, the system manager provides certain core services such as access to storage, network, or other Dither services.

  • Service Manager
    • Allows services with access to manage other services' permissions and storage, as well as stop and start services at will.
  • Network Service
    • Allows services with access to establish arbitrary TCP or UDP connections.
  • Storage Service
    • Allows services with access to fetch and store data unique to the requesting service.

Other services may be added as needed.
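These core services could be sketched as Rust traits, since the system manager is planned to be written in Rust. The names and signatures below (StorageService, MemStorage, store/fetch) are purely illustrative assumptions, not any published Dither API; a real storage service would persist to disk and enforce per-service key scoping.

```rust
// Hypothetical sketch of a core service API as a Rust trait.
// All names here are assumptions for illustration only.

use std::collections::HashMap;

/// Storage scoped to a single service: each service sees only its own keys.
trait StorageService {
    fn store(&mut self, key: String, value: Vec<u8>);
    fn fetch(&self, key: &str) -> Option<&Vec<u8>>;
}

/// A trivial in-memory stand-in for the real (on-disk, sandboxed) storage.
struct MemStorage {
    data: HashMap<String, Vec<u8>>,
}

impl MemStorage {
    fn new() -> Self {
        MemStorage { data: HashMap::new() }
    }
}

impl StorageService for MemStorage {
    fn store(&mut self, key: String, value: Vec<u8>) {
        self.data.insert(key, value);
    }
    fn fetch(&self, key: &str) -> Option<&Vec<u8>> {
        self.data.get(key)
    }
}

fn main() {
    let mut storage = MemStorage::new();
    storage.store("profile".to_string(), b"alice".to_vec());
    assert!(storage.fetch("profile").is_some());
    assert!(storage.fetch("missing").is_none());
}
```

The trait boundary is the important part: the system manager hands each service an object implementing these traits, so the service never touches the filesystem or network directly.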

Service Swarm

The system manager acts as a kind of sandbox for all services running on Dither and facilitates communication between different services.


This is a list of planned Dither services and their dependencies; note that these may be split up into smaller sub-services.

  • Distance-Based Routing, DBR (Network, Storage)
  • Directional-Trail Search, DTS (Distance-Based Routing, Storage)
  • Reverse-hash-lookup, RHL (Directional-Trail Search)
  • User Manager (DBR, DTS, RHL, Storage)
  • Dither Chat (DBR, DTS, RHL, User Manager, Storage)
  • Dithca (DTS, RHL, User Manager, Storage)
  • Protocol of Truth (DTS, RHL)


Applications in Dither will be external programs that communicate with specific services in the Dither System Manager; e.g. Dither Chat will use the Dither Chat service, and Dithca will use the Dithca service.

Or applications can use a multitude of different services as needed.

Dither Anonymous Routing

Dither Anonymous Routing (DAR) is a peer-to-peer protocol for efficiently and flexibly obfuscating connections between computers.

It improves on the speed and versatility of existing solutions (e.g. I2P and Tor) by incorporating latency- and bandwidth-estimation techniques so that intermediate relays may (optionally) be chosen optimally. It exposes this option to applications built on DAR, allowing developers and users to explicitly dictate the trade-off they prefer between speed and anonymity for each networked application they use.

In addition, to align with the philosophy of Dither, DAR aims to be as generic as possible, so that different encryption schemes and transport types may be used for different connections. This hedges against the risk of any one of them breaking, and it easily allows for novel schemes that may not yet be rigorously tested but have desirable properties, such as the stateless encryption protocol described in HORNET.

Route Properties Prediction (RPP)

In order to optimally select relays through which to establish onion routes, it would be desirable to select nodes that have high-bandwidth connections and are located “en route” between the sender and receiver. In small networks, this could be accomplished by every node periodically measuring latency and bandwidth to every other node, then compiling two separate shared N×N matrices representing the pairwise measurements of latency and bandwidth respectively. When a path of relays needs to be selected, the requesting node would test all possible paths through the network to find the shortest ones, then send requests to nodes on those paths to act as relays. This would (probably) work, but it exposes a lot of information about the network publicly, and even in relatively small networks of 20+ nodes the computational, communication, and storage costs make this solution completely infeasible.

There are still some methods to solve this problem, albeit less accurately. An individual without much experience in linear algebra might think up a solution to the latency part of the problem simply by observing that latency between nodes roughly correlates with the physical distance between them. To see whether this correlation is modelable in practice, one could assign 3-dimensional coordinates (because we live in a 3D world) to each node and have each node progressively update its coordinate so as to minimize the latency-prediction error with its peers, functionally “embedding” the latency metric into a 3D space. In this scenario, real-world latency is predicted via the Euclidean distance function. This is roughly what the Vivaldi paper does, and it does predict latency. The method is considerably inaccurate, however, because while latency correlates with physical distance, the relationship is not directly causal. Anything from internal processing latency, to dynamic packet switching, to a cable that doesn't run directly from source to destination will lengthen effective distances, which distorts the space we are trying to use to predict latencies.
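The embedding idea above can be sketched in a few lines. This is a minimal, illustrative Vivaldi-style update (names, the step size, and the simplified error rule are my assumptions, not the paper's exact algorithm, which also weights updates by confidence): a node nudges its 3D coordinate toward or away from a peer so that Euclidean distance better matches measured round-trip latency.

```rust
// Minimal sketch of a Vivaldi-style coordinate update in 3D.
// Constants and names are illustrative, not from the Vivaldi paper verbatim.

fn dist(a: [f64; 3], b: [f64; 3]) -> f64 {
    ((a[0] - b[0]).powi(2) + (a[1] - b[1]).powi(2) + (a[2] - b[2]).powi(2)).sqrt()
}

/// Move `local` a small step so that dist(local, peer) approaches `measured_ms`.
fn vivaldi_step(local: [f64; 3], peer: [f64; 3], measured_ms: f64, step: f64) -> [f64; 3] {
    let d = dist(local, peer);
    let error = measured_ms - d; // positive: we sit too close in the embedding
    // Unit vector pointing from the peer toward us (arbitrary if coincident).
    let dir = if d > 1e-9 {
        [(local[0] - peer[0]) / d, (local[1] - peer[1]) / d, (local[2] - peer[2]) / d]
    } else {
        [1.0, 0.0, 0.0]
    };
    [
        local[0] + step * error * dir[0],
        local[1] + step * error * dir[1],
        local[2] + step * error * dir[2],
    ]
}

fn main() {
    let peer = [0.0, 0.0, 0.0];
    let mut local = [1.0, 0.0, 0.0];
    // Repeated 50 ms measurements should pull the node out to distance ~50.
    for _ in 0..300 {
        local = vivaldi_step(local, peer, 50.0, 0.1);
    }
    assert!((dist(local, peer) - 50.0).abs() < 1.0);
}
```

Each node runs this update against every peer it measures, and the whole network relaxes into a coordinate system, which is exactly where the distortions described above creep in.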

For the math people: there are two properties that Euclidean space must have but latency does not: symmetry and the triangle inequality. In physical space, the distance from point A to point B is the same no matter from which end you measure; this is not the case for computers on the internet, as ISPs may delay connections initiated by certain nodes but not others. Symmetry violations are rare and typically inconsequential, but triangle-inequality violations are everywhere when measuring latency. The triangle inequality says that the direct measurement from point A to point C must never be greater than the sum of the measurements from A to B and from B to C (assuming C is neither A nor B). This inequality is frequently violated on the internet due to the tree-like way it is organized and the fact that routing tables may not always route packets along optimal paths (because of misconfiguration or congestion management).

For those who know linear algebra, a better way to solve this problem is to imagine a large, partially filled matrix where the entry in the ith row and jth column is a latency or bandwidth measurement from node i to node j. When constructed on real networks, various papers have discovered that this matrix has low rank. Rank is a metric measuring how linearly independent the rows and columns of a matrix are, and a low-rank N×N matrix can be approximated by the product of two much smaller matrix factors U (N×r) and V (r×N), where r is the approximate rank. Given row i of U and column j of V, the dot product of the two vectors is the predicted latency from node i to node j. Note that unlike Vivaldi, this measurement is directional: you could swap the row/column indices from which the vectors are chosen to get a measurement from node j to node i instead. The factorization can be computed on a single computer, but it can also be adapted to a distributed peer-to-peer network, similar to Vivaldi, by running an optimization algorithm on each node to find a pair of r-sized vectors that minimizes the difference between the measured latency/bandwidth to directly connected nodes and the predicted latency/bandwidth. I call this pair of vectors “Routing Coordinates”.
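The per-node optimization just described can be sketched as stochastic gradient descent on the factor vectors. This is a toy, not the DMFSGD or Phoenix algorithm: the names, rank, learning rate, and loss are illustrative assumptions. Each node i holds an "outgoing" vector u_i and an "incoming" vector v_i; the predicted metric i → j is dot(u_i, v_j), and each new measurement nudges both vectors.

```rust
// Toy sketch of directional latency prediction via factor ("routing
// coordinate") vectors, as in matrix-factorization coordinate systems.

fn dot(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b).map(|(x, y)| x * y).sum()
}

/// One SGD step: nudge u_i and v_j so dot(u_i, v_j) approaches `measured`.
fn sgd_step(u_i: &mut [f64], v_j: &mut [f64], measured: f64, lr: f64) {
    let err = dot(u_i, v_j) - measured;
    for k in 0..u_i.len() {
        let (u, v) = (u_i[k], v_j[k]);
        u_i[k] -= lr * err * v;
        v_j[k] -= lr * err * u;
    }
}

fn main() {
    // Rank-2 coordinates; note the asymmetric targets (i->j != j->i),
    // something a Euclidean embedding cannot represent.
    let mut u_i = vec![1.0, 0.5];
    let mut v_j = vec![0.5, 1.0];
    let mut u_j = vec![0.8, 0.2];
    let mut v_i = vec![0.3, 0.9];
    for _ in 0..5000 {
        sgd_step(&mut u_i, &mut v_j, 40.0, 0.005); // measured i -> j: 40 ms
        sgd_step(&mut u_j, &mut v_i, 70.0, 0.005); // measured j -> i: 70 ms
    }
    assert!((dot(&u_i, &v_j) - 40.0).abs() < 0.5);
    assert!((dot(&u_j, &v_i) - 70.0).abs() < 0.5);
}
```

In the distributed setting each node only ever updates its own pair of vectors against measurements to direct peers, which is what makes the scheme decentralizable.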

Modeling Fluctuating Connections

While the goal of distance-based routing could be achieved by calculating a routing coordinate for each node in the network, this has two major problems:

  1. Individual nodes, or even whole parts of the network, may be moving relative to other parts of the network, and thus fluctuating in latency (e.g. an airplane, or interplanetary communication). This requires continuous recalculation of routing coordinates for all moving parties involved (which may include an entire planet, if used for interplanetary routing).

  2. Multiple networks may form independently and thus may form incompatible coordinate systems when communicating, requiring resynchronization or some kind of coordinate translation.

The solution to this issue that would work with a Vivaldi-type prediction scheme reflects something that humans do: allow for different frames of reference. Instead of each node calculating its own coordinates based on the global coordinates of other nodes, each node is the center of its own universe and simply represents other nodes as offsets from itself. I will call these offsets “relative routing coordinates” (RRCs). When two nodes want to communicate, they must agree on some shared, known point of reference, then exchange their relative routing coordinates to that reference in order to route through the network. This formalizes the idea that groups of computers may be changing relative to other groups of computers, and it allows for more granular changes of relative coordinates rather than constant changes of global coordinates.

Unfortunately, this strategy doesn't work for matrix-factorization-based latency prediction, because translation is not a transformation that preserves the dot product between two vectors; only rotations and reflections do. The problem here is still open, but I suspect the solution will entail some form of tensor factorization, or perhaps an even more advanced machine learning model. A somewhat general form of the problem can be formalized as follows:

The metric from a local node i to a remote node j is predicted at some time t via some black-box function f using the local coordinates x_i(t) and the remote coordinates x_j(t):

    predicted_ij(t) = f(x_i(t), x_j(t))

Metrics are occasionally measured at some time t, yielding measured_ij(t).

We must minimize the difference between the predicted and measured metric at any given time, updating our current coordinates accordingly:

    minimize |predicted_ij(t) - measured_ij(t)| over the coordinates x_i

Other Concerns and Assumptions

An assumption of this scheme is that intermediate relays will be directly connected to each other, but this will often not be the case, especially when intermediate relays are far away from each other. In that case, the current vision for the protocol is that packets will traverse the nodes in between relays on the network. This will probably cause some unnecessary congestion, but hopefully the discovery process and the peer-to-peer karma system will route the majority of traffic along highly connected nodes.

Peer Discovery

To form a network, there needs to be a process for new nodes to connect to existing nodes. Dither Anonymous Routing aims to allow peers to be as anonymous as possible, so the peer-discovery method aims to expose as little as possible about nodes on the network: specifically, as little information as possible about nodes that don't want to be discovered, and about nodes far away (latency-wise) from the new node.

TLDR: Peer Discovery must expose information about some nodes, but should only expose information about nodes that are nearby and want to be known.

For more information, check out the Discovery section.

To assign routing coordinates to nodes, there is a process of peer-discovery that functions as follows. This process happens whenever a new node joins the network.

  1. New node bootstraps onto the network by initiating connection to one or more existing nodes.
  2. New node tests response times (latency) to connected nodes (peers).
  3. The new node asks some subset of its lowest-latency (closest) peers for more peers.
  4. Those peers notify some slice of their own peers that a new node would like more connections.
  5. Notified nodes initiate connections with the new node if they are configured to do so.
  6. The new node measures latency to each of these new peers.
  7. The new node takes note of the smallest latencies among its peers and returns to step 3 until there are no closer nodes willing to peer.
  8. After a certain number of closest nodes are found whose latency measurements are stable, the new node calculates its routing coordinates and records its currently connected nodes so that it may reconnect if it goes offline.

Through this process, a distributed network is formed that reflects the physical topology and relative orientation of its nodes.


A packet with an RRC can be routed to its destination via the following process:

  1. Node chooses the peer that will receive the packet next by comparing RRC directions
  2. Node subtracts next peer’s RRC from packet’s RRC
  3. Node forwards modified packet to next peer

The process continues until the packet's RRC is all zeroes. The last node the packet reaches either is the destination node, knows the destination node, or is the wrong node, in which case the packet is dropped or sent back depending on the packet type.
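The three-step forwarding loop above can be sketched as a greedy next-hop choice. This is a 2D toy under assumed names (Rrc, forward); real RRCs would be the r-dimensional factor vectors described earlier, and the "compare RRC directions" step would be correspondingly more subtle.

```rust
// Toy sketch of relative-routing-coordinate (RRC) forwarding in 2D.
// At each hop the node picks the peer whose offset leaves the smallest
// remaining packet RRC, subtracts that offset, and forwards.

type Rrc = [i64; 2];

fn norm2(v: Rrc) -> i64 {
    v[0] * v[0] + v[1] * v[1]
}

fn sub(a: Rrc, b: Rrc) -> Rrc {
    [a[0] - b[0], a[1] - b[1]]
}

/// One forwarding step: returns (chosen peer index, remaining packet RRC),
/// or None if no peer brings the packet closer (drop or bounce the packet).
fn forward(packet_rrc: Rrc, peer_rrcs: &[Rrc]) -> Option<(usize, Rrc)> {
    let (best, best_rrc) = peer_rrcs
        .iter()
        .copied()
        .enumerate()
        .min_by_key(|&(_, p)| norm2(sub(packet_rrc, p)))?;
    let remaining = sub(packet_rrc, best_rrc);
    if norm2(remaining) >= norm2(packet_rrc) {
        return None; // no progress possible from this node
    }
    Some((best, remaining))
}

fn main() {
    // Destination sits at offset [5, 5]; peers sit at offsets from this node.
    let peers = vec![[1, 0], [3, 4], [-2, 1]];
    let (chosen, remaining) = forward([5, 5], &peers).unwrap();
    assert_eq!(chosen, 1);         // peer [3, 4] leaves the least distance
    assert_eq!(remaining, [2, 1]); // new packet RRC, relative to that peer
}
```

A packet whose remaining RRC reaches all zeroes has arrived (or is at the node closest to the destination that the greedy walk can reach).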


Compared to the global routing tables and the complicated peering and address-space-allocation protocols of the existing internet, routing coordinates are much better suited for peer-to-peer applications because they are almost infinitely scalable.

That said, RCs in some ways give away more information than traditional IP addresses do. Since the self-organized networks that use RCs reflect real-world network topologies, just knowing someone's routing coordinate relative to you could be akin to knowing roughly where they live. This is an acceptable risk because efficient onion routing is much easier on networks with RCs than on those without, meaning there is no reason not to onion-route every connection to some degree, providing better privacy overall.

Another benefit of routing coordinates is that they have the potential to almost completely prevent denial-of-service attacks. To even attempt such an attack, the attacker must find the routing coordinate of their target; disregarding user error, this is essentially impossible since everything is onion-routed by default. Even if the attacker does have the target's routing coordinate, trying to DoS a routing coordinate is like trying to DoS the entire expanse of network between the attacker and the target: the attackers will be ineffective, or will be blocked automatically by other nodes for overuse of the network. Even distributed denial-of-service attacks can be mitigated with additions to the protocol that allow the victim to notify the network that it is being attacked and to rate-limit the attackers.

Anonymous Routing (Onions, Garlic, and all the others…)

Conventionally, anonymous routing is an incredibly slow ordeal because of how intermediate peers are selected from the network. Due to this inefficiency, onion-routing protocols have been somewhat limited in what kind of privacy they can provide, because low data rates and high latency were concerns. This is no longer the case with DBR, which may support all kinds of anonymous routing schemes:

  • Onion Routing
    • The simplest routing of them all. Simply select a list of peers and establish a route from beginning to end.
  • Garlic Routing
    • Similar to onion routing, but when sending packets to multiple peers at once, they are sent together and split apart at some midpoint in the path.
  • Multi-path routing
    • Maintain multiple Onion routes throughout the network and randomly send packets along all or some subset of them.
  • Pool Routing
    • Create a group that nodes can join. All nodes send randomly-sized packets to all other nodes at random intervals, sending real (padded) data if there is data to send and random bytes if not.

DBR plans to use a modification of the HORNET protocol for setting up fast onion-routed links.

Preventing Network Abuse

Routing protocols that rely solely on people voluntarily hosting nodes typically have only a relatively small number of peers willing to route packets through themselves (e.g. Tor). This is why protocols like BitTorrent, I2P, and IPFS have systems in place that incentivize peers who use the network to contribute back for the benefit of all.

To accomplish this for DBR, there must be some way to limit packets going through nodes that either don't use the network much or don't have a lot of bandwidth capacity, and to speed up packets going through nodes that contribute greatly to the network. Also to be taken into consideration is the management of nodes that have inconsistent uptime or inconsistent routing.

When talking about incentives, we are talking about game theory, so let's analyze the game-theoretic situation at the level of an individual node.


  • Each node is directly connected to a fixed number of other nodes at varying latencies and bandwidths.
  • Each node wants to send traffic through other nodes to use the network.
  • Each node wants to establish onion proxies with other nodes for privacy.
  • Each node has a set of parameters that may change over time:
    • Percentage of the time it will immediately respond and route a packet.
    • Amount of traffic per unit time it is willing to route on average.
    • Max amount of traffic per unit time it can route.

The goal is to allow for unrelated nodes to route and establish proxies through each other in proportion to how much each node contributes in some way to the network.


  • Each node keeps track of the amount of traffic (bytes) flowing through itself from directly connected nodes.
  • Each node only sends traffic through directly connected nodes it knows it has received traffic from.
  • There is more to theorize about here for future research :)
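The accounting rule above can be sketched as a simple per-peer ledger. This is a toy policy under assumed names (PeerLedger, should_relay, debt_limit), not a worked-out incentive mechanism: a node tracks bytes it has relayed for each peer against bytes that peer has relayed for it, and refuses to relay once a peer's debt exceeds a grace allowance.

```rust
// Toy sketch of per-peer traffic accounting for abuse prevention.
// The policy and all names are illustrative assumptions.

use std::collections::HashMap;

struct PeerLedger {
    // peer id -> (bytes we relayed for them, bytes they relayed for us)
    ledger: HashMap<u64, (u64, u64)>,
    debt_limit: u64,
}

impl PeerLedger {
    fn new(debt_limit: u64) -> Self {
        PeerLedger { ledger: HashMap::new(), debt_limit }
    }

    /// Relay only while the peer's debt stays within the grace allowance.
    fn should_relay(&self, peer: u64, packet_bytes: u64) -> bool {
        let (for_them, for_us) = self.ledger.get(&peer).copied().unwrap_or((0, 0));
        for_them + packet_bytes <= for_us + self.debt_limit
    }

    fn record_relay(&mut self, peer: u64, packet_bytes: u64) {
        self.ledger.entry(peer).or_insert((0, 0)).0 += packet_bytes;
    }
}

fn main() {
    let mut ledger = PeerLedger::new(1000);
    assert!(ledger.should_relay(42, 800));  // within the free allowance
    ledger.record_relay(42, 800);
    assert!(!ledger.should_relay(42, 500)); // 800 + 500 > 0 + 1000: refuse
}
```

The debt limit is what lets brand-new nodes bootstrap: they get some relaying for free, but sustained freeloading is cut off.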

Conflicts with ISP Load Balancing

(This section is WIP)

Implementing an alternative routing protocol on top of regular IP routing may pose issues for ISP routing (i.e. forcing too much utilization of certain links, causing major slowdowns). ISPs don't optimize for latency or bandwidth; they optimize for load balancing, to prevent too much utilization of any one link. TODO: Dither should take this into account by implementing its own second-layer load-balancing system that makes sure ISP links aren't overloaded.

Questions for this problem:

  • How might peer-to-peer overlay networks (of various kinds) affect an ISP's ability to do load balancing well?
    • ISPs simply have to do more buffering, lengthening queue times for specific links. Those links then become unattractive, shifting route prioritization toward less desirable routes. The problem is that this makes those routes unusable for regular (nearby) people; they are instead hijacked by through-traffic.


  • Dawn - Selfish overlay compensate for careless underlay

  • If overlay networks can or do conflict with ISP load balancing effort, how can that conflict be reduced via design of the protocol?

    • Maybe reimplement load balancing into the network protocol?
      • What might this look like in a peer-to-peer setting?

Some set of nodes in the network:

  • Sending along certain local paths (need to be careful to not overwhelm bandwidth along local link: not likely)
  • Sending along mid-range path, some room for path diversity, but if everyone is doing it, it may create hotspot: need

Main issue: A is sending a large file to B, anonymously. If they route through one path, this will be a problem; if they route along multiple paths, ISPs have more ability to distribute the load.

  • More paths = more chance for paths to be surveilled: is it obvious who is chatting with whom? -> More obvious than just one path
    • Mixnets ? -> How fast would these be for large data transfers?

Research, Inspirations & Similar Work

(Notable articles, in rough order of reading, that I am using to implement Dither routing. If anyone knows of any similar work not listed here, please let me know on Matrix!)

[1] HORNET: High-speed Onion Routing at the Network Layer

  • Stateless Onion Routing, improves establishment of onion routes as well as speed of forwarding.

[2] Vivaldi - pdf, video

  • The paper that pretty much started the distributed Network Coordinate System field of research.

[3] Coordinate-Based Routing for High Performance Anonymity

  • Applying Vivaldi to Anonymous Routing

[4] Phoenix: A Weight-Based Network Coordinate System Using Matrix Factorization

  • Improvements on Vivaldi; uses matrix factorization instead of Euclidean embedding.

[5] NCShield: Protecting Decentralized, Matrix Factorization-Based Network Coordinate Systems

  • Threat modeling on Network Coordinate Systems. Prevents Frog-Boiling attacks.

[6] DMFSGD: A Decentralized Matrix Factorization Algorithm for Network Distance Prediction

  • Improvements on Phoenix paper’s algorithm.

[7] Application-Aware Anonymity, Sherr et al.

Directional Trail Search

Directional Trail Search (DTS) is a protocol for efficiently fetching a piece of stored data from a network given its hash.

It is intended to be a vast improvement over existing protocols like IPFS by removing the use of spatially-inefficient Distributed Hash Tables (DHTs), allowing for anonymous retrieval and optionally anonymous hosting, and fixing the issue of scalability to create a true "inter-planetary" file system.

The Main Idea

DTS works via two forces:

  • The desire of people (nodes) who want to host information and want their information to be quickly accessible.
  • The desire of people (nodes) who want to access information and want to access it as fast as possible.

These two forces are corralled by DTS to create an efficient system for hosting and finding data. First, let us talk about the desire of a node that wants to host some data. This desire manifests in the form of a "data trail".

A data trail in DTS is a trail left in the network with the sole purpose of leading to a specific piece of data. Specifically, a trail is a chain of peered nodes, each of which stores a mapping between the hash of a specific piece of data and the id of the next peer in the chain.

A data trail is formed with the following process:

  • A node that wants to host information broadcasts a "trail-laying" packet that travels to a specific relative coordinate on the network.
  • All the nodes that this "trail-laying" packet encounters on the way to its destination will do one of the following:
    • Decline to be part of the trail, sending the packet back to the previous node in the chain. This will reflect poorly on the node if it wants to host data of its own.
    • Register a connection between the hash contained within the packet and the id of the node the packet came from, then forward the packet on to another node a consistent distance away that it thinks will agree to be a part of the chain.
  • Allowing nodes to either host data and be a part of trails, or host no data and be a part of no trails, ensures that packets flow along computers that actually host data and are likely to be relatively stable with resources to spare.

Once a data trail is formed, it may be encountered by a "trail-tracing" packet which, once on the trail, is routed directly to the root of the trail.

However, this trail of nodes may be very thin, and thus trail-searching packets may have a hard time finding the trail because they skip over too many nodes. To fix this issue, trail-laying packets will broadcast to all known peers within a certain range that they are a part of a trail related to a specific hash. Peers receiving this broadcast will use a counting bloom filter to record that they are near a trail for that hash. "Trail-searching" packets that come across these "nearby-trail" nodes can ask them to query all their peers within a certain radius for membership in a real trail. This casts a larger net and makes it easier to find trails without putting undue burden on too many nodes.
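The counting bloom filter mentioned above can be sketched as follows. This is a minimal illustration (the struct name, sizes, and hash mixing are my assumptions, not a tuned design); counters rather than bits are used so that a "nearby trail" record can be removed when the trail expires, which a plain bloom filter cannot do.

```rust
// Minimal counting-bloom-filter sketch for recording "a trail for this
// data hash is nearby". Sizes and hash mixing are illustrative only.

struct CountingBloom {
    counters: Vec<u16>,
    hashes: u32,
}

impl CountingBloom {
    fn new(slots: usize, hashes: u32) -> Self {
        CountingBloom { counters: vec![0; slots], hashes }
    }

    // Derive the k-th counter index with simple multiplicative mixing.
    fn index(&self, item: u64, k: u32) -> usize {
        let mixed = item
            .wrapping_mul(0x9E37_79B9_7F4A_7C15)
            .wrapping_add(k as u64)
            .wrapping_mul(0xBF58_476D_1CE4_E5B9);
        (mixed % self.counters.len() as u64) as usize
    }

    fn insert(&mut self, item: u64) {
        for k in 0..self.hashes {
            let i = self.index(item, k);
            self.counters[i] = self.counters[i].saturating_add(1);
        }
    }

    fn remove(&mut self, item: u64) {
        for k in 0..self.hashes {
            let i = self.index(item, k);
            self.counters[i] = self.counters[i].saturating_sub(1);
        }
    }

    fn maybe_contains(&self, item: u64) -> bool {
        (0..self.hashes).all(|k| self.counters[self.index(item, k)] > 0)
    }
}

fn main() {
    let mut nearby = CountingBloom::new(1024, 3);
    nearby.insert(0xDEAD_BEEF); // a trail for this data hash is nearby
    assert!(nearby.maybe_contains(0xDEAD_BEEF));
    nearby.remove(0xDEAD_BEEF); // the trail expired
    assert!(!nearby.maybe_contains(0xDEAD_BEEF));
}
```

As with any bloom filter, `maybe_contains` can return false positives, which here just costs a wasted radius query, never a missed trail.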

WIP: Section about the requester's role in the network and the network's expectation that requesters contribute to hosting for some time (like BitTorrent or IPFS).

Specific Structure (WIP)

Note: finding the node hosting a piece of data that matches a given hash on a network is an insanely hard problem to solve well. The protocol here is one formulation that might work.

Every Dither node implementing DTS contains the following state:

  • A hashmap that maps multihashes to their corresponding data stored on disk, so that the node may retrieve its own data.
    • The size of this map should be primarily up to the node owner's discretion, depending on how much data they would like to store.
    • Data may temporarily be added to this hashmap to help with caching other nodes' data.
  • A hashmap that maps multihashes to peers.
  • A bloom filter containing a set of multihashes of which there are trails nearby.
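The state listed above could be sketched as a single Rust struct. The types are stand-ins I have assumed for illustration (a multihash and a peer id are both stubbed as u64, and the bloom filter as a plain set); a real implementation would use proper multihash and peer-identity types and an actual counting bloom filter.

```rust
// Sketch of the per-node DTS state described above. All type choices
// (u64 hashes and peer ids, HashSet in place of a bloom filter) are
// simplifying assumptions.

use std::collections::{HashMap, HashSet};

struct DtsState {
    /// Own data: multihash -> bytes stored on disk (stubbed in memory).
    local_data: HashMap<u64, Vec<u8>>,
    /// Trail links: multihash -> id of the next peer along the trail.
    trail_next_hop: HashMap<u64, u64>,
    /// Stand-in for the counting bloom filter of nearby trails.
    nearby_trails: HashSet<u64>,
}

impl DtsState {
    fn new() -> Self {
        DtsState {
            local_data: HashMap::new(),
            trail_next_hop: HashMap::new(),
            nearby_trails: HashSet::new(),
        }
    }

    /// Handle a trail-tracing packet for `hash`: serve the data if we host
    /// it, follow the trail if we are on one, or widen the search if a
    /// trail is known to be nearby.
    fn lookup(&self, hash: u64) -> Option<&str> {
        if self.local_data.contains_key(&hash) {
            Some("serve data")
        } else if self.trail_next_hop.contains_key(&hash) {
            Some("forward along trail")
        } else if self.nearby_trails.contains(&hash) {
            Some("ask peers in radius")
        } else {
            None
        }
    }
}

fn main() {
    let mut state = DtsState::new();
    state.trail_next_hop.insert(7, 99); // on the trail for hash 7; next peer 99
    assert_eq!(state.lookup(7), Some("forward along trail"));
    assert_eq!(state.lookup(8), None);
}
```

The three maps correspond one-to-one with the three bullets above, which keeps the packet-handling logic a simple cascade of lookups.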

Every node will store the following information about their peers:

  • WIP


While Directional Trail Search is, in theory, much faster and more efficient than DHTs, it is likely not as good for rare data. With a DHT, as long as there is at least one node hosting the data, it will eventually be found. With DTS, there is no guarantee that a piece of data will be found (e.g. if the data trails are too far away from the requesting node to be encountered by a searching packet).

There are multiple potential solutions, in order of feasibility:

  • Use a DHT in addition to DTS by default (this may have privacy implications)
  • Figure out how to get DTS to work better for rare files, perhaps by making it so that trails form circles around the earth and thus are nearly impossible not to encounter.
  • Store routing coordinates / routing areas on the Reverse Hash Lookup.
  • Implement a Network Coordination feature that tells all nodes in the network to notify a node when they find the requested data. (induces denial of service vector, probably a bad idea)

Reverse Hash Lookup (WIP)

Directed acyclic graph (DAG) data structures may be elegant for storing and linking pieces of data, but they don't provide any kind of mutability on their own. This is the purpose of the Reverse Hash Lookup (RHL). RHL allows you to create a piece of data that links to two or more other pieces of data (via hash-linking) and then look up the link given any one of the linked pieces of data.


RHL has many different ways to solve the two problems of distributing links and finding links (links here being pieces of data containing the hash of some other object that is being "linked to"). Links could be shared only with friends or trusted individuals, broadcast all over the network and re-stored by other nodes, stored in blockchains, or kept in link maps maintained by centralized servers. All these different use cases may be useful for different applications, so RHL tries to generalize over all of them.


  • A Link defines its own methods of distribution and search. (Perhaps embedded as a hashtype).
  • A link can be broadcast to some set of trusted nodes (friend group), queries are done by asking friends if they have registered any links to an object.
  • A link can be registered in some sort of global consensus
    • Central server(s) store any links that are uploaded and respond to queries
    • Blockchain storing links (all "full" nodes store all links), queries done locally
    • Global Binary Tree mapping hashes to links (less space, more network activity when searching)
  • A link can be broadcast via a publish-subscribe system and exist ephemerally; find links by constantly listening on a topic (or to the entire network).
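Whatever distribution mechanism is chosen, the local half of RHL reduces to a reverse index from object hashes to the links that reference them. Here is a minimal sketch (all names hypothetical, with a plain integer standing in for a real multihash):

```rust
use std::collections::HashMap;

// A plain integer stands in for a multihash in this sketch.
type Hash = u64;

/// A link is itself a piece of content that references two or more
/// other objects by hash.
struct Link {
    hash: Hash,         // hash of the link object itself
    targets: Vec<Hash>, // hashes of the objects being linked
}

/// The local half of a Reverse Hash Lookup: an index from an object's
/// hash to every known link that references it.
#[derive(Default)]
struct ReverseIndex {
    index: HashMap<Hash, Vec<Hash>>,
}

impl ReverseIndex {
    /// Register a link so it can later be found from any of its targets.
    fn register(&mut self, link: &Link) {
        for &target in &link.targets {
            self.index.entry(target).or_default().push(link.hash);
        }
    }

    /// Look up all links known to reference the given object.
    fn links_to(&self, object: Hash) -> &[Hash] {
        self.index.get(&object).map(Vec::as_slice).unwrap_or(&[])
    }
}

fn main() {
    let comment_link = Link { hash: 0xBEEF, targets: vec![0xCAFE, 0xF00D] };
    let mut idx = ReverseIndex::default();
    idx.register(&comment_link);
    // Given only one of the linked objects, we can recover the link.
    assert_eq!(idx.links_to(0xCAFE).to_vec(), vec![0xBEEF]);
    assert_eq!(idx.links_to(0xF00D).to_vec(), vec![0xBEEF]);
}
```

The distribution strategies in the list above differ only in *where* this index lives (friends' nodes, a blockchain, a central server) and how queries reach it.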

Potential Applications

This kind of system is useful as a basis for other systems that need to link disparate pieces of data together for the purposes of querying or consensus. Here are just a few examples of the systems that RHL could enable:

  • Comment systems. Each comment is a piece of data signed by some individual. Links can be created over the comments that allow for querying some subset of all the comments, i.e. top comments, the set of all comments, or comments from friends. These links themselves may be immutable, but can be joined together via other links in a chain where the newest link in the chain is the most up-to-date view of the comment system.
  • Web of Beliefs. A justification / rule links two beliefs together and beliefs may be private or publicly shared and distributed. Beliefs justified by some rule sets may be more propagated than others (i.e. scientific beliefs). Or beliefs can be accepted or rejected based on arbitrary social conditions like "what proportion of my friends hold belief x?" (Such as is the case with names). Multiple conflicting beliefs can be held at once, such as if it is not clear which one should be accepted as default (or there are uses for both beliefs in different context, such as with definitions).
  • Anonymous Interactions. Links could be generated ephemerally to prove that a public key in some trusted set of public keys interacted with a piece of content in some manner, and then added to some hyperloglog or statistical counter.

Specification Ideas

Disp type Link<Type> is implemented for whatever hash type is used in the specific link needed. i.e.

For a link to be type-valid, the objects it links to must also be the correct type. Zero-knowledge proofs could be used for this application in the future to avoid unnecessary fetching.

struct Link<T: Type> {
	hash: Multihash,
	valid_link: (fetch(hash) : T),
}

User Management

Desirable Properties

  • Flexible Provenance
    • A user should be able to choose how strong the link between the data they publish and themselves is. (i.e. optional / zk proof / ring signature signing)
  • Flexible Privacy
    • A user should be able to choose to a reasonable extent who can see the data they publish. (i.e. one-to-many group-based publishing)
  • Multi-device support & Realistic recovery mechanisms from compromised devices.
    • If a user loses access to their devices or a mechanism of authentication (e.g. a password), or a device is compromised, there should be realistic mechanisms to recover from the damage, both publicly and privately.
  • Anonymous Collective Feedback
    • For mechanisms of collective feedback (like/dislike counters, analytics), the data associated with the feedback event should not in general be attributable to a particular group, but should be verifiable that it was a unique member of the group that did the feedback.
  • Flexible Storage Permissions
    • Encrypted data associated with a user, stored across many different computers, should support permission hierarchies based on the sensitivity of the data. I.e. requiring more factors of authentication.
  • Web-of-trust Identity
    • Data associated with a user's identity should be vouchable by other users (in a web-of-trust) or by mechanistic processes. The identifying data should


  • Scuttlebutt
  • Polycentric


Identification

Identification is needed when one party must store and retrieve data associated with another party for the purpose of specific interaction.

Formally, identification is the process by which one party (identifier) under some assumptions internally associates some external stimulus with the known identity of another party, thus changing the identifying party's behavior.

For Example:

  • A service (identifying party) must identify a user (identified party) using a username and password (external stimulus) to allow the user to log in (changed behavior).
    • Main Assumption: The user's password or computer they are using has not been compromised.
  • An individual must identify a fellow human being using their senses before treating them not as a stranger.
    • Main Assumption: The fellow human is not attempting to disguise themselves as someone else.
  • When receiving an encrypted message from a friend, the receiver must identify the friend by verifying the signature of the message using the friend's known public key.
    • Main Assumption: The public key truly does belong to the friend and the friend's private key has not been compromised.


In Dither, there will be two general categories of identification.

  • Cryptographic identification
  • Characteristic-based identification.

Cryptographic identification will occur when one needs to identify a party on the other end of a specific communication channel.

Characteristic-based identification will occur when one needs to find the publicly-declared associated data (i.e. public keys) of another party while knowing only certain characteristics of that party.


What is a program? What is data? Why, they are one and the same.

Disp is a Decentralized Programming Language for the Decentralized Internet. The ultimate goal of Disp is (much like Dither) to allow for the replacement of all existing programming languages by creating a flexible, modular, and decentralized language.

What does it mean for a Programming Language to be Decentralized?

Modern programming languages are centrally defined. Their parsers, type-checkers, interpreters, and specifications are stored in a central repository, with complex systems of consensus creating a single evolving specification. Disp will attempt to do the opposite. It will provide a core set of tools for building languages that can be used however the user desires, and then use concepts from Category Theory (and semantic graph structures) to relate these "separate" languages together and allow them to translate seamlessly into each other.

What might this look like? Imagine having an untyped lambda calculus evaluator (i.e. a simple model of computation) and allowing users to define various type systems for this model: untyped, duck-typed, structurally typed, simply typed, Martin-Löf typed, or even the Calculus of Constructions, Cubical Type Theory, and beyond. All these various type system semantics can be related using Category Theory so that types in a simpler type system can be translated to types in a more complex type system. Then, on top of this, the syntax used to display programs (whether that be textual or using a projectional/structured language editor) can itself be chosen freely.


Create a general-purpose programming language that has the following main properties:


  • Objects are stored in self-defining data structures, i.e. they link to data that describes the format of the object (whether that be structurally, in-memory, or symbolically).
    • An object can be defined by its computation -> Compilation can be as incremental as you want.
  • High level programs are provably compiled into arbitrary lower-level representations with minimal programmer input.
    • An inductive datatype like Nat could be provably compiled into something like BigInt and be much more efficient. In turn, as long as you either ignore the possibility of overflow or prove it never happens, BigInt could be transformed into a U64 or something similar.
  • The goal of compilation is a CPU-specific object file, complete with:
    • data pre-loaded into memory
    • list of cpu-specific instructions.
    • hash of external virtual kernel API (i.e. syscall functions)
  • virtual kernel APIs can be constrained to give programs access to different parts of the computer, or forgone entirely for programs that only manipulate data.
  • Running object files requires a uniquely setup syscall kernel API that can deal with interrupts

General Architecture

  • There are two structures that are core to disp: Expr, and Judgement. Expr is an inductive structure that represents all the terms and types of the language. Judgement is the structure that matches Exprs together to create a typing judgement (i.e. saying some term is of some type or a : A). See [[expr]] for more details.

  • All structures of the language are defined through [[hashtypes]]

  • All language types are themselves self-defining structures, including the final bytecode file.

  • The language files are not stored as text files, they are instead stored as self-defining structures of groups of commands.


  • Lisp for treating programs-as-data.
  • Idris, Coq, Agda, Lean & Friends for dependent types.
  • Rust for its safe & fast zero-cost abstractions.
  • IPFS for deduplication of shared code
  • IPLD for self-defining structures

Universal System of Types

a type system is a logical system comprising a set of rules that assigns a property called a type to every "term"

Who chooses the rules that assign types to terms? What makes one set of rules better than another? Some type systems are more general than others, some are more easily compiled into efficient bytecode, some allow for writing complex mathematical proofs. Exactly how large is this design space?

Disp captures the entire design space of possible type systems by generalizing the concept of types:

To make a long story short, a type is simply a program that returns true, false, or loops forever when given the source code of another program.

This allows the programmer to define their own type system in Disp. Or just use a pre-made one.
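As a toy illustration of this definition (all names hypothetical, and with the "loops forever" case omitted), a type can be modeled as an ordinary function that inspects a term and answers true or false:

```rust
/// A toy term: just enough structure to check against.
#[derive(Debug)]
enum Term {
    Nat(u64),
    Pair(Box<Term>, Box<Term>),
}

/// In this model, a "type" is nothing more than a program that inspects
/// another program (here, a term) and answers true or false.
type TypeCheck = fn(&Term) -> bool;

fn is_nat(t: &Term) -> bool {
    matches!(t, Term::Nat(_))
}

fn is_pair_of_nats(t: &Term) -> bool {
    match t {
        Term::Pair(a, b) => is_nat(a) && is_nat(b),
        _ => false,
    }
}

fn main() {
    // The "type" is just a value the programmer can pick or swap out.
    let ty: TypeCheck = is_pair_of_nats;
    let good = Term::Pair(Box::new(Term::Nat(1)), Box::new(Term::Nat(2)));
    let bad = Term::Nat(3);
    assert!(ty(&good));
    assert!(!ty(&bad));
}
```

Since the checker is a first-class value, "defining your own type system" amounts to writing a different checking program, which is exactly the generalization described above.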

The Implementation

In disp, every term is stored in a context as a judgement that contains a term and a type:

struct Judgement {
	term: Expr,
	ty: Expr,
}

To verify if a judgement is valid and may be stored in a context, the compiler encodes the term using simple church-encoded data structures (like tuples and sums). The encoded form is then applied to the ty expression and beta-reduced. The judgement is considered valid if the expression reduces to Expr::Var.

Type System of Disp

While the type system of Disp can be literally any computable program, disp will have its own standard type system defined in the standard context that will allow for the creation of programs optimisable to a level at or beyond the speed of C and Rust.

The standard type system for disp should be:

  • Simple
    • Relatively few core inference rules
    • More complex types are constructed from a few simple primitive type constructors.
  • Fast
    • Types that reflect hardware primitives (such as machinecode) should be representable and the type system should be able to reason about equivalence between inefficient high-level types and efficient low-level types.
  • Consistent
    • Logic of types should be consistent
    • Type checking should never loop indefinitely
  • Representative
    • The standard type system for Disp aims to be a viable foundation for all of mathematics.

Substructural Types

For a systems language, runtime overhead must be kept to a minimum. This means that the restrictions on lifetime and movability of heap and stack allocations should be modeled and enforced at compile time, instead of relying on runtime methods such as garbage collection or reference counting.

Disp uses a type system that divides terms and types into 3 categories:


  • This includes most types, type constructors, and terms that can fit in a single CPU register.
  • Unrestricted objects may be moved around and duplicated at will in a program and don't require a specific function defining how they are deallocated.


Linear

  • This includes terms allocated on the heap, mostly structures with variable-length allocations.
  • Linear objects must be used in each context exactly once and must define at least one destructor, or else they won't typecheck.
  • Linear objects may contain unrestricted objects, just like how heap-allocated objects may contain numbers that can be stored in registers.
  • Examples: ATS, Rust kinda


Ordered

  • This includes any term allocated on the stack, which is pretty much anything else.
  • Ordered objects must be used in each context exactly once and can only be used in reverse-order of allocation (like a stack).
  • Ordered objects may contain unrestricted or linear objects.
  • Examples: Webassembly, Porth

A paper outlining a potential framework for a type system unifying all 3 of these substructural modes can be found here.
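Rust's ownership system gives a rough feel for the linear discipline described above; this sketch (hypothetical names) treats a heap allocation as a resource that is moved into an explicit destructor. Note that Rust is affine rather than truly linear, since silently dropping a value is still allowed, so this is only an approximation:

```rust
/// A heap-allocated resource treated linearly: it is meant to be
/// explicitly consumed by its destructor-like function, and Rust's move
/// semantics prevent it from being duplicated or used after consumption.
struct Buffer {
    data: Vec<u8>,
}

impl Buffer {
    fn new(n: usize) -> Buffer {
        Buffer { data: vec![0; n] }
    }

    /// The "destructor": consumes the buffer by value, so it can be
    /// called at most once.
    fn free(self) -> usize {
        self.data.len() // pretend to release the allocation
    }
}

fn main() {
    let buf = Buffer::new(16);
    let released = buf.free();
    // Calling `buf.free()` again would not compile: `buf` has been moved.
    assert_eq!(released, 16);
}
```

A truly linear system, as described above, would additionally reject programs that drop `buf` without ever calling the destructor.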

Minimal Type Theory

In order for Disp to be able to double as a general-purpose theorem prover, the ideas from this paper describing a Minimal Type Theory (mTT) may be used.


Syntax Agnosticism

Syntax is one of the most visible barriers separating different languages from each other. It is easy to distinguish between Lisp and C, BASIC and APL, Haskell and Fortran. All these languages can pretty much do the same things, but their different syntax plays a big role in preventing programmers experienced in one from trying out another.

Disp aims to solve this issue by allowing the programmer to use whatever syntax style they prefer. This is possible because Disp is not stored in text files, it is instead temporarily rendered to a text file in a given syntax style and then parsed back to a structured binary format. Syntax styles should be compatible with each other and switching between them should be as simple as a click of a button.


Since Disp has syntax agnosticism, it does not require one set syntax. For consistency and branding purposes, however, Disp will have its own succinct and readable default syntax.

Disp's default syntax is still a work-in-progress, however I think I want something similar to lisp with various Rust conventions sprinkled in.


/// Define things with `let` and `:=`
let thing := 3

/// Qualify the type of your definitions with `:`
let pi := 3.141592653589793 : Real

/// Define unordered sets
let set := {pi, thing} : {Real, Nat}
/assert_eq Set::get(set, pi) pi
/assert_typ Set::get(set, pi) Real
/// Define ordered sets
let list := [pi, thing, 2] : [Real, Nat, Nat]
/assert_eq list.0 pi

type resolve : Ident -> Type
type resolve_elem : {Set, Ident} -> Type
type resolve_list : {List, Nat} -> Type

type parse_nat : String -> Nat

type Nat : Type;
type Z : Nat;
type S : Nat -> Nat;

let NatDef := {
	type zero : Nat,
	type succ : Nat -> Nat, 
} : Type

let TypeParsing := { T : Type, Error : Type } -> {
	type parse : String -> Result<T, Error>
}

type Bool : Type;
type true : Bool;
type false : Bool;

let _ : TypeParsing(Bool) := {
	let parse := (string) -> (
		match string {
			"true" => true,
			"false" => false,
		}
	)
}

/// The type of a type constructor
let TypeConstructor := Type -> Type;

type round : Real -> Nat
round(pi) := thing

type Nat : Type
type Real : Type
type Type : Type // Warning: Type is already a member of Type because it is the Type of all Types, including itself.


Note: This Syntax is just an idea and subject to lots of change

// By itself, this is a comment
// It doubles as documentation when paired with an object

// This is an object; its type is inferred.
set Unit ();

// The function type is a function that takes two types A and B and returns a dependent product type where B is *not* dependent on a term of A.
set -> λ[A B] Π[_:A] B
set /\ λ[A B] Π[C:Type] (A -> B -> C) -> C
set id_type Π[A: Type] A -> A
set id λ[t] λ[x] x : id_type

// Pair constructor
set pair_type Π[A: Type, B: Type] A -> B -> (A /\ B)
set pair λ[x y f] f x y

set Bool Π[A: Type] A -> A -> A {
	true: λ[x y] x
	false: λ[x y] y
}


// This is a comment, it also doubles as documentation

// Variables
(set ten 10:b64)

// variable names can have most symbols in them
(set ten? 10) // literal 10 resolves to 10:bsize

(use std::flow::assert)
(assert_exact ten ten?) // => true only on 64-bit systems, because bsize = b64
(assert_eq ten ten?) // => always true

// Reason about functions
(use std::code::compile)
// Parentheses implied
compile `(a b) `(assert_eq a b)
compile `(a b) `(panic_if (!= a b))
compile `(a b) `(if (= a b) (panic))
// => these functions are the exact same when compiled

// Strings
(set string "Hello, World!") // implies "Hello, World!":String

// String declaration follows the same declaration convention as Rust, except that Strings are stack-allocated by default.
(print string) // => prints out "Hello, World!"
(print "And the God of Programming said: {:?}" string) // => prints out formatted string with debug representation of string.

// Type definition (with type macro)
(type SomeStruct
	number b64
	_ (String Character)
)

// This macro resolves to something similar to using the formats of Type and TypeLocalization hashtype.
(set SomeStruct ( (hash b64) ( (hash String) (hash Character)):Type))
(set SomeStructLocalization ((hash SomeStruct) ):TypeLocalization)

// Trait definition using trait macro
// A trait defines a collection of adjacent data (either a function, or a value) that is attached to a Type
(trait SomeTrait
	fn size (&self) (usize)
)

(impl AddOne SomeStruct
	set add_one (fn (~self) (self) (
		(inc (loc self number) 1)
	))
)

// Initiate SomeStruct with values
// symbol = thing notation is equivalent to (set symbol thing)
some_struct = (1098342 ("Hello, World!", 'u')):SomeStruct

(some_struct size)

(1098342:B64 -12304: String:"hello there" Symbol:hello)


A Natural Idea

A word in a language is a label to which we assign a definition. These definitions may be physical or abstract, but in a programming language, a definition is a piece of code.

let id = λx . x - We are fitting the label "id" to the definition λx . x

Words in natural languages may have more than one definition. When using these words in conversation however, typically only one definition is intended by the speaker. The definitions of the words we use in conversation are set by the context of the conversation, which the speaker hopes the listener has deduced correctly.

Contexts also exist in programming languages: namespaces, modules, classes. Unlike real conversation, where contexts can be deduced from various cues, computers need to know exactly what definition a label corresponds to and can't just "figure it out" (yet). A context in a programming language is a meticulously organized hierarchy of modules. When using these modules, programmers have to specifically import names into the context of their code, requiring the programmer to keep in their head where in the module hierarchy all the names they need to use are.

Contrast this to how we deal with context in natural languages: when starting a conversation, we don't assume what the single definition of a given word will be without first inferring a context from surrounding cues. For example, if we are talking about garnishing food, you might be referring to adding things, whereas if we are talking about garnishing wages, we are talking about taking away. This is the way our brains are used to dealing with context, so it puts a special strain on the programmer when a complex pre-defined hierarchy must be memorized to actually program effectively.

The goal: Remove this burden on the programmer by resolving names from the context of the program.


A perfect way to model resolving names from contexts is to use a knowledge graph. A name may resolve to many different expressions, but the scope of possibilities may be restricted further by context, such as the types of surrounding terms or specifically declared contexts.
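As a toy illustration of this idea (hypothetical names throughout, with plain strings standing in for real type expressions), a name can map to several candidate definitions, with the expected type at the use site acting as the disambiguating context:

```rust
use std::collections::HashMap;

/// A name may resolve to several candidate definitions; the expected
/// "type" (here just a tag string) narrows the choice, much like a
/// conversational context narrows a word's meaning.
#[derive(Default)]
struct NameGraph {
    // name -> [(type tag, definition)]
    candidates: HashMap<String, Vec<(String, String)>>,
}

impl NameGraph {
    fn define(&mut self, name: &str, ty: &str, def: &str) {
        self.candidates
            .entry(name.into())
            .or_default()
            .push((ty.into(), def.into()));
    }

    /// Resolve a name given the type expected at the use site.
    fn resolve(&self, name: &str, expected: &str) -> Option<&str> {
        self.candidates
            .get(name)?
            .iter()
            .find(|(ty, _)| ty == expected)
            .map(|(_, def)| def.as_str())
    }
}

fn main() {
    let mut g = NameGraph::default();
    // "garnish" has two senses; the surrounding type disambiguates.
    g.define("garnish", "Food -> Food", "add_decoration");
    g.define("garnish", "Wages -> Wages", "withhold_portion");
    assert_eq!(g.resolve("garnish", "Food -> Food"), Some("add_decoration"));
    assert_eq!(g.resolve("garnish", "Wages -> Wages"), Some("withhold_portion"));
}
```

A real knowledge graph would carry richer relations than a flat map, but the resolution step is the same: filter the candidates for a name by whatever context is available.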


Names and programs in Disp are separate. This is so that alpha-equivalence is preserved and so that the same program may be named differently by different people.

A program is represented by an Expr; a named program is represented by a NamedExpr.

enum Expr {
	Abs { binding: Binding, expr: Expr },
	App { func: Expr, args: Expr },
}

struct NamedExpr {
	name: String,
	expr: NameTree,
}

enum NameTree {
	Abs { bind_name: String, expr: NameTree }, // For Abs and Pi binding
	App { left: NameTree, right: NameTree }, // For App
}

Hardware Modeling

How do we generate fast code?

  • Currently, Optimizing Compilers generate fast code through intermediate representations and various techniques to figure out what code is needed and what code is not.
  • These strategies are mostly arbitrary and not easily improvable. A programmer has to go through and figure out which optimizations are worth the compiler computation cost and which aren't. There are an immeasurable number of trade-offs and potential improvements for a given piece of hardware. This calls for a new method to quantify these trade-offs and create a better system.

How do we create fast algorithms?

  • We typically manually write algorithms we think will run fast and then compare them to other algorithms using benchmarks. However, benchmarks have so many confounding factors that it is difficult to compare two of them and make definitive conclusions.
    • Results may also depend on whether the algorithm is fit for the CPU architecture. The same algorithm may run faster or slower depending on a bazillion different factors such as layout

How To Model Performance

To create efficient code, we must have a model of how that code is run. Typically this model is held in the programmer's mind and is expressed through compiler and algorithm design. Instead of holding it in our forgetful, biased brains, it might be a good idea to have a software-defined model of our hardware so that we can prove that our optimizations and algorithms are actually faster.

Model specifications & options

  • Should take into account modern CPU features such as: pipelining, superscalar execution, out-of-order execution, and branch prediction.
  • Should be generic enough to model different CPU generations of the same architecture.
  • Should take into account other hardware qualities such as RAM latency, GPU parallelization options, and any other model-able aspect that could influence performance.
  • Simple models could be hand-crafted using published characteristics from documents like the Instruction Tables
  • More sophisticated models could be made for open source CPU architectures by using the logic gate layout of the CPU. (This may be possible for architectures like RISC-V)
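As a toy sketch of what a hand-crafted model might look like (the latency numbers are purely illustrative, not taken from any real instruction tables), a model can be as simple as a table of per-instruction costs, with "provably faster" reducing to comparing predicted costs of two candidate programs:

```rust
use std::collections::HashMap;

/// A toy hardware model: per-instruction latencies, as might be
/// hand-crafted from published instruction tables.
/// Numbers are illustrative, not real measurements.
fn cost_model() -> HashMap<&'static str, u64> {
    HashMap::from([
        ("add", 1),
        ("mul", 3),
        ("div", 20),
        ("load", 4),
    ])
}

/// Predicted cost of a straight-line instruction sequence under the model.
/// (A serious model would also account for pipelining, superscalar issue,
/// out-of-order execution, and branch prediction.)
fn predicted_cost(model: &HashMap<&str, u64>, program: &[&str]) -> u64 {
    program.iter().map(|i| model[i]).sum()
}

fn main() {
    let model = cost_model();
    // Two candidate lowerings of "x * 2":
    let via_mul = ["load", "mul"];
    let via_add = ["load", "add"]; // x + x
    let a = predicted_cost(&model, &via_mul);
    let b = predicted_cost(&model, &via_add);
    // Under this model, the add-based lowering is provably cheaper.
    assert!(b < a);
}
```

The benefits listed below all flow from this move: once the model is explicit data, "is this optimization faster?" becomes a checkable claim about the model rather than a benchmark anecdote.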

Benefits to this approach include:

  • No more annoying benchmarks: faster algorithms can be provably faster for a given model.
    • Note: Benchmarks still need to be used to compare different models of a hardware system, however these benchmarks can be much more sophisticated and take into account more confounding factors.
  • Faster optimization adoption: compilers can substitute old optimizations for faster ones without delay, using a public distributed database of optimizations.
    • Scenario
      • Bob has a new optimization for a certain pattern of functional expressions.
      • Bob proves it is faster for a well-trusted model of x86_64 CPUs
      • Bob broadcasts optimization to everyone
      • Everyone checks the proof that it is faster
      • Everyone uses new optimization, programs are faster

Fundamental Constructs


Expressions in Disp are programs and match the untyped lambda calculus. They are represented as follows:

enum Expr {
	Var,
	Lam { bind: Binding, expr: Expr },
	App { func: Expr, args: Expr },
}

enum Binding {
	None,
	End,
	Branch(Binding, Binding),
}

See how the Binding structure works here.


struct Judgement {
	term: Expr,
	ty: Expr,
}

Judgements are considered valid if when the structure of term is encoded into the lambda calculus, and Expr::App(ty, encoding) beta-reduces to Expr::Var.


struct Context {
	names: Map<String, Judgement>,
}

Bind Trees: A deduplication-friendly alternative to De Bruijn Indices.

Used in [Disp]

The most well-known way of representing the lambda calculus is currently De Bruijn Indices.

This is where you have 3 variants of a term:

  • Var(Number) - Number represents how many layers of abstraction (lambdas) the variable is away from its intended binding
  • App(Term, ...) - Application is a list of Terms
  • Lam(Term) - Lambda represents an abstraction point

An example of what this might look like for common equations:

  • λx . x = Lam(Var(1))
  • λx. λy. x = Lam(Lam(Var(2)))
  • λx. (x (λx . λy . x) (λx . λy . y)) = Lam(App(Var(1), Lam(Lam(Var(2))), Lam(Lam(Var(1)))))

For Disp, I have devised an alternative method that doesn't require integers (and thus is not limited by encoding constraints)

Disp has the same 3 variants of Term:

  • Var
  • App(Term, Term)
  • Lam(Binding, Term)

In addition to a construct Binding:

  • None
  • End
  • Branch(Binding, Binding)

In fact, the Binding datastructure can be repurposed from the Term datastructure. (End = Var, None = Lam(Var, Var), Branch(Binding, Binding) = App(Term, Term))

Common equation examples of this structure might look like:

  • λx . x = Lam(Var, Var)
  • λx. λy. x = Lam(Var, Lam(None, Var))
  • λx. (x (λx . λy . y) (λx . λy . x)) = Lam(Branch(Branch(End, None), None), App(App(Var, Lam(None, Lam(Var, Var))), Lam(Var, Lam(None, Var))))

As opposed to De Bruijn indices, where the variables "point" to their corresponding lambda abstractions, with Bind Trees the lambda terms carry an auxiliary datastructure specifying exactly which Vars should be replaced.
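The bind-tree representation can be sketched directly in Rust, following the Var/App/Lam and None/End/Branch variants listed above (`Box` is added only so the recursive enums have a known size):

```rust
/// Binding trees: mark which Vars under a lambda are bound by it.
#[derive(Debug, PartialEq)]
enum Binding {
    None,                               // binds nothing in this subtree
    End,                                // binds the Var at this position
    Branch(Box<Binding>, Box<Binding>), // descend into an App's two sides
}

/// Terms of the untyped lambda calculus using bind trees
/// instead of De Bruijn indices.
#[derive(Debug, PartialEq)]
enum Term {
    Var,
    App(Box<Term>, Box<Term>),
    Lam(Binding, Box<Term>),
}

fn main() {
    use Binding::{End, None as NoBind};
    use Term::{Lam, Var};

    // λx . x — the lambda's binding tree marks the lone Var as bound.
    let id = Lam(End, Box::new(Var));
    assert_eq!(id, Lam(End, Box::new(Var)));

    // λx . λy . x — the outer lambda binds the Var, the inner binds nothing.
    let k = Lam(End, Box::new(Lam(NoBind, Box::new(Var))));
    assert!(matches!(k, Lam(End, _)));
}
```

Because no integers appear anywhere, identical subterms (like the repeated `Lam(None, ...)` bodies above) hash to identical values, which is what makes this representation deduplication-friendly.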

In addition, when it comes to storage and integration with Dither, these lambda expressions are identified by hash, which gives a datastructure like this:

  • Var
  • App(Hash, Hash)
  • Lam(Hash, Hash)

Disp Language Design

Disp is an ambitious programming language designed to be the ultimate language that can successfully interpret the source code of any other language in the world. The primary features of disp include user-defined syntax, abstract syntax trees (ASTs), flexibility in programming paradigms, and extensibility. This summary outlines the key aspects of the language's design, organized into relevant sections.

Abstract Syntax Trees (ASTs)

ASTs are the backbone of disp, serving as a flexible representation of various programming constructs, paradigms, and languages. ASTs are user-defined and can represent anything from lambda calculus expressions and combinator calculi to Turing machines, assembly instructions, or LLVM IR.

User-Defined Syntax

Disp allows users to create custom syntax and define their own ASTs. A default AST serves as the starting point for newcomers, and it can be extended or modified as needed. Users can create their own parsers and pretty-printers to handle new syntax, which can be used to parse different programming paradigms down to a single "core" AST.

Default AST

The default AST is designed to be versatile and configurable, allowing users to easily extend it with their own syntax and constructs. This AST should include support for common programming paradigms and constructs, such as if-else statements, loops, pattern matching, functions, and more.

Programming Paradigms

Disp is designed to support a wide range of programming paradigms, including object-oriented, functional, and procedural styles. These paradigms are handled by defining different ASTs that can represent each paradigm and by providing parsers and pretty-printers that can parse and generate code for the chosen paradigm. Disp leverages the underlying Turing completeness of these paradigms to enable their coexistence and interconversion.

Type System

Disp will feature a strong static type system with support for "steady typing," a variant of gradual typing. Steady typing allows for type annotations to be optional, with the IDE guiding users through type mismatches and helping them identify potential issues without being overwhelmed. Disp aims to have a robust type inference system to minimize the need for explicit type information.

Concurrency and Parallelism

Concurrency and parallelism in disp will be managed through AST definitions. The default AST and parser (Default Disp) will include concurrency constructs similar to Rust's approach, defining types like Future and using syntactic sugar to simplify their usage.

Error Handling and Debugging

Disp will adopt an error handling strategy inspired by Rust, using types like Result and Option along with syntactic sugar for easier usage. Additionally, Disp aims to have a strong static type system to help users identify potential issues. Future developments may include live debuggers and other advanced debugging tools.

Libraries, Modules, and Namespaces

Disp will not have a traditional module system for its default variant. Instead, all names will occupy a global namespace. AST patterns will encourage keeping separate ASTs for name definitions and structural definitions, allowing the compiler to deduce the correct name usage based on context. Code will be stored in a deduplicated, decentralized database with names accessible in a global lookup table.
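A minimal sketch of such a deduplicated, content-addressed store (hypothetical structure, using a simple 64-bit hasher in place of a real content hash) with a separate global name table:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

fn hash_of(code: &str) -> u64 {
    // Stand-in for a real content hash (e.g. a multihash).
    let mut h = DefaultHasher::new();
    code.hash(&mut h);
    h.finish()
}

/// Content-addressed code store: expressions are stored by hash, and
/// names live in a separate global lookup table, so the same expression
/// may carry different names for different people.
#[derive(Default)]
struct CodeStore {
    objects: HashMap<u64, String>, // hash -> expression source
    names: HashMap<String, u64>,   // global name table -> hash
}

impl CodeStore {
    fn add(&mut self, code: &str) -> u64 {
        let h = hash_of(code);
        self.objects.insert(h, code.to_string()); // deduplicates automatically
        h
    }
    fn name(&mut self, name: &str, hash: u64) {
        self.names.insert(name.to_string(), hash);
    }
    fn resolve(&self, name: &str) -> Option<&str> {
        self.objects.get(self.names.get(name)?).map(String::as_str)
    }
}

fn main() {
    let mut store = CodeStore::default();
    let h1 = store.add("λx . x");
    let h2 = store.add("λx . x"); // same code, same hash: stored once
    assert_eq!(h1, h2);
    store.name("id", h1);
    store.name("identity", h1); // two names, one definition
    assert_eq!(store.resolve("id"), Some("λx . x"));
}
```

Keeping names out of the objects themselves is what lets the compiler deduce the intended definition from context, as described above, instead of from a fixed module path.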


Interoperability

Disp will support interoperability with other languages and libraries through specific AST extensions. Users can create extended ASTs with primitives for external library calls and link them using the disp command.

Performance and Optimization

Disp will optimize code through equivalence graphs and machine learning algorithms trained in a decentralized manner. By leveraging linear and ordered types, Disp will allow for advanced rewrites and optimizations that are inaccessible to other languages.

Security and Sandbox

Disp's ASTs can be compiled to various targets, including WebAssembly, enabling sandboxed execution. This flexibility allows disp to be run securely in different environments.

Tooling and Ecosystem

Tooling for disp is still in the early stages of planning. The ultimate goal is to create a structural editor with robust debugging, REPL, and syntax-modifying features. In the meantime, a single disp command will be used to add (parse) text files of code into ASTs, convert those ASTs to other ASTs (for compilation or changing code style), and provide target-specific AST interpreters for working with external libraries in other languages (like C).

Standard Library and Built-in Functions

The scope of the standard library for disp will need to be determined. It should provide common utility functions, data structures, and algorithms. The standard library can be implemented as part of the core language or as separate, user-extensible modules.

Language Evolution and Versioning

Disp should have a mechanism for managing language versions and compatibility between different versions of disp code and libraries. This will ensure smooth transitions as new features are added or existing features are modified.

Documentation and Learning Resources

High-quality documentation, tutorials, and learning resources are essential for the success and adoption of disp. A plan should be put in place for the creation and maintenance of comprehensive documentation to support both new and existing users.


Disp is a highly ambitious and flexible programming language designed to interpret the source code of any other language. With user-defined syntax, versatile ASTs, support for multiple programming paradigms, a strong static type system, and a focus on optimization, disp aims to be the ultimate programming language. The language will continue to evolve as new features are added and existing features are refined, with a strong emphasis on tooling, documentation, and learning resources to support its growing user base.

fn main() {
/// Endofunctor is a function from `Type -> Type` and a mapping operation for morphisms.
set Endofunctor { F : { A : Type } -> Type } -> {
	/// map is a mapping operation for outgoing morphisms on `A : Type`.
	map : { [ B : Type, morphism : A -> B ], a : F(A) } -> F(B)
/// Magma is a type with a "unit" element and a binary operation
set Magma [ M : Type ] -> {
	unit: M,
	mult: [a : M, b : M] -> M
}

/// Associativity is a property of a type and a binary operation
set Associativity [ M : Type, bin_op: [M, M] -> M ] -> { {a, b, c} : M^3 } Equivalence(bin_op(bin_op(a, b), c), bin_op(a, bin_op(b, c)))

/// Identity is a property of a type, element `elem` of the type, and a binary operation `bin_op` over the type proving that there is some element of the type such that `bin_op(m, elem) = m` and `bin_op(elem, m) = m`
set Identity [ M : Type, { elem: M, bin_op: [M, M] -> M } ] -> { m : M } {
	Equivalence(bin_op(m, elem), m),
	Equivalence(bin_op(elem, m), m)
}

/// Inverse is a property of a type, element `elem` of the type, and a binary operation over the type proving that for all elements `a` of the type there is a corresponding element `inverse_a` such that `bin_op(a, inverse_a) = elem` and `bin_op(inverse_a, a) = elem`
set Inverse [ M : Type, elem: M, bin_op: [M, M] -> M ] -> { a : M } [
	inverse_a : M,
	Equivalence(bin_op(a, inverse_a), elem),
	Equivalence(bin_op(inverse_a, a), elem)
]

/// A Semigroup is a magma with associativity
set Semigroup [ M : Type ] -> [
	magma: Magma[M],
	Associativity[M, magma.mult]
]

/// A Monoid is a magma with associativity and identity
set Monoid [ M : Type ] -> [
	magma: Magma[M],
	Associativity[M, magma.mult],
	Identity[M, magma]
]

/// A group is a magma with associativity, identity, and invertibility
set Group [ M : Type ] -> [
	magma: Magma[M],
	Associativity[M, magma.mult],
	Identity[M, magma],
	Inverse[M, magma.unit, magma.mult]
]
/// Full definition of a monoid
set Monoid [ M : Type ] -> [
	unit : M,
	multiplication : [a : M, b : M] -> M,
	associativity : { a : M, b : M, c : M } -> Equivalence(multiplication[multiplication[a, b], c], multiplication[a, multiplication[b, c]]),
	identity : { a : M } -> [ Equivalence(multiplication(a, unit), a), Equivalence(multiplication(unit, a), a) ]
]
/// A Monad is just an Endofunctor and a Monoid
set Monad [ M : Type -> Type ] -> {
	functor : Endofunctor(M),
	monoid : Monoid[M]
}

Dither Application Index

List of application Ideas

  • Dither Chat - Community chat application aiming to replace Discord. Provides end-to-end encrypted DMs, voice chat, servers, voting, and integration with most other chat protocols.
  • Dithca - Comprehensive & Versatile decentralized comment system where anyone can comment on any type of data structure on Dither. Can interface with most other centralized comment systems and deal with misinformation & crediting using a comprehensive community flagging system.
    • Can be used to create Reddit / Twitter Replacement. Can be integrated into other Dither applications or ported to web.
  • Dithgit - Github on Dither
  • Dithix - Dither Resource Manager: Manage and cache any kind of resource, interface between the Merkle Tree and the Filesystem.
  • Nomia on Dither
  • Tree of Math - Directed Acyclic Graph linking standardized data structures for definitions and proofs together based on set theory, creating a comprehensive tree of knowledge.
  • Dither Coin - Cryptocurrency that aims to solve the current problems and create a complete digital replication of cash (being decentralized, anonymous, non-volatile, and difficult to trace).
  • Dither DEX - An all-faceted decentralized exchange to facilitate trade of any kind of real or virtual asset. Supports meeting up in real life or exchanging other virtual assets anonymously and securely. Also supports moderation and karmic filtering.
  • Protocol of Truth

Other Application Ideas:

  • Manga & Reading App
    • UI will be similar to Tachiyomi, but will also support book reading. Pulls content from various websites and stores on Dither. Written in Flutter, desktop & mobile versions. Supports comments through Dithca. Has built-in feature for paying for translation & replacing bad translations with better translations if they are made.
  • YouTube replacement
    • pulls and stores videos in a decentralized, uncensorable manner from any site that youtube-dl supports
    • Built-in chat (Using Dithca protocol)
    • Community-generated captions, sections, sponsor segments (pulled from sponsorblock) flagging etc.
    • Support for likes, view counting (and aggregation), and "Hearts" (a method of giving Dither Coin to creators).
    • Automatic community flagging of stolen/used content (including music, other people's videos, meme origins, pretty much anything)
    • Community misinformation flagging of content (interfaces with Dithca)
  • Peer-to-Peer Exchange
    • Allow for the exchange

Dither Chat

What is it

Dither Chat is a decentralized communication application using the Dither protocol. Servers are communally hosted with local consensus. Bots and plugins will be supported and also communally hosted. tl;dr Discord but decentralized and better.

Users & Sync

  • Each user may have multiple Peers (devices that Dither is installed on)
  • Chat Event history can be optionally synced across Peers.
  • A Peer may host multiple users
  • Each UserID must have at least one Peer that hosts it.

Chat Events

  • All Events are signed with the sender's private key. Signatures are verified on receipt, with a config option to let through or ignore unsigned or incorrectly signed events.
  • Chat Message structure
    • Date sent, last edited, markdown data / embed json, UserID mentions, emoji reactions
  • Rich Presence (updating custom statuses and online/offline status)
    • Optional storage - can store presence update history (off by default)
    • Optional sending, extracts information about what you are currently doing and updates your friends. (on by default)
    • Customization options to only share with certain friends
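The event structure above could be sketched as follows. This is a minimal illustration, not the actual Dither Chat schema; all field names and types are assumptions, and real verification would check the signature against the sender's public key rather than only its presence.

```rust
// Hypothetical sketch of a Chat Event envelope with the fields listed above.

#[derive(Debug, Clone)]
pub struct ChatMessage {
    pub sent: u64,                     // date sent (unix seconds)
    pub last_edited: Option<u64>,      // date last edited, if any
    pub markdown: String,              // markdown data / embed json
    pub mentions: Vec<u64>,            // UserID mentions
    pub reactions: Vec<(String, u32)>, // emoji reactions and their counts
}

#[derive(Debug, Clone)]
pub struct Event {
    pub sender: u64,                // UserID of the sender
    pub signature: Option<Vec<u8>>, // signature over the serialized event
    pub message: ChatMessage,
}

/// Models the config option above: unsigned events are dropped unless
/// the user opts in to letting them through.
pub fn accept(event: &Event, allow_unsigned: bool) -> bool {
    event.signature.is_some() || allow_unsigned
}
```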

Event Storage - Storage of a sequence of events in memory or on disk

  • Stored as a hash-linked local blocktree. Messages are appended to the current block, and a new block is created when a certain amount of time elapses after the last message sent or the max block size is exceeded. Block size can be set to 1 to prevent messages from being ordered out-of-order.
  • Indexing can be done on a by-block level (TODO: more customization options needed)
  • Block structure can allow for thread branching & thread conversation movement across users. (e.g. create a group dm on top of an ongoing conversation)
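The append-and-seal behavior described above can be sketched like this. It is a toy model under stated assumptions: `DefaultHasher` stands in for a real content hash, events are plain strings, and the time-elapsed sealing rule is omitted for brevity.

```rust
// Minimal sketch of a hash-linked local blocktree for chat events.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

#[derive(Debug)]
pub struct Block {
    pub prev_hash: u64,      // hash of the previous block (0 for genesis)
    pub events: Vec<String>, // serialized chat events
}

pub fn hash_block(b: &Block) -> u64 {
    let mut h = DefaultHasher::new();
    b.prev_hash.hash(&mut h);
    b.events.hash(&mut h);
    h.finish()
}

pub struct BlockTree {
    pub blocks: Vec<Block>,
    pub max_block_size: usize, // set to 1 to force one event per block
}

impl BlockTree {
    pub fn new(max_block_size: usize) -> Self {
        BlockTree { blocks: vec![], max_block_size }
    }
    /// Append an event; seal the current block and chain a new one
    /// once it reaches max_block_size.
    pub fn push(&mut self, event: String) {
        let prev_hash = self.blocks.last().map(hash_block).unwrap_or(0);
        match self.blocks.last_mut() {
            Some(open) if open.events.len() < self.max_block_size => open.events.push(event),
            _ => self.blocks.push(Block { prev_hash, events: vec![event] }),
        }
    }
}
```

With `max_block_size` set to 1, every message lands in its own chained block, which is what gives the strict ordering mentioned above.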

Trusted Friends Application API

  • Option to rank friends manually or by how much you chat with them
  • Can mark friends as “Trusted, Neutral or Untrusted”
  • Friend rank can be used by other applications
  • e.g. for Stellar Consensus Protocol quorum selection
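A sketch of what that API might expose to other applications, assuming a simple three-level trust marking plus chat volume as the automatic ranking signal. The names and the scoring rule are illustrative, not a finalized interface.

```rust
// Hypothetical trusted-friends ranking, consumable by e.g. quorum selection.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Trust { Trusted, Neutral, Untrusted }

#[derive(Debug, Clone)]
pub struct Friend {
    pub name: String,
    pub trust: Trust,            // manual marking
    pub messages_exchanged: u64, // automatic signal: how much you chat
}

/// Rank friends for consumers: manual trust level first,
/// then chat volume as a tiebreaker.
pub fn rank(mut friends: Vec<Friend>) -> Vec<Friend> {
    let key = |t: Trust| match t {
        Trust::Trusted => 0,
        Trust::Neutral => 1,
        Trust::Untrusted => 2,
    };
    friends.sort_by(|a, b| {
        key(a.trust)
            .cmp(&key(b.trust))
            .then(b.messages_exchanged.cmp(&a.messages_exchanged))
    });
    friends
}
```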

Chat Interface

  • Built-in markdown formatting + advanced chat box (similar to discord)
  • Link Displaying
  • Metadata can be sent so the receiver doesn't have to send requests to web pages
  • TODO: Do we need to worry about invalid metadata being sent, tricking the user? Perhaps just scanning for suspicious domains is enough.
  • Receivers can choose if they want to fetch link data, only fetch commonly used sites (e.g. youtube, twitter, soundcloud etc.) or not fetch anything at all and only display sent link metadata
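The three receiver-side choices above map naturally onto a policy enum. This is a sketch; the variant names and the example trusted-domain list are assumptions.

```rust
// Hypothetical receiver-side link-preview policy.

pub enum LinkPolicy {
    FetchAll,                // always fetch link data ourselves
    FetchKnown(Vec<String>), // only fetch from commonly used sites
    SenderMetadataOnly,      // never fetch; display only sent metadata
}

/// Decide whether the client should fetch preview data for a domain.
pub fn should_fetch(policy: &LinkPolicy, domain: &str) -> bool {
    match policy {
        LinkPolicy::FetchAll => true,
        LinkPolicy::FetchKnown(known) => known.iter().any(|d| d == domain),
        LinkPolicy::SenderMetadataOnly => false,
    }
}
```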

Direct Messaging

  • Simply sending JSON-encoded message/other events to UserID on Dither

Group messaging

  • Messages are broadcast over gossipsub and conflicting blocks are ordered by time.


Servers

  • Servers are communally hosted by the moderators' computers. However, the owner has full control over the server and can choose who can assist.
  • Red nodes are hosting nodes; blue nodes are members. A blue node with a yellow stroke is proxying its connection to the server.

[Image: Network structure for a Dither server - a possible dither-chat server arrangement]

Roles, Tags & Colors

  • To distinguish people in a server, there are roles, name colors, and tags
  • Tags are shown next to the tag owner's name as a small icon (like in Discord)
    • By default there is only 1 tag enabled for a server: the Owner tag.
    • There are other tag presets such as Donator, Moderator & Member
    • Custom tags can be created and attached to a specific role.
    • Tags should be displayed in order of importance
  • Roles can be made for anyone and can have permissions attached to them
    • Roles can also be organized into hierarchies where users can be ranked up to possess a higher role (usually with more permissions)
  • Each role has a color and color priority assigned
    • This is used to determine what color a user's username should be assigned
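The color-priority rule above could be resolved like this. Field names are assumptions for illustration.

```rust
// Hypothetical role model: the username color comes from the
// member's highest-priority role.

#[derive(Debug, Clone)]
pub struct Role {
    pub name: String,
    pub color: (u8, u8, u8), // RGB
    pub color_priority: u32, // higher priority wins
}

/// Returns the color of the highest-priority role, if any.
pub fn username_color(roles: &[Role]) -> Option<(u8, u8, u8)> {
    roles.iter().max_by_key(|r| r.color_priority).map(|r| r.color)
}
```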

Protocol of Truth

This is a protocol of back-and-forth debate with the aim of evaluating the quality and truthfulness of content posted to the internet.

If you have ever seen a photoshopped post on social media, or a video citing a badly done study, you will know that it is incredibly hard to tell at first glance how well-researched a piece of content is. This Dither protocol aims to inform anyone who might come across a given piece of media of its truthfulness through discourse.

The Process

The process for the Protocol of Truth is carried out by trusted individuals (chosen by a karma system). The steps are as follows:

  1. The Media is examined for a set of assertions it makes, as well as deductive and inductive reasoning made using those assertions. This creates an "Assertion Graph".
  2. Once the graph is constructed and properly cited (i.e. each assertion should provide some link to the part of the media it was extracted from) assertions may be supported or criticized.
  3. Assertions in "Assertion Graphs" may be shared between different media and thus the supporting evidence presented by one media may be used in the justification of another. This supporting evidence may be found independently or through the author's cited sources.
  4. Each Assertion in the graph may be labeled through a persistent voting scheme under the following categories: deduction type, truthfulness, and standard of evidence.
  • Deductive / Inductive - What kind of assertion it is.
  • Truthiness - How the trusted community rates the assertion in terms of truthfulness.
    • Inherited - The assertion is deduced from the assumptions, therefore its truthfulness is inherited from the truthfulness of the assumptions.
    • Likely True - The assertion is likely true
    • Likely False - The assertion is likely false
    • Ambiguous - The truth of the assertion cannot be satisfyingly asserted either way.
  • Standard of Evidence
    • Statistically Significant - The assertion is supported by sound scientific observation or experiment.
    • Personal Experience - The assertion is a personal observation, it is subject to individual bias.
    • Common Knowledge - The assertion is generally held to be true, but there is no specific evidence to support it, e.g. "thing X exists" or "hammers are generally used to hammer in nails". This is the least powerful standard of evidence and should be replaced with a more powerful form if at all possible.
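The label categories above, plus a toy rule for `Inherited` truthiness, could look like this. The premise-index layout and the fallback to `Ambiguous` are assumptions; the real Assertion Graph would be a shared, cited structure on Dither.

```rust
// Hypothetical per-assertion labels from the voting categories above.

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum DeductionType { Deductive, Inductive }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Truthiness { Inherited, LikelyTrue, LikelyFalse, Ambiguous }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Evidence { StatisticallySignificant, PersonalExperience, CommonKnowledge }

#[derive(Debug, Clone)]
pub struct Assertion {
    pub claim: String,
    pub citation: String,     // link to the part of the media it came from
    pub premises: Vec<usize>, // indices of the assertions it is deduced from
    pub deduction: DeductionType,
    pub truthiness: Truthiness,
    pub evidence: Evidence,
}

/// An `Inherited` assertion is only as strong as its premises: it resolves
/// to LikelyTrue only if every premise does. Assumes the graph is acyclic.
pub fn resolve(graph: &[Assertion], idx: usize) -> Truthiness {
    let a = &graph[idx];
    if a.truthiness != Truthiness::Inherited {
        return a.truthiness;
    }
    if a.premises.iter().all(|&p| resolve(graph, p) == Truthiness::LikelyTrue) {
        Truthiness::LikelyTrue
    } else {
        Truthiness::Ambiguous
    }
}
```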

Application Design Philosophy

Application APIs should be future-proofed as much as possible

Application GUIs should generally be as cross-platform as possible.

There should be different modes of use for different user skill levels to try and accommodate as many people as possible.

  • These modes should enable/disable various aspects of configurability in the application
  • Simple mode should be super easy to use for anyone and come with a quick tutorial
    • Configuration options should be assumed as much as possible.
  • Default mode should come with a tutorial and explain how modes work and how to change them
  • Advanced mode assumes the user already knows how to use the app or can figure it out on their own. All configurability functionality should be enabled.
    • Application developers should strive to implement as much configurability as possible into every aspect of their application for complex users.
    • For desktop applications, configuration on a small level (for specific contextual buttons or features) could be shown through right-click menus.
    • For mobile applications these configuration items can be shown through a long-press.
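The mode scheme above amounts to gating configuration surfaces by skill level. A minimal sketch, assuming each option declares the lowest mode at which it is exposed (the names here are illustrative):

```rust
// Hypothetical mode gating for application configurability.

#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Mode { Simple, Default, Advanced } // ordered by capability

pub struct ConfigItem {
    pub name: &'static str,
    pub min_mode: Mode, // lowest mode at which this option is shown
}

/// Returns the configuration surfaces the UI should expose in a given mode.
pub fn visible(items: &[ConfigItem], mode: Mode) -> Vec<&'static str> {
    items.iter().filter(|i| mode >= i.min_mode).map(|i| i.name).collect()
}
```

In Advanced mode every item passes the filter, matching the rule that all configurability functionality should be enabled.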

Why Copyright is Cancer

Copyright is ineffective in the internet age. Any piece of content can be pirated on a massive scale without the original creator even knowing. The only thing copyright does is suppress derivative creation on forums with strict copyright adherence. See these videos on why the current copyright system is broken and why it makes no sense. This document outlines an alternate system of funding creation that Dither aims to create.

Have a decentralized & democratized forum of communication and publication that no institution or individual can meaningfully obstruct. Dither is a protocol for creating decentralized applications.

Provide a direct connection of support between the consumers and the creators allowing people to directly support creators for the content they produce (Paying for Production, not Distribution). Even huge projects like Marvel movies or video games can be supported through community fundraising. Each production raises the reputation of the artist(s), allowing them to raise more money from their fans for the next project.

Preventing artists from having control over their art allows all artists to use, adapt, and recreate the work of other artists. Imagine all the games and movies that were super hyped up but totally flopped. If anyone were allowed to remake anyone else's work but better, artists would have an incentive to make good stories so that they can raise money for their next project, instead of making something crappy and making all their money off of preorders. This would also prevent the fragmentation of distribution services, as no one entity could have exclusive control over a piece of content.

But What IS a Monad?

This is an article that defines what a Monad is while trying to stay as close as possible to its roots in Category Theory.

If you've ever heard of monads in the context of functional programming, you've likely also heard that they are just "Monoids in the Category of Endofunctors". But what does that even mean?

Well for one, this quote could use some more detail. The more descriptive quote is: "a monad in X is just a monoid in the category of endofunctors of X, with product × replaced by composition of endofunctors and unit set by the identity endofunctor."

Throughout this article, we will define "Category", "Endofunctor" and "Monoid" in a fictional programming language, and put these definitions together to define a Monad.

A Category (designated using curly letters, e.g. 𝒞), for the purposes of this article, is simply some defined set of mathematical objects (numbers, values, types, terms, or even categories themselves) as well as some set of "morphisms" defined between the objects. A category also has some conventions:

  • Every object has a morphism to itself (called the identity morphism)
  • Morphisms can be combined together into other morphisms much like function composition.
  • Morphism composition is associative.

Functors in category theory are simply morphisms in a "category of categories", where categories are objects and functors are morphisms that map one category to another. This can be visualized at a high level of abstraction, where the functor is just one arrow between two categories (F : 𝒞 → 𝒟), or at a lower level, where a functor could be represented as many parallel arrows uniquely mapping every object and morphism in category 𝒞 to an object or morphism in category 𝒟.

Endofunctors are just the special case where the two categories 𝒞 and 𝒟 are the same category. An endofunctor is essentially a mapping from each object and morphism in a category to another object/morphism in the same category.

Endofunctors come up often when working in a specific category. For example, all the types in a programming language form a category where the morphisms are functions defined between them. This category is called Type, and can be thought of as the "type of types".

There are many programming languages that have endofunctors. Option in Rust, Maybe in Haskell, or really any kind of wrapper type in any language is technically an endofunctor. While most languages have endofunctors, most don't have a powerful enough type system to formalize the definition of an endofunctor itself. (For rustaceans, think of it like a trait that you could implement for type constructors like Option or Vec: not Option<T>, just Option by itself. This is not yet possible in Rust at the time this article was written.)

When formally defining what it means to be an Endofunctor within a given Category of objects (i.e. a Type System), one needs a powerful dependently-typed language. For the purposes of this article, we will be using a fictitious dependently-typed language. The following is an outline of some of the general features of the language.

Pseudocode Definitions (skip this if you think you can grok the pseudocode first try):

// This binds some name to a value and an optional type in some context. The value may be a type (like Vec(A)) or a value (like `3`)
let <name> [: <type>] := <value>

// A name bound to a value may be given a valid type, or one will be inferred.
// This will have an inferred type of `Nat` because 3 is most likely to be a natural number
let three := 3
// The type of `four` is specified to be `Int` (i.e. an integer). This is valid because while 4 is inferred to be Nat, Nat is a subset of Int (integers).
let four : Int := 4
// This is a type declaration. `Likelihood` is defined to be a set of two numbers of type `Nat` and `Float`. Optionally, two names have been assigned to these numbers: `count` and `probability`. Curly braces represent un-ordered sets of objects. The inferred type here is the type of all types: `Type`.
let Likelihood := { count: Nat, probability: Float }
// The above definition is semantically equivalent to this definition:
let Likelihood := { probability: Float, count: Nat }
// But the above two expressions are semantically different from this: (Brackets represent an ordered set, i.e. a list)
let LikelihoodOrdered := [ count: Nat, probability: Float ]

# If you are used to haskell-based currying, the first two definitions of `Likelihood` would be similar to providing two constructors for the same object, like the following:
Likelihood_a : Nat -> Float -> Likelihood
Likelihood_b : Float -> Nat -> Likelihood
# But the 3rd definition would only be equivalent to Likelihood_a
// type constructors can be defined like this, with an implicit `->` between `{A:Type}` and `{ x:A, ... }` for structure types.
let Vector3 := { A : Type } { x : A, y : A, z : A } -> { x, y, z }
// or like this using the `Struct` constructor.
let Vector3 := { A : Type } -> Struct { x : A, y : A, z : A }

// type constructors may also be defined for enum types.
let Option := { A : Type } -> Type;
let Some : {A : Type} { a : A } -> Option(A);
let None : {A : Type} -> Option(A);
// or like this using the `Enum` constructor.
let Option := { A : Type } -> Enum {
	Some { a : A },
	None
}

// Terms can also be named and typed, but left undefined (like how functions in rust traits don't need to be defined immediately)
type add_one : Nat -> Nat

// This is defining add_one in accordance with its previously defined type. It would be an error to define add_one twice with different implementations in the same context, or for the definition to fail to typecheck with the defined type.
let add_one := { a } -> a + 1

Okay now that we have pseudocode out of the way, we can get to defining an endofunctor :D

In this language, Endofunctors can be defined as a dependent collection of two functions: obj_map: A -> F<A> which maps the object of the category, and a function func_map that takes an arbitrary function A -> B where B can be anything and returns a function F<A> -> F<B>, mapping the morphisms from the initial object. When this definition is used as a "type class" (think trait or interface), it allows the user to abstract over a specific endofunctor definition and refer to all endofunctors as a collective.

// An Endofunctor is a class of type functions that takes a type constructor F of type `Type -> Type` (i.e. a type constructor / wrapper). Classes are essentially partially defined functions where the return value (implementation) is only valid if it has been implemented.
let Endofunctor := { F : Type -> Type } -> {
	// Specifies how the endofunctor maps objects of the category `Type` to other objects in `Type` using the type constructor `F`
	type obj_map : { A : Type } -> { obj : A } -> F(A);
	// Specifies how the endofunctor maps functions between objects in the category `Type` to other functions in the "extended" F(Type) category.
	type func_map : { A : Type, B : Type } { func : A -> B } -> F(A) -> F(B);
}

Examples of various type constructors and specifications (implementations) of the Endofunctor class.

// Option is a generic enum, i.e. a function of type `Type -> Enum`
let Option := { T : Type } -> Enum {
	Some { t : T },
	None
}
// Defines the Endofunctor "function" for a specific type constructor.
let impl_option_endofunctor : Endofunctor(Option) := {
	let obj_map := {A} Option{A}::Some
	// Returns Option{A} -> Option{B} from `func : A -> B`
	let func_map := {A, B} {func} -> (
		{ fa : Option(A) } -> match fa {
			Some[t] => Some(func(t)),
			None => None
		}
	)
}
// List is a generic List, i.e. a function from Type -> Enum.
let List := { T : Type } -> Enum {
	Cons { list : List(T), value : T }, // Recursion defined implicitly
	Nil
}
// Defines a term of type Endofunctor(List).
let impl_list_endofunctor : Endofunctor(List) := {
	// Creates a constructor that takes a `value : T` by partially applying the List::Cons variant.
	let obj_map := {A} List{A}::Cons { list: List{A}::Nil }
	let func_map := {A, B} {func} -> ({fa : List(A)} -> match fa {
		Cons{list, value} => Cons(func_map{A,B}{func}{list}, func(value)), // func_map is recursively applied to the list
		Nil => Nil
	})
}
// Generic Identity Function Constructor, we need this to define unit for the Monad.
let identity : [T : Type] [T] T := [T] [v] v

NOTICE: New Syntax / Convention, identity is the name assigned to the expression [T] [v] v. But that expression has also been associated with a type ([T : Type] [T] T). The type of identity is automatically assigned the name (translated from snake_case to PascalCase): ^Identity.

There is analogous syntax for finding a value defined for a given type. This may not always resolve because multiple values could be defined for a given type.

Example: to refer to impl_option_endofunctor, you can instead infer the definition via: _ : Endofunctor(Option).

This also works when defining "implementations". You can use _ to not define a name, and instead let type inference do its job.

let _ : Endofunctor(Identity) := {
	// object map returns the identity function on `A` (i.e. A -> A).
	let obj_map := {A} identity(A)
	// func_map returns the function as-is because identity does nothing.
	let func_map := {A, B} {func} -> func;
}

// The above is somewhat equivalent to this in rust :)
impl Endofunctor for Identity {
	fn obj_map<A>() -> impl Fn(A) -> A { |a| a }
	fn func_map<A, B>(func: impl Fn(A) -> B) -> impl Fn(A) -> B { func }
}

OKAY, now we can get back to defining the Monad! According to category theory, a monad in a category is a monoid in the category of endofunctors of that category. We know what endofunctors are, but what is a Monoid?

Wikipedia describes a monoid (the algebraic version) as a set (M) equipped with a binary operation we will call "multiplication" (×) and a particular member of the set called the "unit".

To translate this into a typed programming language, we will define a monoid as a type class (a.k.a. trait / interface) that can be implemented for objects. The definition of the type class is parameterized on the set M and includes a set of two morphisms, unit and multiplication, as well as the two monoid laws of associativity ((a × b) × c = a × (b × c)) and identity (a × unit = a = unit × a).

let Monoid := [ M : Type ] -> {
	type unit : M,
	type multiplication : [a : M, b : M] -> M,
	type associativity_proof : { a : M, b : M, c : M } multiplication[multiplication[a, b], c] = multiplication[a, multiplication[b, c]],
	type identity_proof : { a : M } [ multiplication(a, unit) = a, multiplication(unit, a) = a ]
}

Now that we have a monoid, we can talk about types that are monoids! (Assume Nat is the type of natural numbers)

// The natural numbers under addition form a monoid
let naturals_addition_monoid : Monoid(Nat) := {
	let unit := 0,
	let multiplication := add,
	// The actual type theory proofs will be left out because I haven't figured out how they work yet lol.
	// addition is associative
	let associativity_proof := { a, b, c } <proof of associativity>
	// 0 + anything = anything + 0
	let identity_proof := { a } <proof of identity>
}

Alright, let's do another one! This time on a type we've actually seen before. (I'll add type annotations for unit and multiplication so it's more clear what is going on here.)

// This is a generic monoid definition for all possible Option types, i.e. `impl<A> Monoid for Option<A>`.
let impl_option_monoid : {A : Type} Monoid(Option(A)) := {
	// The `unit` must satisfy the identity law with respect to `multiplication`
	let unit : Option(A) := Option(A)::None
	// The combination is essentially `Option::or`: it combines two Options together and returns the first if it contains a value, or the second otherwise.
	let multiplication := [a, b] -> match a {
		Some(a) => Some(a),
		None => b
	} : [Option(A), Option(A)] -> Option(A)
	let associativity_proof := { a, b, c } <proof of associativity>
	let identity_proof := { a } <proof of identity>
}

Oh wait, since Option is an endofunctor, isn't this just a Monad? (A monad is a monoid defined on an endofunctor, after all.) Let's try and pinpoint a more exact definition of a Monad :)

let Monad := [ M : Type -> Type, M_is_functor : Endofunctor(M) ] -> {
	unit : M,
	multiplication : [a : M, b : M] -> M,
	associativity_proof : { a : M, b : M, c : M } multiplication[multiplication[a, b], c] = multiplication[a, multiplication[b, c]],
	identity_proof : { a : M } [ multiplication(a, unit) = a, multiplication(unit, a) = a ]
}

If you followed all that, you should now have a good understanding of what a Monad is! (Specifying one is essentially the same as for the monoid above.)

How to do Node Discovery while Limiting Structural Data Leakage?

Debate: When requesting peers from a connected node, should the node send back a list of publicly available IPs? Or should the node forward the request to the peers in question for them to initiate the connection?

  • Sending back a list of public IPs
    • Pros:
      • Faster to implement (kinda)
      • Used by most networks
      • Possibly easier NAT tunneling? (because it doesn't inherently require an open listening port)
    • Cons:
      • Allows for easy enumeration of all nodes in the network by just requesting lists and connecting to nodes.
  • Notifying peers of the request for them to connect.
    • Does this strategy actually prevent peer enumeration?
      • Someone with a single computer can just connect to a given node and repeatedly ask for new connections to collect IPs. Doing this from different IPs gives a good sense of all the peers of a given node.
        • Fix: Instead of forwarding the request to all peers, forward it to one peer at a time and have that peer report back its measured latency to the requester. Forward the request to another peer if the first peer reported greater latency than the request receiver. If the peer was in fact closer to the requester, that peer connects to the requester and the process starts anew.
      • Even if the peer selection is random and only ~2-3 peers will be contacted at any given peer request, you can still repeatedly ask the node to eventually get a list of all peers.
        • Fix: Request is only forwarded to peers of a similar latency to the requesting node. This also likely makes for quicker convergence to a "local group" as well as prevents closeby peers from needlessly connecting to requesters.
        • Also: any node after the initial requested node can use network coordinates to more accurately recommend closer and closer nodes to forward the peer request to.
      • If someone can place nodes relatively close to a given node (<30ms), they can do peer requests at various distances and eventually sus out all the peers.
        • One way to lessen this attack would be for some peers to simply not participate in the discovery process. It's not a process that needs a whole lot of participants anyway. These peers could initiate connections only if arbitrary (user-defined, based on threat profile) conditions are met, such as:
          • Predefined trust between the requester and the peer (via any kind of trust-system like web of trust, trusted certificates, or simply a set of trusted public keys)
          • A lighter requirement might be that the node must be connected to some subset of their peers.
          • A third requirement could be that the node must be within a certain latency radius as predicted by routing coordinates.
        • No matter how many times unknown IPs request for peers from a given node, it will never forward the request to these peers.
        • Another way to lessen this kind of attack might be to require new nodes to have knowledge of multiple bootstrap nodes before new peers may be requested. These bootstrap nodes could then collude to decide which of their peers is closest to the requesting node and forward the request on to them (passing along all the latency measurements and routing coordinate estimate data for even more accurate positioning).
      • At this point, I think only an ISP would be able to figure out peer relationships via faking nodes and doing peer requests, and it would probably be easier for them to just do direct traffic analysis...
    • Pros
      • Provides wayy more protection from even relatively advanced adversaries trying to probe the structure of the network.
    • Cons
      • wayyy more complex
      • Slower for new nodes to bootstrap depending on design (although it's likely fine once they have established connections to nearby peers); could also potentially be sped up by web-of-trust stuff (i.e. bootstrapping off multiple friends).
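The latency-compared forwarding fix above can be sketched as a simple decision rule: a notified peer only accepts (and connects back to the requester) if it measured lower latency to the requester than the node that forwarded the request; otherwise the request moves on to the next candidate. This is a toy model with illustrative numbers, not the actual discovery protocol.

```rust
// Hypothetical latency-compared peer-request forwarding.

pub struct Peer {
    pub id: u32,
    pub latency_to_requester_ms: u32, // measured after being notified
}

/// Walk candidate peers one at a time; the first peer strictly closer
/// to the requester than the forwarding node accepts the request.
/// Returns the id of the accepting peer, or None if the forwarder
/// itself remains the closest.
pub fn forward(peers: &[Peer], forwarder_latency_ms: u32) -> Option<u32> {
    peers
        .iter()
        .find(|p| p.latency_to_requester_ms < forwarder_latency_ms)
        .map(|p| p.id)
}
```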