Distributed ID Generation Solutions

Using auto-increment primary keys from traditional databases as ID for distributed systems can be inefficient and vulnerable to being predicted and analyzed.

Nowadays, distributed systems require unique global identifiers; it is an essential task in distributed computing with growing internet usage.

Requirement #

Functional Requirement #

id should be globally unique
low latency
each node can generate id independently

Nice to have features #

monotonicity (sortable by time)
unpredictable
high availability. Since an ID generator is a mission-critical system, it must be highly available.

Overview #

Designing a distributed ID generation scheme can be divided into two steps

Generate the binary ID, usually selected from pseudo-random numbers, time, and node ID.
Convert the binary ID into a human-readable and easily transmittable string text.

Binary ID Generation #

1. PRNG (pseudo random number generator) #

UUID v4 (128 bit)
Nano ID (no standard length)

	random number (bits)	total (bits)
UUID v4	122	128
Nano ID	126	126

UUID [2] #

UUIDv4 consists of 128 bits, which are typically represented as 32 hexadecimal characters in the following format: xxxxxxxx-xxxx-4xxx-yxxx-xxxxxxxxxxxx, where the digit "4" in the third group signifies the version (i.e., UUIDv4) and the digit "y" in the fourth group specifies the variant.

Pros #

Generating UUID is simple. No coordination between servers is needed so there will not be any synchronization issues.
The system is easy to scale because each web server is responsible for generating IDs they consume. ID generator can easily scale with web servers.

Cons #

IDs are 128 bits long, but our requirement is 64 bits.
IDs do not go up with time.
IDs could be non-numeric.

Chance of duplicate ID

With the sheer number of possible combinations (2^128), it would be almost impossible to generate a duplicate unless you are generating trillions of IDs every second, for many years.

Nano ID [3] #

A tiny, secure, URL-friendly, unique string ID generator. Small. 130 bytes (minified and gzipped). No dependencies. Size Limit controls the size. Safe. It uses hardware random generator. Can be used in clusters. Short IDs. It uses a larger alphabet than UUID (A-Za-z0-9_-). So ID size was reduced from 36 to 21 symbols. Portable. Nano ID was ported to 20 programming languages.

2. Timestamp + PRNG #

	timestamp (bits)	random number (bits)	total (bits)
ULID	48	80	128
KSUID	32	128	160

ULID [4] #

128-bit compatibility with UUID
Lexicographically sortable!
- timestamp: UNIX-time in milliseconds
No special characters (URL safe)
Monotonic sort order (correctly detects and handles the same millisecond)

KSUID [4] #

There are numerous methods for generating unique identifiers, so why KSUID?

Naturally ordered by generation time

Collision-free, coordination-free, dependency-free

Highly portable representations

3. Timestamp + Node ID + Monotonic Counter #

	timestamp (bits)	Node Id (bits)	Process Id (bits)	Counter (bits)	total (bits)
Snowflake ID	41	10	-	12	64
MongoDB ObjectID	32	24	16	24	96

id is unique across different node and process
counter ensure the uniqueness on same process of same node

Security Concern #

since the structure of id, the entropy of id is quite small, which means easy to predict.
It is dangerous through IDOR

Snowflake ID [6] #

timestamp: Epoch in milliseconds precision
- That gives us 69 years with respect to a custom epoch.
Node ID: 10 bits, this gives us 1024 nodes/machines.
Local counter: 12 bits, The counter’s max value would be 4095.

MongoDB ObjectID [7] #

The object ID is generated by the MongoDB driver instead of the database.
The 12-byte ObjectId consists of:
- A 4-byte timestamp, representing the ObjectId's creation, measured in seconds since the Unix epoch.
- A 5-byte random value generated once per process. This random value is unique to the machine and process.
- A 3-byte incrementing counter, initialized to a random value.