
UUIDs in Modern Software: Generation, Collision Probability, and Best Practices

February 15, 2025 · Faizzyhon · 10 min read

UUIDs are the standard way to generate unique identifiers without coordination between distributed systems. Here's everything you need to know about UUID v4, collision math, and when to use them versus alternatives.

Unique identifier generation is a deceptively deep problem in distributed systems. In a single-process application with a single database, an auto-incrementing integer is perfectly adequate — the database guarantees uniqueness, and the integer is compact, fast to index, and easy to read. But the moment you introduce multiple servers, client-side generation, database sharding, or cross-system data merging, you need identifiers that can be generated independently on any system without coordination — and that's where UUIDs come in.

What Is a UUID?

A UUID (Universally Unique Identifier) is a 128-bit number, typically written as 32 hexadecimal digits grouped by hyphens into the pattern xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx, for example 550e8400-e29b-41d4-a716-446655440000. The format was standardised in RFC 4122 and is now defined by RFC 9562, which replaced it in 2024. The name "universally unique" reflects the design goal: UUIDs should be unique across all systems, at all times, without requiring central coordination.

The 128-bit size is a deliberate engineering choice. 128 bits provides approximately 3.4 × 10^38 possible values. To put this in perspective, if you generated one billion UUIDs per second for the entire age of the universe (13.8 billion years), you would have produced approximately 4.35 × 10^26 UUIDs, roughly 10^12 times fewer than the total UUID space. Exhausting the space is therefore not a concern. Whether two randomly generated UUIDs might coincide is a separate question, governed by the birthday bound discussed in the collision section below.
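The arithmetic behind that comparison can be reproduced with BigInt. This is a small sketch, assuming a Julian year of 365.25 days; the variable names are illustrative only.

```javascript
// Total number of 128-bit values.
const totalValues = 2n ** 128n;

// One billion UUIDs per second for 13.8 billion years.
const secondsPerYear = 31_557_600n; // assumption: Julian year (365.25 days)
const generated = 1_000_000_000n * 13_800_000_000n * secondsPerYear;

// How many times larger the space is than everything generated (integer division).
const ratio = totalValues / generated;

console.log(`space:     ${totalValues}`);
console.log(`generated: ${generated}`);
console.log(`ratio:     ${ratio}`);
```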

RFC 4122 defined five UUID versions, each using a different generation strategy. Version 1 combines a timestamp with the MAC address of the generating machine. Version 2 is a variant of v1 used in DCE security. Versions 3 and 5 generate UUIDs deterministically from a namespace and a name using MD5 and SHA-1 respectively. Version 4 uses random or pseudorandom bits and is by far the most commonly used version for general-purpose unique ID generation. RFC 9562 later added versions 6, 7, and 8.

UUID v4 Generation

A UUID v4 is 128 bits where 122 bits are randomly generated and 6 bits are fixed as version and variant markers. The version field is the four most significant bits of the third group and is always 0100 in binary, so the third group always starts with the hex digit 4. The variant field is the two most significant bits of the fourth group and is always 10 in binary, indicating the RFC 4122 variant, so the fourth group always starts with 8, 9, a, or b in a valid v4 UUID.
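Those fixed positions make a v4 UUID easy to recognise from its string form. The following is a minimal sketch; the names UUID_V4_PATTERN, isUuidV4, and describeUuid are illustrative, not part of any standard API.

```javascript
// Canonical 8-4-4-4-12 layout with version digit 4 and a variant digit of 8, 9, a, or b.
const UUID_V4_PATTERN =
  /^[0-9a-f]{8}-[0-9a-f]{4}-4[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i;

function isUuidV4(value) {
  return typeof value === "string" && UUID_V4_PATTERN.test(value);
}

// Reads the version and variant fields straight from the hex digits.
function describeUuid(value) {
  const hex = value.replace(/-/g, "");
  const version = parseInt(hex[12], 16); // first digit of the third group
  const variant = (parseInt(hex[16], 16) >> 2) // top two bits of the fourth group
    .toString(2)
    .padStart(2, "0");
  return { version, variant };
}

console.log(isUuidV4("550e8400-e29b-41d4-a716-446655440000"));
console.log(describeUuid("550e8400-e29b-41d4-a716-446655440000"));
```

A pattern check like this only confirms the layout. It says nothing about how the random bits were produced.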

The randomness source is critical. A UUID v4 generated from a cryptographically secure random number generator (CSPRNG) is safe for security-sensitive purposes. A UUID v4 generated from a weak pseudorandom number generator (PRNG) seeded predictably — for example, with the current timestamp — could potentially be predicted by an attacker who knows the seed. For UUIDs used as session tokens, API keys, or resource identifiers in security-sensitive contexts, always use a CSPRNG.

In the browser, the CSPRNG is available via crypto.getRandomValues(). The built-in crypto.randomUUID() method (available in all modern browsers and Node.js 15.6+) generates a properly formed UUID v4 using the CSPRNG in a single call. For older environments, the standard pattern fills a Uint8Array with random bytes using crypto.getRandomValues() and then formats them into the UUID string representation.

Collision Probability in Practice

UUID v4 has 122 random bits, giving 2^122 ≈ 5.3 × 10^36 possible values. By the birthday bound, the number of UUIDs you must generate before the probability of at least one collision reaches 50% is approximately 1.18 × 2^61 ≈ 2.7 × 10^18. At a generation rate of one billion UUIDs per second, you would need to generate continuously for approximately 86 years to reach that 50% collision probability.
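The birthday figures can be checked with the usual approximation p ≈ 1 − exp(−n² / 2N), where N is the number of possible values and n the number generated. The sketch below assumes a perfectly uniform random source; the function names are illustrative.

```javascript
const RANDOM_BITS = 122;
const SPACE = 2 ** RANDOM_BITS; // a power of two, so exactly representable as a double

// Approximate probability of at least one collision among n random v4 UUIDs.
// Math.expm1 keeps precision when the probability is extremely small.
function collisionProbability(n) {
  return -Math.expm1(-(n * n) / (2 * SPACE));
}

// Approximate number of UUIDs needed to reach collision probability p.
function idsForProbability(p) {
  return Math.sqrt(2 * SPACE * -Math.log1p(-p));
}

const SECONDS_PER_YEAR = 31_557_600; // assumption: Julian year (365.25 days)
const idsForHalf = idsForProbability(0.5);
const yearsAtOneBillionPerSecond = idsForHalf / 1e9 / SECONDS_PER_YEAR;

console.log(idsForHalf.toExponential(2));
console.log(yearsAtOneBillionPerSecond.toFixed(1));
console.log(collisionProbability(1e12).toExponential(2)); // a trillion UUIDs
```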

For any realistic application — even large-scale distributed systems generating millions of UUIDs per day — the probability of a collision is so small that it is not a meaningful engineering concern. Applications that need formal collision guarantees (where even a 10^-30 probability matters) typically use a hybrid approach: a UUID component combined with a database-level unique constraint that will catch any collision if one ever occurs.

The collision analysis assumes a genuinely random source. If your random number generator has biases, correlations, or is seeded from a low-entropy source, the effective random bits may be far fewer than 122, dramatically increasing collision probability. This is not a theoretical concern — there have been real incidents where SSL certificate serial numbers, session tokens, and other "random" values exhibited predictable patterns because of poor entropy sources, particularly on virtualised or freshly booted systems.

UUID v4 vs Auto-Increment vs Other Identifiers

Auto-increment integers have clear advantages: they're compact (4 or 8 bytes vs 16 bytes for UUID), they sort naturally by creation order, they're human-readable, and they're fast to index in most database engines. They have equally clear disadvantages: they require a central authority (the database) to generate them, they reveal information about creation order and volume, they can't be generated client-side, and they fail across multiple databases without additional coordination.

UUIDs solve the coordination problem perfectly but introduce index performance concerns. In databases that use B-tree indexes (virtually all relational databases), randomly distributed UUID v4 values cause page fragmentation in the primary key index over time, leading to poor write performance and increased storage at scale. This is a real issue at tens of millions of rows or higher.

UUID v7 (a newer version standardised in RFC 9562 in 2024 and widely supported in libraries) addresses the index performance issue by using a time-ordered prefix followed by random bits. This combines the sortability of auto-increment IDs with the distributed generation capability of UUID v4. For new applications that need UUID semantics and are concerned about database performance at scale, UUID v7 is worth evaluating. For existing applications, the performance impact of UUID v4 typically doesn't justify a migration until it becomes measurably problematic.
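To make the time-ordered layout concrete, here is a minimal sketch of a v7-style generator: a 48-bit Unix millisecond timestamp in the first six bytes, then the version and variant markers, with the remaining bits random. The function name uuidV7 and the webCrypto fallback line are illustrative assumptions, and the sketch omits the optional monotonic counter, so IDs created within the same millisecond are not ordered relative to each other. A maintained library is the safer choice for production use.

```javascript
// Web Crypto is a global in browsers and recent Node.js; older Node.js exposes it via node:crypto.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

function uuidV7(nowMs = Date.now()) {
  const bytes = new Uint8Array(16);
  webCrypto.getRandomValues(bytes);

  // Bytes 0-5: Unix timestamp in milliseconds, big-endian.
  let timestamp = BigInt(nowMs);
  for (let i = 5; i >= 0; i--) {
    bytes[i] = Number(timestamp & 0xffn);
    timestamp >>= 8n;
  }

  bytes[6] = (bytes[6] & 0x0f) | 0x70; // version field = 7
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10

  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

console.log(uuidV7());
```

Because the timestamp occupies the most significant bytes, IDs generated in later milliseconds compare greater as strings and as binary values, which is what keeps B-tree inserts clustered near the end of the index.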

[Figure: Database identifier strategies comparison]

UUID Design Patterns

When using UUIDs as database primary keys, store them in a native UUID type if your database has one (such as PostgreSQL's uuid type) or in a 16-byte binary column otherwise (such as BINARY(16) in MySQL, which has no dedicated UUID type). Storing UUIDs as CHAR(36) or VARCHAR is convenient but wastes space: a UUID as a string is 36 characters (32 hex digits + 4 hyphens), while as binary it is 16 bytes, a roughly 55% size reduction that compounds significantly at scale.
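When the database column is binary, the application usually has to convert between the 36-character string and the 16 raw bytes. A minimal sketch of that conversion follows; uuidToBytes and bytesToUuid are illustrative names, and a given database driver may already offer its own helpers.

```javascript
const CANONICAL_UUID =
  /^[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}$/i;

// "550e8400-e29b-..." (36 characters) -> Uint8Array of 16 bytes
function uuidToBytes(uuid) {
  if (!CANONICAL_UUID.test(uuid)) throw new TypeError(`Not a UUID: ${uuid}`);
  const hex = uuid.replace(/-/g, "");
  const bytes = new Uint8Array(16);
  for (let i = 0; i < 16; i++) {
    bytes[i] = parseInt(hex.slice(i * 2, i * 2 + 2), 16);
  }
  return bytes;
}

// Uint8Array of 16 bytes -> lowercase hyphenated string
function bytesToUuid(bytes) {
  if (bytes.length !== 16) throw new RangeError("A UUID is exactly 16 bytes");
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const stored = uuidToBytes("550e8400-e29b-41d4-a716-446655440000");
console.log(stored.length, bytesToUuid(stored));
```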

For UUIDs exposed in URLs or APIs, the standard hyphenated format is universally recognised. For contexts where hyphenated UUIDs are inconvenient, such as QR codes, short URLs, and command-line arguments, a UUID can be encoded in Base64url (22 characters without padding) or Base58 (up to 22 characters) for a more compact but equivalent representation.
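As an illustration of the Base64url option, the sketch below encodes the 16 bytes of a UUID into 22 URL-safe characters and decodes them back. It relies on the btoa and atob globals found in browsers and current Node.js; the function names are illustrative.

```javascript
// Hyphenated UUID -> 22-character Base64url string (no padding).
function uuidToBase64Url(uuid) {
  const hex = uuid.replace(/-/g, "");
  if (!/^[0-9a-f]{32}$/i.test(hex)) throw new TypeError(`Not a UUID: ${uuid}`);
  const bytes = hex.match(/../g).map((pair) => parseInt(pair, 16));
  return btoa(String.fromCharCode(...bytes))
    .replace(/\+/g, "-")
    .replace(/\//g, "_")
    .replace(/=+$/, "");
}

// 22-character Base64url string -> lowercase hyphenated UUID.
function base64UrlToUuid(text) {
  if (!/^[A-Za-z0-9_-]{22}$/.test(text)) throw new TypeError(`Not an encoded UUID: ${text}`);
  const base64 = text.replace(/-/g, "+").replace(/_/g, "/").padEnd(24, "=");
  const binary = atob(base64);
  const hex = Array.from(binary, (ch) =>
    ch.charCodeAt(0).toString(16).padStart(2, "0")
  ).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

const compact = uuidToBase64Url("550e8400-e29b-41d4-a716-446655440000");
console.log(compact, base64UrlToUuid(compact));
```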

Never use UUIDs as passwords, API keys, or secrets without additional hardening. While UUID v4 has good entropy, 122 bits is on the lower end for a standalone secret, and the UUID format itself makes no promise about how the bits were generated. API keys and session tokens typically use 160–256 bits of random data (in Node.js, crypto.randomBytes(32).toString('hex') gives 256 bits of entropy). Use UUIDs for identifiers and proper CSPRNG-generated byte arrays for secrets.
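A secret of that size can be produced with the same CSPRNG used for UUIDs. This is a minimal sketch that works with the Web Crypto API in browsers and Node.js; generateSecretToken is an illustrative name, and the 32-byte default mirrors the 256-bit figure above.

```javascript
// Web Crypto is a global in browsers and recent Node.js; older Node.js exposes it via node:crypto.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Returns byteLength random bytes from the CSPRNG, hex-encoded (two characters per byte).
function generateSecretToken(byteLength = 32) {
  const bytes = new Uint8Array(byteLength);
  webCrypto.getRandomValues(bytes);
  return Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
}

const token = generateSecretToken();
console.log(token.length); // 64 hex characters for 32 bytes
```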

Generating UUIDs in JavaScript

The modern, correct approach in any environment that supports it is const id = crypto.randomUUID(). This is available in browsers (Chrome 92+, Firefox 95+, Safari 15.4+, in secure contexts such as HTTPS pages) and Node.js 15.6+. For environments that require compatibility with older runtimes, the standard polyfill pattern is to fill 16 random bytes with crypto.getRandomValues(new Uint8Array(16)), then format them as a UUID string, setting the version nibble to 4 and the variant bits to 10.
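Put together, the two approaches might look like the following sketch. The names uuidV4 and uuidV4FromBytes are illustrative, and the webCrypto line is an assumption that covers Node.js versions where Web Crypto is not yet a global.

```javascript
// Web Crypto is a global in browsers and recent Node.js; older Node.js exposes it via node:crypto.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Formats 16 bytes as a v4 UUID, forcing the version and variant bits.
function uuidV4FromBytes(randomBytes) {
  const bytes = Uint8Array.from(randomBytes); // copy so the caller's array is untouched
  bytes[6] = (bytes[6] & 0x0f) | 0x40; // version nibble = 4
  bytes[8] = (bytes[8] & 0x3f) | 0x80; // variant bits = 10
  const hex = Array.from(bytes, (b) => b.toString(16).padStart(2, "0")).join("");
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20),
  ].join("-");
}

// Prefers the built-in method and falls back to manual formatting.
function uuidV4() {
  if (typeof webCrypto.randomUUID === "function") {
    return webCrypto.randomUUID();
  }
  return uuidV4FromBytes(webCrypto.getRandomValues(new Uint8Array(16)));
}

console.log(uuidV4());
```

Keeping the formatting step separate from the random source makes the bit manipulation easy to test with fixed input bytes.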

Our free UUID v4 Generator tool produces cryptographically secure UUIDs directly in your browser using crypto.randomUUID(), with no data transmitted to any server. You can generate up to 100 at a time — useful for populating test fixtures, seeding development databases, generating batch API request IDs, or any other scenario requiring multiple unique identifiers at once.

UUIDs in Distributed System Design

The primary architectural value of UUIDs is enabling entities to be created and referenced before they are persisted. In a microservices architecture, a client can generate a UUID for a new order before sending it to the backend. The order service stores the UUID as the primary key. Other services reference the order by that UUID without needing to wait for the database to generate and return an auto-increment ID. This eliminates a class of distributed systems complexity around ID assignment and propagation.
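The flow described above can be sketched with an in-memory stand-in for the backend. OrderService here is a hypothetical class, not a real API; it only illustrates that a client-chosen UUID lets other records reference the order before it is stored, and lets a retried request be recognised rather than duplicated.

```javascript
// Web Crypto is a global in browsers and recent Node.js; older Node.js exposes it via node:crypto.
const webCrypto = globalThis.crypto ?? require("node:crypto").webcrypto;

// Hypothetical in-memory order service keyed by the client-supplied UUID.
class OrderService {
  #orders = new Map();

  // Creating the same ID twice returns the stored order instead of adding a duplicate.
  create(order) {
    const existing = this.#orders.get(order.id);
    if (existing) return { created: false, order: existing };
    this.#orders.set(order.id, order);
    return { created: true, order };
  }
}

// The client picks the ID up front, so related data can reference it immediately.
const order = { id: webCrypto.randomUUID(), items: ["sku-123"] };
const shipment = { orderId: order.id };

const service = new OrderService();
const first = service.create(order);
const retry = service.create(order); // e.g. the client resends after a timeout

console.log(first.created, retry.created, shipment.orderId === order.id);
```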

UUIDs also simplify database migration and data merging. When merging two databases, whether during acquisitions, data imports, or database consolidation, UUID primary keys will for all practical purposes not collide, whereas integer IDs almost certainly will. The UUID-first approach treats each data record as uniquely identifiable regardless of which system generated it, which is a valuable invariant in any system that integrates data across boundaries.