
Cryptographic Hash Functions: SHA-256, MD5, and When Each Applies

February 19, 2025 · Faizzyhon · 11 min read

Hash functions are among the most widely misused tools in software security. Understanding the difference between a general-purpose hash and a cryptographic hash — and knowing which algorithm fits which job — prevents serious vulnerabilities.

Hash functions appear in nearly every layer of modern software — file integrity checking, digital signatures, password storage, blockchain ledgers, data deduplication, content-addressable storage, and API authentication. Despite their ubiquity, the word "hash" is used to describe tools with vastly different security properties, and choosing the wrong one for a given purpose can introduce serious vulnerabilities. This guide explains what cryptographic hash functions are, how they work, which algorithms are appropriate for which tasks, and how to use them correctly in real applications.


What Makes a Hash Function Cryptographic

A hash function takes an input of arbitrary length and produces a fixed-length output called a digest or hash. A non-cryptographic hash function — like those used in hash tables — is designed purely for speed and uniform distribution, with no security guarantees. A cryptographic hash function must satisfy three additional properties: preimage resistance, second preimage resistance, and collision resistance.
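You can see the fixed-length property by hashing a few inputs with Node.js's built-in crypto module. The same experiment shows the avalanche behaviour of a well-designed hash, where a one-character change flips roughly half the output bits. This is a minimal sketch that assumes Node.js 16 or later. `sha256Hex` and `differingBits` are hypothetical helper names written for this illustration, not library APIs.

```javascript
// Minimal sketch using Node.js's built-in crypto module (Node 16+ for "node:" imports).
const { createHash } = require("node:crypto");

// Hex-encoded SHA-256 digest of a UTF-8 string.
function sha256Hex(text) {
  return createHash("sha256").update(text, "utf8").digest("hex");
}

// Number of bit positions that differ between two equal-length hex digests.
function differingBits(hexA, hexB) {
  const a = Buffer.from(hexA, "hex");
  const b = Buffer.from(hexB, "hex");
  let count = 0;
  for (let i = 0; i < a.length; i++) {
    let x = a[i] ^ b[i];
    while (x) {
      count += x & 1;
      x >>= 1;
    }
  }
  return count;
}

const digestSmall = sha256Hex("abc");
const digestLarge = sha256Hex("abc".repeat(100000)); // 300,000-character input

console.log(digestSmall);
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
console.log(digestSmall.length, digestLarge.length); // 64 64 (32 bytes each)
console.log(differingBits(digestSmall, sha256Hex("abd"))); // usually close to 128 of 256 bits
```

Hashing the same input again always returns the same digest. That determinism is what makes hashes usable for integrity checks.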

Preimage resistance means that, given only a hash output, it is computationally infeasible to find any input that produces it. Second preimage resistance means that, given a specific input and its hash, it is infeasible to find a different input with the same hash. Collision resistance means it is infeasible to find any two distinct inputs that produce the same hash output. The three properties are listed in order of increasing strength. Collision resistance is the hardest to achieve and the first to break, because the attacker gets to choose both inputs. A generic birthday attack finds a collision in an n-bit hash after roughly 2^(n/2) attempts, whereas a generic preimage attack needs roughly 2^n.
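The gap between collision resistance and preimage resistance is easy to observe on a deliberately weakened hash. The sketch below assumes Node.js and truncates SHA-256 to 16 bits. This is a toy that must never be used for anything real. It then counts how many attempts a birthday search and a preimage search need. `truncatedHash`, `findCollision`, and `findPreimage` are hypothetical helpers written only for this illustration.

```javascript
const { createHash } = require("node:crypto");

// Toy hash: the first `bits` bits of SHA-256 (bits must be a multiple of 8).
function truncatedHash(input, bits) {
  return createHash("sha256").update(input).digest().subarray(0, bits / 8).toString("hex");
}

// Birthday search: hash distinct inputs until any two share a digest.
function findCollision(bits) {
  const seen = new Map(); // digest -> first input that produced it
  for (let i = 0; ; i++) {
    const input = `msg-${i}`;
    const digest = truncatedHash(input, bits);
    if (seen.has(digest)) return { a: seen.get(digest), b: input, attempts: i + 1 };
    seen.set(digest, input);
  }
}

// Preimage search: hash inputs until one matches a fixed target digest.
function findPreimage(targetHex, bits) {
  for (let i = 0; ; i++) {
    const input = `guess-${i}`;
    if (truncatedHash(input, bits) === targetHex) return { input, attempts: i + 1 };
  }
}

const BITS = 16;
const collision = findCollision(BITS);
const preimage = findPreimage(truncatedHash("target", BITS), BITS);
console.log("collision after", collision.attempts, "attempts"); // typically a few hundred (~2^8)
console.log("preimage after", preimage.attempts, "attempts");   // typically tens of thousands (~2^16)
```

Each extra output bit doubles the preimage work but multiplies the collision work by only about 1.41 (the square root of 2). This is why an n-bit hash offers just n/2 bits of collision resistance.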

The word "infeasible" here has a precise meaning: computationally infeasible given current and foreseeable computing resources. A 256-bit output hash has 2^256 possible values, approximately 1.16 × 10^77. Finding a collision by brute force requires testing approximately 2^128 inputs (by the birthday paradox). At one trillion attempts per second, that is about 3.4 × 10^26 seconds, or roughly 10^19 years, which is nearly a billion times the current age of the universe. This is why SHA-256 is considered secure. Even an algorithmic flaw that cut the attack cost by a factor of a million (about 2^20) would still leave roughly 2^108 operations, far out of reach.
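The birthday figure can be checked directly. For an ideal n-bit hash, the standard approximation for the probability p of at least one collision among k random inputs appears below. It is applied here to SHA-256 with k = 2^128 and the rate of one trillion (10^12) attempts per second used above.

```latex
\begin{aligned}
p &\approx 1 - e^{-k^2/2^{n+1}}, \qquad n = 256,\ k = 2^{128}
  \;\Rightarrow\; \frac{k^2}{2^{n+1}} = \frac{2^{256}}{2^{257}} = \tfrac{1}{2},
  \quad p \approx 1 - e^{-1/2} \approx 0.39 \\
t &\approx \frac{2^{128}}{10^{12}\ \text{s}^{-1}} \approx 3.4 \times 10^{26}\ \text{s} \approx 1.1 \times 10^{19}\ \text{years}
\end{aligned}
```

So 2^128 attempts give roughly a 39% chance of a collision, and making that many attempts at this rate takes about 10^19 years.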

MD5 — Historically Important, Now Broken

MD5 produces a 128-bit (16-byte) hash. Ron Rivest designed it in 1991, and it was the dominant cryptographic hash function through the 1990s. In 1996, Hans Dobbertin published a collision in MD5's compression function, an early sign of structural weakness. By 2004, researchers led by Xiaoyun Wang had demonstrated full collision attacks (two different inputs with the same MD5 hash) that ran in under an hour on contemporary hardware. By 2010, MD5 collisions could be constructed in seconds.

Today, MD5 is cryptographically broken for any security-sensitive purpose. Specifically: MD5 should not be used for digital signatures, certificate fingerprinting, integrity verification of security-sensitive files, or anything else that requires collision resistance. A well-documented real-world incident shows why. The Flame malware, discovered in 2012, used an MD5 chosen-prefix collision against certificates issued by Microsoft's Terminal Server Licensing Service to forge a code-signing certificate. That certificate let Flame's components pass as legitimate Microsoft-signed software and spread through a spoofed Windows Update mechanism. This is exactly why deprecated hash functions must be replaced, not merely discouraged.

MD5 still has legitimate uses in non-security contexts. Checksumming large file downloads to detect accidental corruption (not malicious modification) is fine. Random bit errors will not produce a matching checksum. A checksum served alongside the file was never a defence against tampering anyway, since an adversary who can modify the file can usually modify the checksum too. Content-addressable caching, database record fingerprinting for change detection, and deduplication hashing in non-adversarial contexts are all acceptable MD5 uses. The key rule: if an attacker could benefit from a collision, don't use MD5.
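For the download-corruption case, the usual pattern is to stream the file through the hash rather than load it into memory. This is a minimal sketch that assumes Node.js. `checksumFile` is a hypothetical helper. The algorithm is a parameter, so the same code can use MD5 for non-adversarial checks and SHA-256 where tampering matters.

```javascript
const { createHash } = require("node:crypto");
const { createReadStream } = require("node:fs");

// Stream a file through a hash so large files never need to fit in memory.
// `algorithm` is any name createHash accepts, e.g. "md5" or "sha256".
function checksumFile(path, algorithm = "md5") {
  return new Promise((resolve, reject) => {
    const hash = createHash(algorithm);
    createReadStream(path)
      .on("error", reject)
      .on("data", (chunk) => hash.update(chunk))
      .on("end", () => resolve(hash.digest("hex")));
  });
}

// Usage ("download.iso" and publishedMd5 are placeholders for your file and its published checksum):
// checksumFile("download.iso").then((hex) => console.log(hex === publishedMd5 ? "intact" : "corrupted"));
```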

SHA-1 — Deprecated but Still Present

SHA-1 produces a 160-bit hash. It displaced MD5 as the mainstream choice and was required by many security standards through the 2000s. In 2017, researchers from CWI Amsterdam and Google announced SHAttered, the first practical SHA-1 collision: two different PDF files with identical SHA-1 hashes. The attack took roughly 6,500 CPU-years plus 110 GPU-years of computation and was estimated to cost approximately $110,000 of cloud computing time.

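SHA-1 is deprecated for TLS certificates, code signing, and virtually all modern security standards. Git, which historically named every object by its SHA-1 hash, now supports a SHA-256 object format as part of its transition away from SHA-1. Legacy systems still using SHA-1 present measurable risk. Audit your cryptographic inventory and replace SHA-1 with SHA-256 wherever it appears in a security context. In code, the switch is often a one-line change. However, stored digests, database columns sized for 40 hex characters, and peers that expect SHA-1 values usually need migrating as well.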
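In Node.js, for example, the code-level switch is just a change of algorithm name. The stored value changes length, though, and that is where most of the real migration work lies. This is a minimal sketch, and `fingerprint` is a hypothetical helper name.

```javascript
const { createHash } = require("node:crypto");

// Hypothetical helper: hex fingerprint of some data under a given algorithm.
function fingerprint(data, algorithm) {
  return createHash(algorithm).update(data).digest("hex");
}

const before = fingerprint("abc", "sha1");  // legacy: 40 hex characters (160 bits)
const after = fingerprint("abc", "sha256"); // replacement: 64 hex characters (256 bits)

console.log(before);       // a9993e364706816aba3e25717850c26c9cd0d89d
console.log(after.length); // 64: fields, columns, and comparisons sized for 40 need updating
```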

SHA-2: The Current Standard

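SHA-2 is a family of hash functions defined in NIST FIPS 180-4: SHA-224, SHA-256, SHA-384, SHA-512, SHA-512/224, and SHA-512/256. The number indicates the output size in bits. For SHA-512/224 and SHA-512/256, it is the number after the slash, because those variants run SHA-512 internally (with different initial values) and truncate its output. SHA-256 (256-bit output) and SHA-512 (512-bit output) are the most widely used.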
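You can confirm the output sizes by measuring real digests. This is a minimal sketch that assumes Node.js, whose hash support comes from its bundled OpenSSL. The truncated SHA-512/t variants are printed only if `crypto.getHashes()` reports them, because their availability depends on the OpenSSL build. `digestBits` is a hypothetical helper name.

```javascript
const { createHash, getHashes } = require("node:crypto");

// Output size in bits, measured from an actual digest of empty input.
function digestBits(algorithm) {
  return createHash(algorithm).update("").digest().length * 8;
}

const available = new Set(getHashes());
for (const alg of ["sha224", "sha256", "sha384", "sha512", "sha512-224", "sha512-256"]) {
  if (available.has(alg)) console.log(alg, digestBits(alg)); // e.g. "sha384 384"
}
```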

SHA-256 is the default choice for almost all modern security applications: TLS certificate fingerprinting, code signing, HMAC authentication, blockchains (Bitcoin applies SHA-256 twice; Ethereum instead uses Keccak-256), password manager vault integrity checking, and API request signing. It offers a 128-bit security level against collision attacks, well above any realistic attack threshold.

SHA-512 provides a 256-bit security level against collisions. Because it operates natively on 64-bit words, it is often faster than SHA-256 on 64-bit CPUs. CPUs with dedicated SHA-256 instructions (such as Intel SHA extensions or ARMv8 cryptography extensions) can reverse that result. On 32-bit systems, SHA-512 is slower. For most applications, SHA-256 is the correct choice. Use SHA-512 when you need a higher security margin (key derivation, long-term archival integrity), or when you are on a 64-bit server processing large volumes of data and have measured a performance advantage.
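The outcome depends on the CPU, so it is worth measuring on your own hardware rather than assuming. This is a rough micro-benchmark sketch that assumes Node.js, which hashes through OpenSSL and so uses hardware SHA instructions when they are available. `throughputMBps` is a hypothetical helper, and the 16 MiB buffer size is an arbitrary choice.

```javascript
const { createHash } = require("node:crypto");

// Rough hashing throughput in MB/s: hash `buf` `rounds` times and time the total.
function throughputMBps(algorithm, buf, rounds = 5) {
  const start = process.hrtime.bigint();
  for (let i = 0; i < rounds; i++) createHash(algorithm).update(buf).digest();
  const seconds = Number(process.hrtime.bigint() - start) / 1e9;
  return (buf.length * rounds) / 1e6 / seconds;
}

const data = Buffer.alloc(16 * 1024 * 1024, 0xab); // 16 MiB of filler bytes
for (const alg of ["sha256", "sha512"]) {
  console.log(alg, Math.round(throughputMBps(alg, data)), "MB/s");
}
```

Run it a few times. A single-shot timing like this includes warm-up effects and is indicative, not a rigorous benchmark.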

SHA-3: The Alternative Standard

SHA-3, based on the Keccak algorithm, was standardised by NIST in 2015 (FIPS 202) as an alternative to SHA-2. It is not a replacement but an independent backup. SHA-2 and SHA-3 are based on fundamentally different mathematical constructions. If a serious vulnerability were discovered in SHA-2's Merkle–Damgård construction, SHA-3's sponge construction would provide an unaffected alternative. Keccak variants appear in some newer cryptographic protocols and in Ethereum. Ethereum's Keccak-256 is the original Keccak submission and differs from the standardised SHA3-256 in its padding, so the two produce different digests. SHA-3 has not displaced SHA-2 in mainstream applications.
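Node.js exposes SHA-3 through the same `createHash` interface when its bundled OpenSSL supports it (OpenSSL 1.1.1 and later). This minimal sketch hashes one input with both families to show that they produce equal-length but unrelated digests. Note that `"sha3-256"` here is the standardised FIPS 202 function, not Ethereum's Keccak-256, so it will not reproduce Ethereum hashes.

```javascript
const { createHash } = require("node:crypto");

// Same input, two independent constructions: Merkle–Damgård (SHA-2) vs sponge (SHA-3).
const input = "hello";
const sha2 = createHash("sha256").update(input).digest("hex");
const sha3 = createHash("sha3-256").update(input).digest("hex"); // FIPS 202 SHA3-256, not Keccak-256

console.log(sha2.length === sha3.length); // true: both are 256-bit digests
console.log(sha2 === sha3);               // false: unrelated outputs
```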

HMAC: Authenticated Hashing

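HMAC (Hash-based Message Authentication Code), specified in RFC 2104, combines a hash function with a secret key to produce an authentication tag. HMAC-SHA256 is the most common variant. It hashes the key and message in a nested structure, H((K ⊕ opad) ∥ H((K ⊕ ipad) ∥ message)), where ipad and opad are fixed padding constants. This structure is secure against length-extension attacks, which affect raw SHA-256 used directly as a MAC. Given the tag SHA-256(key ∥ message), an attacker who does not know the key can still compute a valid tag for message ∥ padding ∥ suffix.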
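The nested structure is short enough to write out in full. The sketch below assumes Node.js and re-implements HMAC-SHA256 as described in RFC 2104, on top of the platform's SHA-256, purely for illustration. It then checks the result against the built-in `createHmac`. `hmacSha256Manual` is a hypothetical name, and real code should simply call `createHmac`.

```javascript
const { createHash, createHmac } = require("node:crypto");

// Illustration only: HMAC-SHA256 per RFC 2104. Use createHmac in real code.
function hmacSha256Manual(key, message) {
  const BLOCK_SIZE = 64; // SHA-256 processes 64-byte blocks
  let k = Buffer.from(key);
  if (k.length > BLOCK_SIZE) k = createHash("sha256").update(k).digest(); // long keys are hashed first
  const block = Buffer.alloc(BLOCK_SIZE); // zero-filled, so short keys are right-padded with zeros
  k.copy(block);

  const ipad = Buffer.alloc(BLOCK_SIZE);
  const opad = Buffer.alloc(BLOCK_SIZE);
  for (let i = 0; i < BLOCK_SIZE; i++) {
    ipad[i] = block[i] ^ 0x36; // inner padding constant
    opad[i] = block[i] ^ 0x5c; // outer padding constant
  }

  // H((K ^ opad) || H((K ^ ipad) || message))
  const inner = createHash("sha256").update(ipad).update(message).digest();
  return createHash("sha256").update(opad).update(inner).digest("hex");
}

const key = "Jefe";
const msg = "what do ya want for nothing?";
console.log(hmacSha256Manual(key, msg) === createHmac("sha256", key).update(msg).digest("hex")); // true
```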

HMAC is the foundation of API request signing. AWS Signature Version 4, Stripe webhook verification, GitHub webhook signatures, and JWT HS256 tokens all use HMAC-SHA256. The sender computes HMAC-SHA256(secret_key, message) and includes the result in the request. The receiver computes the same HMAC independently and compares the two. If they match, the request is authenticated and unmodified.
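On the sending side, the whole operation is a single HMAC over the exact bytes being sent. This is a minimal sketch that assumes Node.js, and `signPayload` is a hypothetical helper. The `"sha256=<hex>"` prefix mirrors the format GitHub uses in its X-Hub-Signature-256 header. Other services, such as Stripe and AWS, define their own formats and their own rules for what gets signed.

```javascript
const { createHmac } = require("node:crypto");

// Sign the raw request body with a shared secret.
// The "sha256=" prefix follows GitHub's X-Hub-Signature-256 style; other providers differ.
function signPayload(secret, rawBody) {
  return "sha256=" + createHmac("sha256", secret).update(rawBody).digest("hex");
}

// Placeholder secret and body for illustration only.
const header = signPayload("example-secret", '{"event":"push"}');
console.log(header); // "sha256=" followed by 64 hex characters
```

Sign the raw body, not a re-serialised copy. Parsing and re-encoding JSON can change whitespace or key order, and the receiver's HMAC will then no longer match.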

When comparing HMACs for authentication, always use a constant-time comparison function. A naive string equality check may return early as soon as it finds a mismatching byte, leaking timing information that can be exploited in a timing attack to reconstruct the expected HMAC one byte at a time. Most security libraries provide a constant-time comparison function; use it for any security-sensitive string comparison.
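On the receiving side, Node.js provides `crypto.timingSafeEqual` for the comparison. This is a minimal sketch. `verifySignature` is a hypothetical helper that expects the bare hex tag, so strip any prefix such as `"sha256="` first.

```javascript
const { createHmac, timingSafeEqual } = require("node:crypto");

// Recompute the HMAC over the raw body and compare it to the received tag in constant time.
function verifySignature(secret, rawBody, receivedHex) {
  const expected = createHmac("sha256", secret).update(rawBody).digest(); // 32 raw bytes
  const received = Buffer.from(String(receivedHex), "hex"); // parsing stops at the first non-hex character
  // timingSafeEqual throws on length mismatch. A valid tag's length (32 bytes) is public,
  // so rejecting other lengths early leaks nothing about the expected value.
  if (received.length !== expected.length) return false;
  return timingSafeEqual(expected, received);
}
```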

Password Hashing Is Different

General-purpose cryptographic hash functions like SHA-256 are not appropriate for hashing passwords. The properties that make them good for integrity checking (speed and determinism) make them bad for password storage. Speed lets an attacker test guesses quickly. Determinism means identical passwords always produce identical hashes, so unsalted hashes of common passwords can be looked up in precomputed tables. A modern GPU can compute billions of SHA-256 hashes per second, making brute-force and dictionary attacks against a database of SHA-256-hashed passwords extremely fast.

Password hashing algorithms (bcrypt, scrypt, Argon2) are designed to be intentionally slow. They are used with a random per-password salt, so identical passwords produce different hashes and precomputed tables become useless. They also incorporate a cost factor (work factor, or iteration count) that controls how much computation is required to hash a single password. As hardware gets faster, you increase the cost factor to maintain the same real-world time per hash. Argon2id is the current recommended choice. Argon2 won the 2015 Password Hashing Competition, and its Argon2id variant resists GPU attacks through memory hardness. Argon2id combines the data-independent memory access of Argon2i with the data-dependent access of Argon2d, which gives it better side-channel resistance than Argon2d alone.
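Node.js's built-in crypto module includes scrypt, while bcrypt and Argon2 require third-party packages. That makes scrypt a convenient way to write a self-contained sketch of the salt-plus-cost-factor pattern. The cost parameters below are illustrative assumptions, not a recommendation. The `"scrypt$N$r$p$salt$hash"` storage string is a hypothetical format invented for this example. Argon2 libraries typically emit a similar self-describing string.

```javascript
const { scryptSync, randomBytes, timingSafeEqual } = require("node:crypto");

// Illustrative cost parameters (assumption): N = 2^15, r = 8 needs about 32 MiB per hash.
// Raise N over time so one hash stays slow (tens to hundreds of ms) on your hardware.
const PARAMS = { N: 2 ** 15, r: 8, p: 1, maxmem: 64 * 1024 * 1024 };
const KEY_LEN = 32;

// Hash with a fresh random salt; fields are joined into a hypothetical "scrypt$N$r$p$salt$hash" string.
function hashPassword(password) {
  const salt = randomBytes(16);
  const hash = scryptSync(password, salt, KEY_LEN, PARAMS);
  return ["scrypt", PARAMS.N, PARAMS.r, PARAMS.p, salt.toString("base64"), hash.toString("base64")].join("$");
}

// Re-derive with the stored salt and parameters, then compare in constant time.
function verifyPassword(password, stored) {
  const [scheme, N, r, p, saltB64, hashB64] = stored.split("$");
  const expected = Buffer.from(hashB64 ?? "", "base64");
  if (scheme !== "scrypt" || expected.length !== KEY_LEN) return false; // reject malformed records
  const actual = scryptSync(password, Buffer.from(saltB64, "base64"), KEY_LEN, {
    N: Number(N), r: Number(r), p: Number(p), maxmem: PARAMS.maxmem,
  });
  return timingSafeEqual(actual, expected);
}

const stored = hashPassword("correct horse battery staple");
console.log(verifyPassword("correct horse battery staple", stored)); // true
console.log(verifyPassword("Tr0ub4dor&3", stored));                  // false
```

The cost parameters are stored with each hash. This lets you raise N for new passwords while old records still verify, and you can rehash an old record at the user's next successful login.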

Our free Hash Generator tool computes SHA-256 and other digests for text input, using the browser's native Web Crypto API with no server involvement. This is useful for verifying checksums, debugging HMAC implementations, generating content fingerprints, and learning how hash functions behave with different inputs. For password hashing, use bcrypt or Argon2 server-side — never in the browser.

The Web Crypto API

Modern browsers expose a cryptographically secure hash API through window.crypto.subtle, available only in secure contexts (HTTPS or localhost). The SubtleCrypto.digest() method accepts an algorithm name ("SHA-256", "SHA-384", "SHA-512") and the data as an ArrayBuffer, typed array, or DataView. Text is usually encoded into bytes first with TextEncoder. The method returns a promise that resolves to the hash digest as an ArrayBuffer. Converting this to a hex string requires iterating over the bytes and formatting each one as a two-character hex sequence.
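Those steps fit in a few lines. This is a minimal sketch that assumes either a secure browser context or Node.js 15+. `digestHex` is a hypothetical helper name. The fallback to `require("node:crypto").webcrypto` exists only so the same snippet also runs on Node.js releases that do not expose a global `crypto` object.

```javascript
// Use the global Web Crypto object when present (browsers, newer Node.js);
// otherwise fall back to Node's webcrypto export (Node.js 15+).
const subtle = globalThis.crypto?.subtle ?? require("node:crypto").webcrypto.subtle;

// Hash a string with SubtleCrypto.digest() and return lowercase hex.
async function digestHex(text, algorithm = "SHA-256") {
  const data = new TextEncoder().encode(text);        // UTF-8 bytes as a Uint8Array
  const buffer = await subtle.digest(algorithm, data); // resolves to an ArrayBuffer
  // Format each byte as two hex characters.
  return Array.from(new Uint8Array(buffer), (b) => b.toString(16).padStart(2, "0")).join("");
}

digestHex("abc").then(console.log);
// ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad
```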

The Web Crypto API is available in all modern browsers and in Node.js 15+. Node.js exposes it as require('crypto').webcrypto.subtle, and from Node.js 19 also as globalThis.crypto.subtle without a flag. On the server, Node.js's built-in crypto module also provides createHash('sha256').update(data).digest('hex'). That call is synchronous, available in every Node.js version, and simpler for most server-side use. Never implement SHA-256 or other cryptographic algorithms from scratch. Use platform-provided implementations that have been audited and optimised.
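On the server, the two APIs produce identical digests and differ mainly in call style. This is a minimal sketch for Node.js 15+, and `sha256Sync` and `sha256Async` are hypothetical names.

```javascript
const { createHash, webcrypto } = require("node:crypto");

// Synchronous: Node's classic hashing API, returns hex directly.
function sha256Sync(text) {
  return createHash("sha256").update(text).digest("hex");
}

// Asynchronous: the same digest through Web Crypto, which always returns a promise.
async function sha256Async(text) {
  const buffer = await webcrypto.subtle.digest("SHA-256", new TextEncoder().encode(text));
  return Buffer.from(buffer).toString("hex");
}

sha256Async("hello").then((hex) => console.log(hex === sha256Sync("hello"))); // true
```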