Hash functions are mathematical algorithms that perform a one-way conversion of data of arbitrary length into a byte array of a fixed size. The output is called a “hash” or “hash value”, and acts as a fingerprint of the original data. The example below illustrates the process: two distinct words are run through a hashing algorithm (in this case MD5), producing different hash outputs of the same fixed size:
md5("foo") -> acbd18db4cc2f85cedef654fccc4a4d8
md5("bar") -> 37b51d194a7513e45b56f6524f2d51f2
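The two example hashes above can be reproduced with Python’s standard hashlib module:

```python
import hashlib

# Compute the MD5 hash of each word; hexdigest() returns the
# 128-bit digest as a 32-character hexadecimal string.
for word in ("foo", "bar"):
    digest = hashlib.md5(word.encode("utf-8")).hexdigest()
    print(f'md5("{word}") -> {digest}')

# md5("foo") -> acbd18db4cc2f85cedef654fccc4a4d8
# md5("bar") -> 37b51d194a7513e45b56f6524f2d51f2
```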
Hashing algorithms are a critical component of numerous information security applications; they are used to sign digital certificates, create message authentication codes (MACs), hash passwords, and support many other authentication use cases.
Strong hash functions possess a range of properties:
- Avalanche Effect: any minor change to the input, even a single byte, results in an uncorrelated hash value.
- Pre-Image Resistant: it should be computationally difficult to recover the original input from its hash.
- Second Pre-Image Resistant: given an input and its hash, it should be difficult for an attacker to find a different input that produces the same hash.
- Collision Resistant: it should be difficult for an attacker to identify two different inputs of arbitrary length that result in identical hashes. Note: difficult does not mean impossible - every hashing algorithm permits collisions… the goal is to make finding one as computationally impractical as possible!
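The avalanche effect can be sketched with Python’s hashlib; the input strings below are arbitrary, and one differs from the other by a single character:

```python
import hashlib

# Flipping a single character of the input yields a completely
# uncorrelated SHA-256 digest (the avalanche effect).
a = hashlib.sha256(b"The quick brown fox").hexdigest()
b = hashlib.sha256(b"The quick brown fix").hexdigest()
print(a)
print(b)

# Count how many of the 64 hex characters agree position-for-position;
# for uncorrelated outputs this hovers around 1/16th of the positions.
matches = sum(x == y for x, y in zip(a, b))
print(f"{matches}/64 positions match")
```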
The above properties remain the foundations of strong hash functions; however, the combination of ever-growing computing power and ongoing research into weaknesses in algorithmic construction repeatedly deprecates once-standard algorithms. Weak hashing algorithms are those that have been shown to carry high risk, or have been completely broken, and thus are not fit for use.
Additionally, it is important to observe that even strong hashing algorithms may not be suitable for hashing passwords; password hashing algorithms have additional requirements:
- Computing the hash must be computationally intense to avoid brute force and dictionary attacks.
- Data to be hashed should incorporate a salt so, even if the input is the same, the algorithm will produce a different hash. This protects against Rainbow Table attacks.
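The salting requirement can be sketched as follows. This is a minimal illustration using salted SHA-256, not a recommended password scheme - real systems should use a purpose-built algorithm such as bcrypt, Argon2, or PBKDF2 with a high work factor:

```python
import hashlib
import os

def hash_password(password, salt=None):
    """Hash a password with a per-user random salt (illustration only)."""
    if salt is None:
        salt = os.urandom(16)  # 16 fresh random bytes per password
    digest = hashlib.sha256(salt + password.encode("utf-8")).digest()
    return salt, digest

# The same password hashed twice yields different digests because each
# call draws a fresh salt, defeating precomputed rainbow tables.
salt1, digest1 = hash_password("hunter2")
salt2, digest2 = hash_password("hunter2")
print(digest1 != digest2)  # True

# Verification re-hashes the supplied password with the stored salt.
print(hash_password("hunter2", salt1)[1] == digest1)  # True
```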
The impact of successful attacks on weak hashing algorithms can be disastrous, limited only by the value of the data and the imagination of the attacker in leveraging it. There are countless data breaches exemplifying the fallout from a poor choice of hashing algorithm. For example, in 2016 (a full two years after the fact) Yahoo! announced it had been the victim of a gargantuan breach affecting over 500 million Yahoo! accounts, with account details including dates of birth, unencrypted security questions and answers, and hashed passwords. Had the passwords been hashed with a strong, up-to-date algorithm, they might have remained worthless to the attacker. However, the algorithm used was a known weak hashing algorithm: MD5.
Collisions play a central role in a hashing algorithm’s usefulness; the easier it is to orchestrate a collision, the less useful the hash. If an attacker is able to manufacture two distinct inputs that result in an identical hash value, they are exploiting a weakness in the algorithm’s collision resistance.
In 2005, a famous research paper was published describing an algorithm capable of identifying two different sequences of 128 bytes producing the exact same MD5 hash. The below pair of inputs are commonly used to illustrate this phenomenon:
d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89 55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0 e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70
d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89 55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0 e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70
An observant reader will note that the two blocks differ in only six characters; nevertheless, each block has the same MD5 hash: 79054025255fb1a26e4bc422aef54eb4
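The collision can be verified directly with Python’s hashlib, using the two hex blocks above with whitespace removed:

```python
import hashlib

# The two 128-byte blocks from the published MD5 collision.
block_a = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab58712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e488832571415a085125e8f7cdc99fd91dbdf280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e2b487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080a80d1ec69821bcb6a8839396f9652b6ff72a70"
)
block_b = bytes.fromhex(
    "d131dd02c5e6eec4693d9a0698aff95c2fcab50712467eab4004583eb8fb7f89"
    "55ad340609f4b30283e4888325f1415a085125e8f7cdc99fd91dbd7280373c5b"
    "d8823e3156348f5bae6dacd436c919c6dd53e23487da03fd02396306d248cda0"
    "e99f33420f577ee8ce54b67080280d1ec69821bcb6a8839396f965ab6ff72a70"
)

print(block_a != block_b)                # True: the inputs are distinct
print(hashlib.md5(block_a).hexdigest())  # identical digests
print(hashlib.md5(block_b).hexdigest())
```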
Given their critical function as a security enabler for numerous online services, it is imperative for organizations and individuals implementing hashing algorithms to ensure they are up to date with current, unbroken algorithms. This is, in reality, the only solution to the problem that weak hashing algorithm use represents.
Previously standard hashing functions that are now considered very high risk or completely broken include:
- MD5: known to be susceptible to collision attacks since the mid-‘90s, and considered completely broken.
- SHA-1: considered insecure against well-resourced adversaries since 2005 and formally deprecated for use by NIST in 2011.
- RIPEMD & RIPEMD-128: deemed insecure, with a reported collision occurring in 2004.
- Whirlpool: a rebound attack presented collisions in 2009.
Current hash functions deemed robust and accepted as standard include:
- RIPEMD-160/256/320: multiple variants with differing levels of security, although all considered robust.
- BLAKE2/3: purportedly faster than SHA-1/2/3 and immune to length extension.
- SHA-2: all variants publicly resistant to collision attacks, and most variants resistant to length extension attacks.
- SHA-3: the most recent iteration of the SHA series; publicly resistant to collision and length extension attacks.
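Several of the algorithms above are available out of the box in Python’s hashlib; a brief sketch:

```python
import hashlib

data = b"important document"

# SHA-2 family: SHA-256 and SHA-512 are the most common variants.
print(hashlib.sha256(data).hexdigest())
print(hashlib.sha512(data).hexdigest())

# SHA-3 family, based on the Keccak sponge construction, which
# makes it resistant to length extension attacks.
print(hashlib.sha3_256(data).hexdigest())

# BLAKE2 (blake2b/blake2s) is also in the standard library.
print(hashlib.blake2b(data).hexdigest())
```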
When hashing passwords, use purpose-built algorithms, such as:
- Bcrypt: the default password hash algorithm used in many systems.
- Argon2: the winner of the 2015 Password Hashing Competition; the computational intensiveness of the process can be fine-tuned.
- PBKDF2: a key derivation algorithm recommended by NIST.
In any case, make sure to use an appropriate work factor, i.e., a high enough iteration count.
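Of the algorithms listed, PBKDF2 ships with Python’s standard library; a minimal sketch is below. The iteration count of 600,000 is in line with recent OWASP guidance for PBKDF2-HMAC-SHA256, but the right work factor depends on your hardware - tune it so a single hash takes a noticeable fraction of a second:

```python
import hashlib
import os

password = b"correct horse battery staple"
salt = os.urandom(16)  # fresh random salt per password

# PBKDF2-HMAC-SHA256 with a high iteration count; the iteration
# count is the work factor that slows down brute-force attacks.
key = hashlib.pbkdf2_hmac("sha256", password, salt, 600_000)
print(key.hex())
```

Storing the salt and iteration count alongside the derived key allows the same computation to be repeated at login time for verification.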
Verify that known weak hashing algorithms (e.g., MD5, SHA-1) are not used unless required for backwards compatibility.