Hashing: Verify Data Integrity, Detect Corruption, and Identify Tampering
Data is constantly transferred, copied, downloaded, and stored across countless devices and networks. Ensuring your files remain intact and unaltered is essential. Whether verifying a critical software download, backing up personal files, or managing long-term archives, hashing provides a reliable solution.
Hashing is a simple yet powerful technique used to verify data integrity, detect corruption, and identify unauthorised changes.
This guide explains what hashing is, how it works, its limitations, and how to apply it confidently in everyday workflows. From basic hash checks to secure verification using GPG signatures, you’ll gain practical knowledge to safeguard your data with clarity and precision.
Overview #
What It Is #
A hash is a short, fixed-size string generated from a file’s contents using a mathematical algorithm called a hash function. It acts like a digital fingerprint for the file. If even a single bit in the file changes, the hash will (with high probability) also change.
Hashing is widely used to:
- Detect file corruption (e.g. due to disk failure or network transmission errors)
- Verify integrity after copying, downloading, or backing up
- Support reproducibility in software, research, and builds
- Compare versions of files quickly and reliably
Hashes do not provide security by themselves. They cannot:
- Prove authenticity
- Detect intentional tampering (unless combined with cryptographic signatures like GPG)
How It Works #
- A hashing algorithm processes the file and produces a hash value.
- You can later recompute the hash and compare it to the original.
- If the values match, the file is unchanged. If they differ, the file is corrupted or altered.
Syntax #
sha256sum [OPTION] <file>
Option:
-c
- Check a file against a previously generated SHA-256 hash.
Example
Create and open a file:
vim file.txt
Add the following content:
Hello, world!
Generate the SHA-256 hash and save it:
sha256sum file.txt > file.txt.sha256
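Contents of file.txt.sha256 (the command prints nothing to the terminal, because its output is redirected into the file):
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5  file.txt
This hash acts as a fingerprint of the content of file.txt. Any modification to the file will, with overwhelming probability, result in a different hash.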
Verify the hash:
sha256sum -c file.txt.sha256
Expected output:
file.txt: OK
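To see the failure case as well, the following sketch repeats the example in a throwaway directory and then changes one character. The directory and variable names are illustrative assumptions, and GNU coreutils is assumed.

```shell
export LC_ALL=C                               # keep tool messages in English for this demo
workdir=$(mktemp -d)                          # throwaway directory, nothing real is touched
cd "$workdir" || exit 1

printf 'Hello, world!\n' > file.txt           # same content as the example above
sha256sum file.txt > file.txt.sha256          # record the known-good hash

before=$(sha256sum -c file.txt.sha256)        # file untouched
printf 'Hello, world?\n' > file.txt           # change a single character
after=$(sha256sum -c file.txt.sha256 2>/dev/null)
status=$?                                     # non-zero when any check fails

echo "$before"
echo "$after (exit status $status)"
```

The first line printed is `file.txt: OK`; the second is `file.txt: FAILED (exit status 1)`.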
Common Use Cases #
- Verifying downloaded files
When you download a file (such as an ISO image or software package), you can compare its hash to the one provided by the source. If they match, the file is complete and identical to the one the source hashed. Whether the source itself can be trusted is a separate question.
- Validating data integrity in backups or deployments
Hashing reveals whether files stored in backups or deployed across systems have changed or become corrupted over time.
- Ensuring consistent results in automation and builds
In CI pipelines and reproducible builds, hashes confirm that the inputs and outputs are exactly the same every time, helping to detect unexpected changes or errors.
For trust and authenticity, pair hashing with digital signatures (e.g. using gpg).
Core Concepts #
Key Terminology #
Authenticity
Authenticity confirms that data originates from a legitimate and trusted source. Hashing alone does not guarantee authenticity. To achieve this, hashes must be combined with digital signatures or other cryptographic methods.
Bit Rot
Bit rot refers to the gradual and unintentional corruption of data stored on disk or other media over time. Periodic hash verification allows early detection of bit rot by revealing any unexpected changes in file contents.
Collision
A collision occurs when two different inputs produce the same hash value. This undermines the trustworthiness of the hash function and can be exploited in certain attack scenarios. Collision resistance is essential for secure applications.
Digital Signature
A digital signature is a cryptographic mechanism that proves a hash (or file) was generated or approved by a trusted source. It relies on public-key cryptography:
- Signed using a private key
- Verified using a public key
Example: GPG-signed hash files
Hash
A hash is a short, fixed-length string of characters (usually hexadecimal) computed from a file’s contents using a hash function. It acts like a digital fingerprint for that file. If even one byte changes, the hash will (with high probability) also change.
Hash Function
A hash function is a mathematical algorithm that takes an input (e.g. file content) and produces a fixed-size output (a hash). For integrity checking, a good hash function should be:
- Deterministic: The same input always produces the same output
- Fast: Efficient to compute
- Collision-resistant: It is computationally infeasible to find two different inputs that produce the same output
Example: SHA-256
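The fixed-size and deterministic properties can be checked directly. This is a minimal sketch assuming GNU coreutils; the helper name `digest` is made up for illustration.

```shell
# Keep only the hex digest from sha256sum's "<hash>  <name>" output.
digest() { sha256sum | cut -d' ' -f1; }

empty=$(printf '' | digest)                    # zero bytes of input
short=$(printf 'abc' | digest)                 # three bytes
long=$(head -c 1000000 /dev/zero | digest)     # one million zero bytes
again=$(printf 'abc' | digest)                 # same input as $short

# Every digest is 64 hex characters (256 bits), whatever the input size.
printf '%s\n' "${#empty}" "${#short}" "${#long}"

# Determinism: hashing the same bytes twice gives the same digest.
[ "$short" = "$again" ] && echo "deterministic"
```

This prints `64` three times, followed by `deterministic`.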
Hash Value
The hash value is the actual output of the hash function – a fixed-length string used to verify the integrity of the input data.
Example (SHA-256 hash):
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5
Integrity
Integrity means the data has not been altered either unintentionally (e.g. bit rot) or maliciously. A matching hash confirms that the content is unchanged since the hash was last computed.
Verification
Verification is the process of checking whether a newly computed hash matches a previously known and trusted hash. If they match, the file is considered unaltered.
Hashing Algorithms #
Algorithm | Output Size | Common Use Cases | Use This If... | Avoid If... |
---|---|---|---|---|
CRC32 | 32 bits | Networking, file system checks | You are detecting accidental corruption in controlled environments | You need cryptographic integrity or tamper detection |
MD5 | 128 bits | Legacy file validation | You are working with legacy systems or protocols | You require strong security or resistance to tampering |
SHA-1 | 160 bits | Deprecated security, Git internals | You are interfacing with Git or old APIs with no alternatives | You need reliable integrity in adversarial conditions |
SHA-256 | 256 bits | Modern integrity and security checks | You need secure, collision-resistant verification | You are in a rare performance-constrained scenario |
SHA-512 | 512 bits | High-assurance integrity verification | You want a larger security margin for high-value or archival files | You need short digests or target 32-bit hardware, where its 64-bit arithmetic is slow |
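The output sizes in the table can be confirmed with the tools themselves. The sketch below assumes the GNU coreutils commands `md5sum`, `sha1sum`, `sha256sum`, and `sha512sum` are installed; CRC32 is left out because coreutils has no equivalent hex-digest tool for it.

```shell
# Print the digest length of each tool in hex characters and in bits.
# Each hex character encodes 4 bits.
for tool in md5sum sha1sum sha256sum sha512sum; do
    hex=$(printf 'abc' | "$tool" | cut -d' ' -f1)
    echo "$tool: ${#hex} hex characters = $(( ${#hex} * 4 )) bits"
done
```

The loop reports 128, 160, 256, and 512 bits respectively, matching the Output Size column.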
Recommendations
Use SHA-256
Recommended in most cases.
SHA-256 is suitable for most general-purpose and security-sensitive scenarios.
- Strong collision resistance
- Widely supported across platforms and tools (sha256sum, gpg, TLS, etc.)
- Efficient on modern 64-bit CPUs
Use SHA-512 for high-security applications
SHA-512 offers higher assurance and longer hash outputs, useful for sensitive or archival data.
- Greater resistance to brute-force attacks
- Suitable for long-term storage, cryptographic systems, and critical integrity checks
Avoid MD5 and SHA-1 for new security applications
- Both have practical collision attacks and are no longer considered secure
- Still usable for non-hostile environments where accidental corruption is the only concern
Use CRC32 only for basic error detection
- Fast and lightweight
- Suitable for file systems, network protocols, and checksums in safe environments
- Not suitable for verifying authenticity or resisting tampering
Limitations of Hashing #
Hashing is highly effective for detecting unintended file changes, but it has critical limitations. Understanding these is essential when using hashing in contexts involving security, authenticity, or untrusted environments.
Non-Cryptographic Safety
CRC32 was designed purely for error detection, and MD5, although designed as a cryptographic hash, has long been broken. Neither can be relied on for secure verification, because both are vulnerable to intentional manipulation.
- Example: Two different files can be crafted to produce the same MD5 hash (a collision).
- Impact: An attacker who prepares two such files can swap one for the other without the hash changing. With CRC32, forging a file that matches an existing checksum is trivial.
No Protection from Intentional Tampering
Hash values only reflect content. They do not confirm who created the file or whether it has been maliciously replaced.
- Solution: Use digital signatures (e.g. gpg) to validate both the hash file and its origin.
- Secure workflow: Only trust hashes if they are cryptographically signed by a known and verified key.
Vulnerable to Collisions
Hash collisions occur when two different inputs generate the same output. This undermines the reliability of older algorithms.
- MD5 and SHA-1 are no longer secure for cryptographic verification.
- Recommendation: Use SHA-256 or stronger algorithms for modern workflows.
No Built-in Authentication or Encryption
Hashing does not:
- Authenticate the identity of the file's creator
- Prevent an attacker from replacing both the file and its associated hash
- Encrypt or conceal file contents
Hashes = Integrity only.
To achieve authenticity and trust, use digital signatures in combination with hashes.
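The second point, replacing both the file and its hash, is easy to reproduce. The sketch below is an illustration with made-up filenames in a throwaway directory, assuming GNU coreutils.

```shell
export LC_ALL=C                               # keep tool messages in English for this demo
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Publisher: release a file together with its (unsigned) hash.
printf 'genuine installer\n' > installer.bin
sha256sum installer.bin > installer.bin.sha256

# Attacker with write access: swap the file, then regenerate the hash.
printf 'malicious installer\n' > installer.bin
sha256sum installer.bin > installer.bin.sha256

# Victim: the check passes, because the hash only describes
# the bytes it was computed from, not who computed it.
result=$(sha256sum -c installer.bin.sha256)
echo "$result"
```

The check prints `installer.bin: OK` even though the content was replaced, which is why an unsigned hash file stored next to the download proves nothing about origin.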
Filesystem and Path Dependencies
Hash verification may fail under certain conditions:
- Files are renamed or moved (depending on how paths are recorded)
- Line endings change across platforms (e.g. Windows vs Linux)
- File metadata (e.g. timestamps, permissions) affects tools that hash more than just file contents
Use simple, consistent filenames and scripting practices. Avoid embedding variable paths in hash lists unless necessary.
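The line-ending case is worth seeing once, since the two files look identical in most editors. A minimal sketch, assuming GNU coreutils; the helper name `digest` is illustrative.

```shell
# Keep only the hex digest from sha256sum's output.
digest() { sha256sum | cut -d' ' -f1; }

unix=$(printf 'Hello, world!\n'   | digest)   # LF line ending (Linux, macOS)
dos=$(printf 'Hello, world!\r\n' | digest)    # CRLF line ending (Windows)

echo "LF:   $unix"
echo "CRLF: $dos"
[ "$unix" != "$dos" ] && echo "same text, different bytes, different hash"
```

Transfer tools or version-control settings that convert line endings will therefore cause verification failures on text files even when nothing is actually corrupted.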
Summary
Limitation | Description | Mitigation |
---|---|---|
Not secure on their own | Vulnerable to forgery and spoofing | Use GPG-signed hashes |
Collisions possible | MD5 and SHA-1 have practical collision attacks | Use SHA-256 or SHA-512 |
No authentication | Cannot confirm file origin | Use digital signatures |
No encryption | Does not protect or hide data | Combine with encryption tools |
Path/format sensitivity | File or format changes can break checks | Use stable, scripted workflows |
Always use hashing as part of a broader integrity and security process, not as a standalone solution for authenticity or trust.
Workflow #
Install #
Ubuntu Linux: sha256sum, md5sum, and gpg are included by default as part of the coreutils and gnupg packages. No installation is needed.
macOS (using Homebrew):
brew install coreutils
brew install gnupg
Verify installations
Check GPG keys:
gpg --list-keys
Check versions:
md5sum --version
sha256sum --version
Basic Commands #
1. Generate SHA-256 Hash
Single file (display in terminal):
sha256sum <file>
Expected output:
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5 <file>
Single file (save to hash file):
sha256sum <file> > <file>.sha256
This creates a .sha256 file containing the hash and filename.
Multiple files (save to combined hash file):
sha256sum <file1> <file2> > SHA256SUMS
Example contents of SHA256SUMS (nothing is printed to the terminal):
6dcd4ce23d88e2ee9568ba546c007c63e231f530f115b4a846de495d3806971f file1.txt
2c6ee24b09816a6f14f95d1698b24cfc3f438178e6a7c1f1cc55cfe547b0208a file2.txt
2. Verify SHA-256 Hash
Single file:
sha256sum -c <file>.sha256
Output:
<file>: OK
If the file has been altered:
<file>: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
If the file is missing, it is reported as FAILED open or read instead.
Multiple files:
sha256sum -c SHA256SUMS
Each file will be checked and marked OK or FAILED.
Suppress ‘OK’ Messages:
To show only problems:
sha256sum -c --quiet SHA256SUMS
If file paths or names have changed since hashes were generated, verification will fail. Either move the files back or regenerate the hash list.
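In scripts it is usually the exit status that matters rather than the printed messages. The sketch below uses the `--status` option of GNU sha256sum, which suppresses all output; the function name `verify` and the file names are illustrative assumptions.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'payload\n' > data.bin
sha256sum data.bin > SHA256SUMS

# --status prints nothing and reports the result only through the exit code.
verify() {
    if sha256sum -c --status SHA256SUMS; then
        echo "verified"
    else
        echo "verification failed"
    fi
}

first=$(verify)                    # file still matches the recorded hash
printf 'corrupted\n' > data.bin    # simulate corruption
second=$(verify)

echo "$first"
echo "$second"
```

This prints `verified` followed by `verification failed`. In a deployment script the `else` branch would typically abort instead of printing.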
3. Folder-Wide Hashing (Recursive and Safe)
Generate SHA-256 hashes for all files, excluding unwanted ones:
cd </path/to/files>
find . -type f \
  ! -path './.git/*' \
  ! -name '*.sha256' \
  ! -name 'SHA256SUMS' \
  ! -name '.DS_Store' \
  -print0 | sort -z | xargs -0 sha256sum > SHA256SUMS
Explanation:
- find . -type f: finds all regular files
- ! -path, ! -name: exclude .git, .sha256 files, .DS_Store, and SHA256SUMS itself (the shell creates the output file before find runs, so it would otherwise be hashed while still being written and then fail verification)
- -print0 and xargs -0: safely handle filenames with spaces or special characters
- sort -z: ensures a consistent, reproducible file order
Verify all files in folder:
sha256sum -c SHA256SUMS
4. Sign and Verify Hash File with GPG
Digitally signing the hash file provides:
- Authenticity: Confirms the file came from a trusted party.
- Tamper protection: Ensures the hash file itself hasn't been altered.
Step 1: Sign the Hash File (Sender)
gpg --armor --detach-sign <file>.sha256
You will be prompted for your GPG passphrase.
This creates a detached ASCII-armoured signature:
<file>.sha256.asc
Step 2: Share with the Recipient
- The original file: <file>
- The hash file: <file>.sha256
- The signature file: <file>.sha256.asc
- Your public GPG key
- Your GPG fingerprint
Export public key to a file:
gpg --armor --export <your-email@example.com> > publickey.asc
- --armor: Outputs in readable ASCII format (suitable for sharing)
- your-email@example.com: Replace with your GPG key's user ID or email
- publickey.asc: This file contains your public key
Before sharing your public key, verify your key’s unique fingerprint:
gpg --fingerprint <your-email@example.com>
This shows the key's fingerprint, typically 40 hexadecimal characters printed in groups of four. An illustrative placeholder:
ABCD 1234 EF56 7890 ABCD 1234 EF56 7890 ABCD 1234
Share securely
- Public key (publickey.asc): attach to email, upload to a secure server, or host on GitHub
- Fingerprint: send separately (e.g. in plain text or README) to allow verification
Step 3: Verify the Signature (Recipient)
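The recipient must first import your public GPG key (gpg --import publickey.asc) and check that its fingerprint matches the one you shared separately. They can then verify the signature: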
gpg --verify <file>.sha256.asc <file>.sha256
Expected output:
gpg: Good signature from "Your Name <your-email@example.com>"
- Confirms the hash file is authentic
- Verifies it has not been tampered with
Step 4: Verify the File Integrity (Recipient)
Once the hash file is trusted, the recipient can verify the actual file:
sha256sum -c <file>.sha256
Expected output:
<file>: OK
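The four steps can be rehearsed end to end without touching your real keyring. The sketch below assumes GnuPG 2.1 or later and GNU coreutils; the identity `Demo Signer <demo@example.com>`, the unprotected key, and the temporary directories are all made up for illustration and should not be used for real signing.

```shell
# Throwaway keyring so the demo never touches your real keys.
GNUPGHOME=$(mktemp -d); export GNUPGHOME
chmod 700 "$GNUPGHOME"
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical demo identity with an empty passphrase (illustration only).
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Demo Signer <demo@example.com>' default default never

# Sender: hash the file, then sign the hash file.
printf 'release contents\n' > file.txt
sha256sum file.txt > file.txt.sha256
gpg --batch --quiet --armor --detach-sign file.txt.sha256   # writes file.txt.sha256.asc

# Recipient: check the signature first, then the file itself.
gpg --batch --quiet --verify file.txt.sha256.asc file.txt.sha256 2>/dev/null
sig_ok=$?
sha256sum -c --status file.txt.sha256
file_ok=$?

# Any edit to the hash file invalidates the signature.
echo '# tampered' >> file.txt.sha256
gpg --batch --quiet --verify file.txt.sha256.asc file.txt.sha256 2>/dev/null
sig_after_tamper=$?

echo "signature: $sig_ok, file: $file_ok, signature after tampering: $sig_after_tamper"
gpgconf --kill gpg-agent    # stop the agent started for the throwaway keyring
```

Exit status 0 means success, so the first two values are 0 and the last is non-zero. The order matters: the hash file is only worth checking against once its signature has verified.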
Verifying Downloaded ISO Files #
1. Download Hash and Signature Files
From the Ubuntu releases site:
- SHA256SUMS – contains the SHA-256 hashes for available ISO images
- SHA256SUMS.gpg – a GPG signature for verifying the integrity and authenticity of SHA256SUMS
Download both from:
https://releases.ubuntu.com/
2. Verify the Authenticity of the Hash File
To ensure the SHA256SUMS file itself has not been tampered with:
gpg --keyid-format long --verify SHA256SUMS.gpg SHA256SUMS
If you receive the following error:
gpg: Can't check signature: No public key
It means the necessary Ubuntu signing key is not in your keyring.
Check the output for the required key ID. Example:
using RSA key 843938DF228D22F7B3742BC0D94AA3F0EFE21092
Import the missing key from Ubuntu’s key server:
gpg --keyid-format long --keyserver hkp://keyserver.ubuntu.com --recv-keys <key_id>
Expected output:
gpg: key D94AA3F0EFE21092: public key "Ubuntu CD Image Automatic Signing Key (2012) <cdimage@ubuntu.com>" imported
gpg: Total number processed: 1
gpg: imported: 1
Verify the key is now in your keyring:
gpg --list-keys --keyid-format long
Re-run the verification:
gpg --keyid-format long --verify SHA256SUMS.gpg SHA256SUMS
Expected output:
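gpg: Good signature from "Ubuntu CD Image Automatic Signing Key (2012) <cdimage@ubuntu.com>" [unknown]
This confirms that the hash file you downloaded was signed by that key and has not been altered since.
gpg may also warn that the key is not certified with a trusted signature. This means you haven’t personally marked this key as trusted; the signature itself is still valid. To be sure the key really belongs to Ubuntu, compare its fingerprint with a copy obtained from a source you already trust.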
3. Verify the ISO Hash
Ensure the ISO file, SHA256SUMS, and SHA256SUMS.gpg are in the same directory.
Run:
sha256sum -c SHA256SUMS 2>&1 | grep OK
Explanation:
- sha256sum -c SHA256SUMS: checks all files listed in the hash file
- 2>&1: merges stderr with stdout
- | grep OK: filters to only show successful matches
Expected output:
ubuntu-24.04-desktop-amd64.iso: OK
If the output does not include : OK, the ISO may be corrupted or tampered with. In that case, re-download it from a trusted source.
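An alternative to filtering with grep is the `--ignore-missing` option of GNU sha256sum (coreutils 8.25 or later), which skips listed files that are not present. The sketch below uses small stand-in files with made-up names instead of real ISO images.

```shell
export LC_ALL=C                               # keep tool messages in English for this demo
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Stand-ins for two published images; only one is kept as "downloaded".
printf 'desktop image\n' > desktop.iso
printf 'server image\n'  > server.iso
sha256sum desktop.iso server.iso > SHA256SUMS
rm server.iso

# Entries whose files are absent are skipped instead of reported as errors.
result=$(sha256sum -c --ignore-missing SHA256SUMS)
status=$?
echo "$result (exit status $status)"
```

This prints `desktop.iso: OK (exit status 0)`. Unlike `grep OK`, a mismatching file still shows up as FAILED and the exit status stays usable in scripts.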
4. Clean Up
Remove the imported key:
gpg --delete-keys <key_id>
If prompted, type y to confirm key deletion from the keyring.
Confirm the key has been removed:
gpg --list-keys --keyid-format long
Best Practices #
- Use Strong, Modern Hash Algorithms
Prefer SHA-256 or SHA-512 for generating hashes. Avoid MD5 and SHA-1 except for legacy or very low-risk cases due to their vulnerability to collisions.
- Always Verify Downloads
Verify the hash of any downloaded file or installer before use. Trust only hashes from secure, ideally GPG-signed, sources.
- Automate Hash Verification
Integrate hash verification into installation scripts, deployment processes, or CI/CD pipelines to detect corruption or tampering early.
- Store Hashes Alongside Files
Keep hash files together with backups, archives, or datasets to enable verification at any time.
- Use Hashes for Integrity, Not Authenticity
Hashes detect accidental changes but do not confirm file origin. Combine hashes with digital signatures (e.g. GPG) to verify authenticity.
- Regularly Re-Verify Long-Term Data
Schedule periodic integrity checks on archival or critical data to detect silent corruption or bit rot before data loss occurs.
- Be Careful with File Paths in Hash Files
Avoid absolute or machine-specific paths in hash lists to prevent false mismatches. Record paths relative to a stable root, verify from that same directory, and use consistent, simple filenames when generating hash files.
- Protect Hash Files
Store hash files in read-only or access-controlled locations to prevent tampering. Sign hash files with GPG to ensure authenticity.
- Use Standard Naming Conventions
Adopt clear, consistent names such as SHA256SUMS or <file>.sha256 for easy identification by users and scripts.
- Educate Your Team
Make hash generation and verification a standard part of workflows before publishing, deploying, or archiving files. Clear procedures reduce errors and improve security.
By adhering to these best practices, you enhance your data integrity processes and reduce the risk of unnoticed corruption or unauthorised modification.
Frequently Asked Questions (FAQ) #
What is the difference between a hash and a digital signature?
A hash verifies data integrity by confirming that the data has not changed or become corrupted. It does not prove who created the data or whether it is authentic.
A digital signature ensures both authenticity and integrity. It uses a private key to sign the data and a public key to verify it, confirming the file’s origin. For secure distribution, use digital signatures alongside hashes.
Should I use hashes for my backups?
Yes. Hashes are a best practice for backups. They let you confirm that files are still intact and reveal corruption or unexpected changes over time.
Benefits include:
- Detecting silent corruption like bit rot or storage errors
- Verifying complete and correct file copies
- Confirming long-term integrity through periodic checks
- Providing confidence in disaster recovery with exact file matches
Why is MD5 still used if it is broken?
MD5 is fast and easy to implement. It’s acceptable for low-risk, non-security applications like basic integrity checks. However, avoid MD5 for anything requiring trust or authenticity because it is vulnerable to collisions.
How large are hash files?
Hash files are very small. Each SHA-256 entry is 64 hexadecimal characters plus the filename, so a list covering dozens of files is only a few kilobytes, and they are efficient to store and share.
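The size of an entry can be worked out exactly. The sketch below assumes GNU sha256sum's default text format, where the digest and the filename are separated by two characters and each line ends with a newline; the filename is illustrative.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf 'Hello, world!\n' > file.txt           # 8-character filename
sha256sum file.txt > file.txt.sha256

# 64 hex digits + 2 separator characters + filename length + newline
expected=$(( 64 + 2 + 8 + 1 ))
actual=$(wc -c < file.txt.sha256)
echo "expected $expected bytes, actual $((actual)) bytes"
```

Both numbers are 75. By the same arithmetic, a list of 1,000 files with 30-character paths comes to about 97,000 bytes.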
Can I trust hash files I find online?
Only if they:
- Come from a trusted source
- Are delivered over HTTPS
- Are signed with GPG and the signature verifies
If any of these are missing, the hash file may be compromised or fake.
Are hashes useful for small files?
Yes. Even small files can suffer corruption or tampering. Hashes are quick to compute and add minimal overhead, so always verify files regardless of size.
Troubleshooting #
Verification Failures #
Symptom:
filename: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
Possible Causes:
- File was altered, corrupted, or truncated during download or transfer
- Wrong version of the file
- Line endings changed (especially on cross-platform transfers)
- The hash file is outdated or not intended for this file
Solutions:
- Re-download the file and the hash file
- Verify file size and date
- Use diff to examine differences
- Use a tool that preserves binary integrity during transfer (e.g. rsync)
Hash File Format Errors #
Symptom:
sha256sum: filename.sha256: no properly formatted checksum lines found
Possible Causes:
- Hash file contains invalid formatting or extra characters
- Copy-pasted from a website without cleaning up
- Incorrect hash type (e.g. using sha256sum on an MD5 hash file)
Solutions:
- Ensure the file follows standard format: <hash>  <file>
- Remove extra headers, blank lines, or markdown formatting
- Match the hash algorithm with the tool used (e.g. md5sum vs sha256sum)
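A quick way to locate the offending lines is to print everything that does not look like a checksum entry. This sketch checks only the default lowercase format written by GNU sha256sum (64 hex digits, a space, a space or asterisk, then the filename); the file name `pasted.sha256` and its contents are made up for illustration.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A hash list pasted from a web page: stray header, one good line, one truncated hash.
cat > pasted.sha256 <<'EOF'
SHA-256 checksums:
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5  file.txt
d9014c46  file.txt
EOF

# -v inverts the match, -n prefixes each reported line with its line number.
bad=$(grep -Evn '^[0-9a-f]{64} [ *].+$' pasted.sha256)
echo "$bad"
```

The output lists lines 1 and 3, which are the ones to delete or correct before running sha256sum -c again.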
Path Mismatches #
Symptom:
- Checksum verification fails with "No such file" or "FAILED open or read"
Cause:
- Hash file includes relative paths that no longer match your directory structure
Solution:
- Open the hash file and inspect filenames
- Run verification from the correct working directory
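The effect of the working directory can be reproduced in a few lines. The directory layout below is a made-up example in a temporary location, assuming GNU coreutils.

```shell
export LC_ALL=C                               # keep tool messages in English for this demo
root=$(mktemp -d)
mkdir -p "$root/archive/docs"
printf 'report\n' > "$root/archive/docs/report.txt"

# Hash list recorded relative to the archive root.
( cd "$root/archive" && sha256sum docs/report.txt > SHA256SUMS )

# Wrong directory: the relative path docs/report.txt does not resolve.
wrong=$( cd "$root" && sha256sum -c archive/SHA256SUMS 2>/dev/null )

# Right directory: same hash list, same files, and the check passes.
right=$( cd "$root/archive" && sha256sum -c SHA256SUMS )

echo "$wrong"
echo "$right"
```

The first check prints `docs/report.txt: FAILED open or read` and the second prints `docs/report.txt: OK`, even though no file changed in between.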