Hashing: Verify Data Integrity, Detect Corruption, and Identify Tampering

Data is constantly transferred, copied, downloaded, and stored across countless devices and networks. Ensuring your files remain intact and unaltered is essential. Whether verifying a critical software download, backing up personal files, or managing long-term archives, hashing provides a reliable solution.

Hashing is a simple yet powerful technique used to verify data integrity, detect corruption, and identify unauthorised changes.

This guide explains what hashing is, how it works, its limitations, and how to apply it confidently in everyday workflows. From basic hash checks to secure verification using GPG signatures, you’ll gain practical knowledge to safeguard your data with clarity and precision.

Overview #

What It Is #

A hash is a short, fixed-size string generated from a file’s contents using a mathematical algorithm called a hash function. It acts like a digital fingerprint for the file. If even a single bit in the file changes, the hash will (with high probability) also change.

Hashing is widely used to:

  • Detect file corruption (e.g. due to disk failure or network transmission errors)
  • Verify integrity after copying, downloading, or backing up
  • Support reproducibility in software, research, and builds
  • Compare versions of files quickly and reliably

Hashes do not provide security by themselves. They cannot:

  • Prove authenticity
  • Detect intentional tampering (unless combined with cryptographic signatures like GPG)

How It Works #

  • A hashing algorithm processes the file and produces a hash value.
  • You can later recompute the hash and compare it to the original.
  • If the values match, the file is almost certainly unchanged. If they differ, the file has been corrupted or altered.
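
The recompute-and-compare cycle can be sketched in a few lines of shell. This is a minimal illustration, assuming GNU sha256sum is on the PATH; hash_of and the temporary demo file are hypothetical names introduced only for this example.

```shell
# hash_of (hypothetical helper): print only the SHA-256 hex digest of a file.
hash_of() {
  sha256sum < "$1" | cut -d ' ' -f 1
}

workdir=$(mktemp -d)
printf 'Hello, world!\n' > "$workdir/demo.txt"

original=$(hash_of "$workdir/demo.txt")     # record the hash once
recomputed=$(hash_of "$workdir/demo.txt")   # recompute it later

if [ "$original" = "$recomputed" ]; then
  echo "match: file unchanged"
fi

# Change a single character, then recompute: the digests no longer agree.
printf 'Hello, World!\n' > "$workdir/demo.txt"
changed=$(hash_of "$workdir/demo.txt")

if [ "$original" != "$changed" ]; then
  echo "mismatch: file corrupted or altered"
fi

rm -rf "$workdir"
```

Reading the file through standard input keeps the filename out of the output, so only the digest is compared.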

Syntax #

bash
sha256sum [OPTION] <file>

Option:

  • -c - Read hashes from a previously generated hash file and check each listed file against them.

Example

Create and open a file:

bash
vim file.txt

Add the following content:

~/file.txt
Hello, world!

Generate the SHA-256 hash and save it:

bash
sha256sum file.txt > file.txt.sha256

The command prints nothing to the terminal. Expected contents of the hash file:

file.txt.sha256
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5  file.txt

This hash acts as a fingerprint of the content of file.txt. Any modification to the file will, with overwhelming probability, result in a different hash.

Verify the hash:

bash
sha256sum -c file.txt.sha256

Expected output:

text
file.txt: OK

Common Use Cases #

  • Verifying downloaded files
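    When you download a file (such as an ISO image or software package), you can compare its hash to the one provided by the source. If they match, the file you received is byte-for-byte identical to the one the hash was computed from. This rules out a truncated or corrupted download, but it rules out tampering only if the hash itself comes from a trusted, ideally signed, source.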
  • Validating data integrity in backups or deployments
    Hashing ensures that files stored in backups or deployed across systems haven’t changed or become corrupted over time.
  • Ensuring consistent results in automation and builds
    In CI pipelines and reproducible builds, hashes confirm that the inputs and outputs are exactly the same every time, helping to detect unexpected changes or errors.

For trust and authenticity, pair hashing with digital signatures (e.g. using gpg).

Core Concepts #

Key Terminology #

Authenticity

Authenticity confirms that data originates from a legitimate and trusted source. Hashing alone does not guarantee authenticity. To achieve this, hashes must be combined with digital signatures or other cryptographic methods.

Bit Rot

Bit rot refers to the gradual and unintentional corruption of data stored on disk or other media over time. Periodic hash verification allows early detection of bit rot by revealing any unexpected changes in file contents.

Collision

A collision occurs when two different inputs produce the same hash value. This undermines the trustworthiness of the hash function and can be exploited in certain attack scenarios. Collision resistance is essential for secure applications.

Digital Signature

A digital signature is a cryptographic mechanism that proves a hash (or file) was generated or approved by a trusted source. It relies on public-key cryptography:

  • Signed using a private key

  • Verified using a public key

Example: GPG-signed hash files

Hash

A hash is a short, fixed-length string of characters (usually hexadecimal) computed from a file’s contents using a hash function. It acts like a digital fingerprint for that file. If even one byte changes, the hash will (with high probability) also change.

Hash Function

A hash function is a mathematical algorithm that takes an input (e.g. file content) and produces a fixed-size output (a hash). For integrity checking, a good hash function should be:

  • Deterministic: The same input always produces the same output

  • Fast: Efficient to compute

  • Collision-resistant: It is computationally infeasible for two different inputs to produce the same output

Example: SHA-256
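
The first two properties are easy to observe from the command line. The following is a small sketch, assuming GNU sha256sum; digest is a hypothetical helper name used only here.

```shell
# digest (hypothetical helper): SHA-256 hex digest of a string.
digest() {
  printf '%s' "$1" | sha256sum | cut -d ' ' -f 1
}

# Deterministic: hashing the same input twice gives the same value.
first=$(digest 'abc')
second=$(digest 'abc')
[ "$first" = "$second" ] && echo "deterministic: same input, same hash"

# Fixed-size: inputs of very different lengths both yield 64 hex characters (256 bits).
short=$(digest 'a')
long=$(digest "$(head -c 100000 /dev/zero | tr '\0' 'a')")
echo "short input: ${#short} hex characters"
echo "long input:  ${#long} hex characters"
```

Collision resistance cannot be demonstrated this way; it is a property of the algorithm's design rather than something a quick test can confirm.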

Hash Value

The hash value is the actual output of the hash function – a fixed-length string used to verify the integrity of the input data.

Example (SHA-256 hash):

text
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5

Integrity

Integrity means the data has not been altered either unintentionally (e.g. bit rot) or maliciously. A matching hash confirms that the content is unchanged since the hash was last computed.

Verification

Verification is the process of checking whether a newly computed hash matches a previously known and trusted hash. If they match, the file is considered unaltered.
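
When the trusted hash is only available as a string (for example, copied from a release page), one possible approach is to feed it to sha256sum on standard input. This is a sketch under the assumption that GNU sha256sum is available; verify_against and the sample file are hypothetical.

```shell
# verify_against (hypothetical helper): succeed only if FILE hashes to EXPECTED.
verify_against() {
  local expected="$1" file="$2"
  # Feed a one-line checksum list on stdin ("-"); --status prints nothing and
  # reports the result through the exit code.
  printf '%s  %s\n' "$expected" "$file" | sha256sum --check --status -
}

sample=$(mktemp)
printf 'Hello, world!\n' > "$sample"

# Trusted value: the SHA-256 of "Hello, world!" plus a newline, as shown for file.txt in this guide.
trusted=d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5

if verify_against "$trusted" "$sample"; then
  echo "verified"
else
  echo "hash mismatch"
fi
```

Relying on the exit code rather than parsing the printed text makes the check easier to reuse in scripts.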


Hashing Algorithms #

| Algorithm | Output Size | Common Use Cases | Use This If... | Avoid If... |
| --- | --- | --- | --- | --- |
| CRC32 | 32 bits | Networking, file system checks | You are detecting accidental corruption in controlled environments | You need cryptographic integrity or tamper detection |
| MD5 | 128 bits | Legacy file validation | You are working with legacy systems or protocols | You require strong security or resistance to tampering |
| SHA-1 | 160 bits | Deprecated security, Git internals | You are interfacing with Git or old APIs with no alternatives | You need reliable integrity in adversarial conditions |
| SHA-256 | 256 bits | Modern integrity and security checks | You need secure, collision-resistant verification | You are in a rare performance-constrained scenario |
| SHA-512 | 512 bits | High-assurance integrity verification | You want strong guarantees for small, high-value files | File sizes are large and performance is a concern |
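
The output sizes can be checked directly with the coreutils tools. This is a small sketch, assuming md5sum, sha1sum, sha256sum, and sha512sum are installed; digest_bits is a hypothetical helper, and CRC32 is left out of the sketch.

```shell
# digest_bits (hypothetical helper): output size in bits of a coreutils hashing tool.
digest_bits() {
  local hex
  hex=$(printf 'Hello, world!\n' | "$1" | cut -d ' ' -f 1)
  echo $(( ${#hex} * 4 ))   # each hex character encodes 4 bits
}

for tool in md5sum sha1sum sha256sum sha512sum; do
  echo "$tool: $(digest_bits "$tool") bits"
done
```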

Recommendations

Use SHA-256

Recommended in most cases.

SHA-256 is suitable for most general-purpose and security-sensitive scenarios.

  • Strong collision resistance

  • Widely supported across platforms and tools (sha256sum, gpg, TLS, etc.)

  • Efficient on modern CPUs, many of which provide hardware acceleration for SHA-256

Use SHA-512 for high-security applications

SHA-512 offers higher assurance and longer hash outputs, useful for sensitive or archival data.

  • Greater resistance to brute-force attacks

  • Suitable for long-term storage, cryptographic systems, and critical integrity checks

Avoid MD5 and SHA-1 for new security applications

  • Both have practical collision attacks and are no longer considered secure

  • Still usable for non-hostile environments where accidental corruption is the only concern

Use CRC32 only for basic error detection

  • Fast and lightweight

  • Suitable for file systems, network protocols, and checksums in safe environments

  • Not suitable for verifying authenticity or resisting tampering

Limitations of Hashing #

Hashing is highly effective for detecting unintended file changes, but it has critical limitations. Understanding these is essential when using hashing in contexts involving security, authenticity, or untrusted environments.

Non-Cryptographic Safety

CRC32 was designed for error detection, not security. MD5 was designed as a cryptographic hash but has since been broken. Neither is suitable for secure verification, because both are vulnerable to intentional manipulation.

  • Example: Two different files can be crafted to produce the same MD5 hash (a collision).
  • Impact: An attacker who can prepare both files in advance can swap one for the other without the hash changing.

No Protection from Intentional Tampering

Hash values only reflect content. They do not confirm who created the file or whether it has been maliciously replaced.

  • Solution: Use digital signatures (e.g. gpg) to validate both the hash file and its origin.
  • Secure workflow: Only trust hashes if they are cryptographically signed by a known and verified key.

Vulnerable to Collisions

Hash collisions occur when two different inputs generate the same output. This undermines the reliability of older algorithms.

  • MD5 and SHA-1 are no longer secure for cryptographic verification.
  • Recommendation: Use SHA-256 or stronger algorithms for modern workflows.

No Built-in Authentication or Encryption

Hashing does not:

  • Authenticate the identity of the file's creator
  • Prevent an attacker from replacing both the file and its associated hash
  • Encrypt or conceal file contents

Hashes provide integrity only. To achieve authenticity and trust, use digital signatures in combination with hashes.
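
The second point is worth seeing in practice. The sketch below, which assumes GNU sha256sum and uses made-up file names and contents, plays the roles of publisher, attacker, and recipient in a temporary directory.

```shell
demo=$(mktemp -d)

# Publisher: create a file and its hash.
printf 'genuine release\n' > "$demo/tool.sh"
( cd "$demo" && sha256sum tool.sh > tool.sh.sha256 )

# Attacker with write access: replace the file AND regenerate the hash.
printf 'malicious payload\n' > "$demo/tool.sh"
( cd "$demo" && sha256sum tool.sh > tool.sh.sha256 )

# Recipient: the check passes, because the hash file now describes the replaced file.
if ( cd "$demo" && sha256sum --check --status tool.sh.sha256 ); then
  swap_detected=no
else
  swap_detected=yes
fi
echo "swap detected: $swap_detected"
```

The check reports no problem, which is exactly why an unsigned hash stored next to the file cannot establish trust.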

Filesystem and Path Dependencies

Hash verification may fail under certain conditions:

  • Files are renamed or moved (depending on how paths are recorded)
  • Line endings change across platforms (e.g. Windows vs Linux)
  • File metadata (e.g. timestamps, permissions) affects tools that hash more than just file contents

Use simple, consistent filenames and scripting practices. Avoid embedding variable paths in hash lists unless necessary.
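
A quick experiment shows which of these changes affect the hash itself. This is a minimal sketch assuming GNU sha256sum; the file names are hypothetical.

```shell
# Same visible text, different line endings: the bytes differ, so the hashes differ.
unix_hash=$(printf 'line one\nline two\n' | sha256sum | cut -d ' ' -f 1)
dos_hash=$(printf 'line one\r\nline two\r\n' | sha256sum | cut -d ' ' -f 1)
if [ "$unix_hash" != "$dos_hash" ]; then
  echo "line endings changed the hash"
fi

# Renaming a file does not change its content hash; only the recorded path goes stale.
tmp=$(mktemp -d)
printf 'data\n' > "$tmp/old-name.txt"
before=$(sha256sum < "$tmp/old-name.txt" | cut -d ' ' -f 1)
mv "$tmp/old-name.txt" "$tmp/new-name.txt"
after=$(sha256sum < "$tmp/new-name.txt" | cut -d ' ' -f 1)
if [ "$before" = "$after" ]; then
  echo "rename left the hash unchanged"
fi
rm -rf "$tmp"
```

So a rename breaks sha256sum -c only because the hash list still points at the old path, whereas a line-ending conversion changes the content itself.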

Summary

| Limitation | Description | Mitigation |
| --- | --- | --- |
| Not secure on their own | Vulnerable to forgery and spoofing | Use GPG-signed hashes |
| Collisions possible | MD5 and SHA-1 can be tricked | Use SHA-256 or SHA-512 |
| No authentication | Cannot confirm file origin | Use digital signatures |
| No encryption | Does not protect or hide data | Combine with encryption tools |
| Path/format sensitivity | File or format changes can break checks | Use stable, scripted workflows |

Always use hashing as part of a broader integrity and security process, not as a standalone solution for authenticity or trust.

Workflow #

Install #

Ubuntu Linux: sha256sum, md5sum, and gpg are included by default as part of the coreutils and gnupg packages. No installation is needed.

macOS (using Homebrew):

bash
brew install coreutils
brew install gnupg

Verify installations

Check GPG keys:

bash
gpg --list-keys

Check versions:

bash
md5sum --version
sha256sum --version

Basic Commands #

1. Generate SHA-256 Hash

Single file (display in terminal):

bash
sha256sum <file>

Expected output:

text
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5  <file>

Single file (save to hash file):

bash
sha256sum <file> > <file>.sha256

This creates a .sha256 file containing the hash and filename.

Multiple files (save to combined hash file):

bash
sha256sum <file1> <file2> > SHA256SUMS

Expected output:

SHA256SUMS
6dcd4ce23d88e2ee9568ba546c007c63e231f530f115b4a846de495d3806971f  file1.txt
2c6ee24b09816a6f14f95d1698b24cfc3f438178e6a7c1f1cc55cfe547b0208a  file2.txt

2. Verify SHA-256 Hash

Single file:

bash
sha256sum -c <file>.sha256

Output:

text
<file>: OK

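If the file has been altered:

text
<file>: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

If the file is missing, sha256sum instead reports "FAILED open or read" for that entry and warns that a listed file could not be read.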

Multiple files:

bash
sha256sum -c SHA256SUMS

Each file will be checked and marked OK or FAILED.

Suppress ‘OK’ Messages:

To show only problems:

bash
sha256sum -c --quiet SHA256SUMS

If file paths or names have changed since hashes were generated, verification will fail. Either move the files back or regenerate the hash list.

3. Folder-Wide Hashing (Recursive and Safe)

Generate SHA-256 hashes for all files, excluding unwanted ones:

bash
cd </path/to/files>

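find . -type f \
  ! -path './.git/*' \
  ! -name '*.sha256' \
  ! -name 'SHA256SUMS' \
  ! -name '.DS_Store' \
  -print0 | sort -z | xargs -0 sha256sum > SHA256SUMS

Explanation:

  • find . -type f: finds all regular files
  • ! -path, ! -name: excludes .git, .sha256 files, SHA256SUMS itself, and .DS_Store
  • Excluding SHA256SUMS matters because the shell creates the output file before find runs; without the exclusion, the list would include a hash of its own incomplete contents, which then fails verification
  • -print0 and xargs -0: safely handle filenames with spaces or special characters
  • sort -z: ensures a consistent, reproducible file order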

Verify all files in folder:

bash
sha256sum -c SHA256SUMS
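
Note that sha256sum -c only checks the files named in SHA256SUMS, so files added to the folder later go unnoticed. One way to spot them is sketched below. It assumes GNU tools, plain "<hash>  <path>" lines, and filenames without newlines or backslashes; list_unhashed and the sample files are hypothetical.

```shell
# list_unhashed (hypothetical helper): print files under DIR that SHA256SUMS does not mention.
list_unhashed() {
  local dir="$1" listed
  listed=$(mktemp)
  # Strip the 64-character hash and the two-character separator, leaving only the paths.
  sed 's/^[0-9a-f]\{64\} [ *]//' "$dir/SHA256SUMS" > "$listed"
  # Print every path on disk that has no exact match in the recorded list.
  ( cd "$dir" && find . -type f ! -name 'SHA256SUMS' | sort | grep -vxF -f "$listed" )
  rm -f "$listed"
}

archive=$(mktemp -d)
printf 'one\n' > "$archive/a.txt"
printf 'two\n' > "$archive/b.txt"
( cd "$archive" && find . -type f ! -name 'SHA256SUMS' -print0 | sort -z | xargs -0 sha256sum > SHA256SUMS )

printf 'three\n' > "$archive/c.txt"   # added after the list was generated
list_unhashed "$archive"              # prints ./c.txt
```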

4. Sign and Verify Hash File with GPG

Digitally signing the hash file provides:

  • Authenticity: Confirms the file came from a trusted party.

  • Tamper protection: Ensures the hash file itself hasn't been altered.

Step 1: Sign the Hash File (Sender)

bash
gpg --armor --detach-sign <file>.sha256

You will be prompted for your GPG passphrase.

This creates a detached ASCII-armoured signature:

text
<file>.sha256.asc

Step 2: Share with the Recipient

  • The original file: <file>
  • The hash file: <file>.sha256
  • The signature file: <file>.sha256.asc
  • Your public GPG key
  • Your GPG fingerprint

Export public key to a file:

bash
gpg --armor --export <your_email> > publickey.asc

  • --armor: Outputs in readable ASCII format (suitable for sharing)
  • <your_email>: Replace with your GPG key's user ID or email
  • publickey.asc: This file contains your public key

Before sharing your public key, verify your key’s unique fingerprint:

bash
gpg --fingerprint <your_email>

This shows the fingerprint, which for most GPG keys in use today is 40 hexadecimal characters, usually printed in groups of four. An illustrative, made-up value:

text
0123 4567 89AB CDEF 0123  4567 89AB CDEF 0123 4567

Share securely

  • Public key (publickey.asc): attach to email, upload to a secure server, or host on GitHub
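  • Fingerprint: send separately, through a different channel from the key itself (e.g. in plain text or a README), so the recipient can check that the key they received is really yours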

Step 3: Verify the Signature (Recipient)

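The recipient first imports your public key (gpg --import publickey.asc), checks that its fingerprint matches the one you sent separately, and then verifies the signature: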

bash
gpg --verify <file>.sha256.asc <file>.sha256

Expected output:

text
gpg: Good signature from "Your Name <your_email>"

A good signature:

  • Confirms the hash file is authentic
  • Verifies it has not been tampered with

Step 4: Verify the File Integrity (Recipient)

Once the hash file is trusted, the recipient can verify the actual file:

bash
sha256sum -c <file>.sha256

Expected output:

text
<file>: OK

Verifying Downloaded ISO Files #

1. Download Hash and Signature Files

From the Ubuntu releases site:

  • SHA256SUMS – contains the SHA-256 hashes for available ISO images
  • SHA256SUMS.gpg – a GPG signature for verifying the integrity and authenticity of SHA256SUMS

Download both from:

text
https://releases.ubuntu.com/

2. Verify the Authenticity of the Hash File

To ensure the SHA256SUMS file itself has not been tampered with:

bash
gpg --keyid-format long --verify SHA256SUMS.gpg SHA256SUMS

If you receive the following error:

text
gpg: Can't check signature: No public key

It means the necessary Ubuntu signing key is not in your keyring.

Check the output for the required key ID. Example:

text
using RSA key 843938DF228D22F7B3742BC0D94AA3F0EFE21092

Import the missing key from Ubuntu’s key server:

bash
gpg --keyid-format long --keyserver hkp://keyserver.ubuntu.com --recv-keys <key_id>

Expected output:

text
gpg: key D94AA3F0EFE21092: public key "Ubuntu CD Image Automatic Signing Key (2012) <cdimage@ubuntu.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1

Verify the key is now in your keyring:

bash
gpg --list-keys --keyid-format long

Re-run the verification:

bash
gpg --keyid-format long --verify SHA256SUMS.gpg SHA256SUMS

Expected output:

text
gpg: Good signature from "Ubuntu CD Image Automatic Signing Key (2012) <cdimage@ubuntu.com>" [unknown]

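This confirms that the hash file you downloaded was signed by that key and has not been altered since.

GPG may also print a warning that the key is not certified with a trusted signature. This means you haven’t personally marked this key as trusted; the signature itself is still valid. For stronger assurance, compare the key fingerprint in the output with the one Ubuntu publishes.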

3. Verify the ISO Hash

Ensure the ISO file, SHA256SUMS, and SHA256SUMS.gpg are in the same directory.

Run:

bash
sha256sum -c SHA256SUMS 2>&1 | grep OK

Explanation:

  • sha256sum -c SHA256SUMS: checks all files listed in the hash file
  • 2>&1: merges stderr with stdout
  • | grep OK: filters to only show successful matches

Expected output:

text
ubuntu-24.04-desktop-amd64.iso: OK

If the output does not include a line ending in : OK for your ISO, the image may be corrupted or tampered with. In that case, re-download it from a trusted source.
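
As an alternative to filtering with grep, recent GNU coreutils releases provide an --ignore-missing option that skips listed files you have not downloaded and reports the result through the exit code. The following is a sketch under that assumption; verify_present and the stand-in image files are hypothetical.

```shell
# verify_present (hypothetical helper): check only the listed files that exist next to SUMS.
verify_present() {
  local sums="$1"
  (
    cd "$(dirname "$sums")" || exit 1
    # --ignore-missing skips entries whose files are absent;
    # --quiet prints nothing for files that pass.
    sha256sum --check --ignore-missing --quiet "$(basename "$sums")"
  )
}

# Stand-in data: two tiny "images" listed in SHA256SUMS, only one of them downloaded.
mirror=$(mktemp -d)
printf 'desktop image\n' > "$mirror/desktop.iso"
printf 'server image\n' > "$mirror/server.iso"
( cd "$mirror" && sha256sum desktop.iso server.iso > SHA256SUMS )
rm "$mirror/server.iso"

if verify_present "$mirror/SHA256SUMS"; then
  echo "downloaded image verified"
else
  echo "verification failed"
fi
```

Because the function returns a non-zero status on any mismatch, it can gate later steps in a script, for example writing the image to a USB drive only after the check succeeds.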

4. Clean Up

Remove the imported key:

bash
gpg --delete-keys <key_id>

If prompted, type y to confirm key deletion from the keyring.

Confirm the key has been removed:

bash
gpg --list-keys --keyid-format long

Best Practices #

  • Use Strong, Modern Hash Algorithms
    Prefer SHA-256 or SHA-512 for generating hashes. Avoid MD5 and SHA-1 except for legacy or very low-risk cases due to their vulnerability to collisions.
  • Always Verify Downloads
    Verify the hash of any downloaded file or installer before use. Trust only hashes from secure, ideally GPG-signed, sources.
  • Automate Hash Verification
    Integrate hash verification into installation scripts, deployment processes, or CI/CD pipelines to detect corruption or tampering early.
  • Store Hashes Alongside Files
    Keep hash files together with backups, archives, or datasets to enable verification at any time.
  • Use Hashes for Integrity, Not Authenticity
    Hashes detect accidental changes but do not confirm file origin. Combine hashes with digital signatures (e.g. GPG) to verify authenticity.
  • Regularly Re-Verify Long-Term Data
    Schedule periodic integrity checks on archival or critical data to detect silent corruption or bit rot before data loss occurs.
  • Be Careful with File Paths in Hash Files
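    Avoid absolute or machine-specific paths in hash lists to prevent false mismatches. Record paths relative to a stable root directory, run verification from that same directory, and use consistent, simple filenames when generating hash files.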
  • Protect Hash Files
    Store hash files in read-only or access-controlled locations to prevent tampering. Sign hash files with GPG to ensure authenticity.
  • Use Standard Naming Conventions
    Adopt clear, consistent names such as SHA256SUMS or <file>.sha256 for easy identification by users and scripts.
  • Educate Your Team
    Make hash generation and verification a standard part of workflows before publishing, deploying, or archiving files. Clear procedures reduce errors and improve security.

By adhering to these best practices, you enhance your data integrity processes and reduce the risk of unnoticed corruption or unauthorised modification.
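
Automated and periodic verification can be as simple as a function called from cron or a CI job. The sketch below assumes GNU sha256sum and a SHA256SUMS file at the top of the directory being checked; verify_archive, the log format, and the sample data are hypothetical.

```shell
# verify_archive (hypothetical helper): verify DIR against its SHA256SUMS and append the result to LOG.
verify_archive() {
  local dir="$1" log="$2" status
  # --quiet records only failures; both stdout and stderr go to the log.
  if ( cd "$dir" && sha256sum --check --quiet SHA256SUMS ) >> "$log" 2>&1; then
    status=PASS
  else
    status=FAIL
  fi
  printf '%s %s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$status" "$dir" >> "$log"
  [ "$status" = PASS ]   # exit code lets cron wrappers or CI steps react
}

# Sample run against throwaway data.
store=$(mktemp -d)
log=$(mktemp)
printf 'archived data\n' > "$store/data.bin"
( cd "$store" && sha256sum data.bin > SHA256SUMS )

if verify_archive "$store" "$log"; then
  echo "archive intact"
fi
```

Keeping a timestamped log makes it possible to tell when a file was last known to be good, which helps when choosing a backup to restore from.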

Frequently Asked Questions (FAQ) #

What is the difference between a hash and a digital signature?

A hash verifies data integrity by confirming that the data has not changed or become corrupted. It does not prove who created the data or whether it is authentic.

A digital signature ensures both authenticity and integrity. It uses a private key to sign the data and a public key to verify it, confirming the file’s origin. For secure distribution, use digital signatures alongside hashes.

Should I use hashes for my backups?

Yes. Hashes are a best practice for backups. They do not prevent damage, but they let you detect when a file is no longer intact, whether through corruption or modification.

Benefits include:

  • Detecting silent corruption like bit rot or storage errors
  • Verifying complete and correct file copies
  • Confirming long-term integrity through periodic checks
  • Providing confidence in disaster recovery with exact file matches

Why is MD5 still used if it is broken?

MD5 is fast and easy to implement. It’s acceptable for low-risk, non-security applications like basic integrity checks. However, avoid MD5 for anything requiring trust or authenticity because it is vulnerable to collisions.

How large are hash files?

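Hash files are small. Each entry holds a hash value and a filename: for SHA-256 that is 64 hexadecimal characters plus the path, typically around 100 bytes per file. A list covering a thousand files is therefore on the order of 100 KB, which makes hash files cheap to store and share.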
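
The size of one entry is easy to measure. This is a quick sketch assuming GNU sha256sum and the 8-character name file.txt; the variable names are made up for the example.

```shell
entry_dir=$(mktemp -d)
printf 'Hello, world!\n' > "$entry_dir/file.txt"

# One SHA-256 entry: 64 hex characters + 2 separator characters + filename + newline.
entry_bytes=$( cd "$entry_dir" && sha256sum file.txt | wc -c )
echo "bytes per entry for file.txt: $((entry_bytes))"   # 64 + 2 + 8 + 1 = 75

# Rough size of a list covering 1,000 files with names of the same length.
echo "1000 entries: $((entry_bytes * 1000)) bytes"

rm -rf "$entry_dir"
```

Longer paths add one byte per character, so deeply nested folder structures produce proportionally larger lists.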

Can I trust hash files I find online?

Only if they:

  • Come from a trusted source
  • Are delivered over HTTPS
  • Are signed with GPG and the signature verifies

If any of these are missing, the hash file may be compromised or fake.

Are hashes useful for small files?

Yes. Even small files can suffer corruption or tampering. Hashes are quick to compute and add minimal overhead, so always verify files regardless of size.

Troubleshooting #

Verification Failures #

Symptom:

text
filename: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match

Possible Causes:

  • File was altered, corrupted, or truncated during download or transfer
  • Wrong version of the file
  • Line endings changed (especially on cross-platform transfers)
  • The hash file is outdated or not intended for this file

Solutions:

  • Re-download the file and the hash file
  • Verify file size and date
  • If you have a known-good copy, compare against it to locate the differences (cmp for binary files, diff for text)
  • Use a tool that preserves binary integrity during transfer (e.g. rsync)

Hash File Format Errors #

Symptom:

text
sha256sum: filename.sha256: no properly formatted checksum lines found

Possible Causes:

  • Hash file contains invalid formatting or extra characters
  • Copy-pasted from a website without cleaning up
  • Incorrect hash type (e.g. using sha256sum on an MD5 hash file)

Solutions:

  • Ensure each line follows the standard format (hash, two spaces, filename):
bash
<hash>  <file>
  • Remove extra headers, blank lines, or markdown formatting
  • Match the hash algorithm with the tool used (e.g. md5sum vs sha256sum)
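
A quick pre-check can point to the offending lines before you run sha256sum -c. The sketch below is deliberately stricter than sha256sum itself: it assumes plain GNU-style SHA-256 entries with lowercase hex and does not recognise other formats the tool may accept (such as BSD-style tagged lines). check_format is a hypothetical helper.

```shell
# check_format (hypothetical helper): report lines that are not plain GNU-style SHA-256 entries,
# i.e. 64 lowercase hex characters, a space, a space or "*", then a filename.
check_format() {
  local bad_lines
  if [ ! -s "$1" ]; then
    echo "$1: file is empty or missing"
    return 1
  fi
  # grep -v selects lines that do NOT match; -n prefixes each with its line number.
  if bad_lines=$(grep -vnE '^[0-9a-f]{64} [ *].+$' "$1"); then
    echo "unexpected lines in $1:"
    echo "$bad_lines"
    return 1
  fi
  echo "$1: format looks valid"
}

# Sample: a list pasted from a web page, with a header, a blank line, and a single-space separator.
pasted=$(mktemp)
printf 'SHA-256 checksums\n\n%s file.txt\n' \
  d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5 > "$pasted"
check_format "$pasted"
```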

Path Mismatches #

Symptom:

  • Checksum verification fails with "No such file" or "FAILED open or read"

Cause:

  • Hash file includes relative paths that no longer match your directory structure

Solution:

  • Open the hash file and inspect filenames
  • Run verification from the correct working directory