Understanding Checksums: Protecting and Verifying Your Data Accurately
In today’s digital landscape, where data is constantly transferred, copied, downloaded, and stored across countless devices and networks, ensuring your files remain intact and unaltered is essential. Whether you’re verifying a critical software download, backing up personal files, or managing long-term archives, checksums offer a reliable solution.
Checksums are simple yet powerful tools that verify data integrity and detect corruption. Combined with digital signatures, they also help identify unauthorised changes.
This guide covers what checksums are, how they function, their limitations, and how to confidently apply them in everyday workflows. From basic hash checks to secure verification using GPG signatures, you’ll gain practical knowledge to safeguard your data with clarity and precision.
Overview #
What It Is #
A checksum is a short, fixed-size string generated from a file’s contents using a mathematical algorithm (called a hash function). It acts like a digital fingerprint for the file. If even a single bit in the file changes, the checksum will (with high probability) also change.
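The fixed-size property is easy to see by hashing inputs of very different lengths. The sketch below assumes GNU coreutils (sha256sum, head, cut) and hashes an empty input and one mebibyte of zero bytes. The variable names are illustrative only.

```shell
# Hash an empty input and a 1 MiB input; both digests have the same length.
empty_hash=$(printf '' | sha256sum | cut -d' ' -f1)
large_hash=$(head -c 1048576 /dev/zero | sha256sum | cut -d' ' -f1)

echo "empty input: $empty_hash (${#empty_hash} hex characters)"
echo "1 MiB input: $large_hash (${#large_hash} hex characters)"
```

Both lines report 64 hexadecimal characters (256 bits), regardless of how much data was hashed.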
Checksums are widely used to:
- Detect file corruption (e.g. due to disk failure or network transmission errors)
- Verify integrity after copying, downloading, or backing up
- Support reproducibility in software, research, and builds
- Compare versions of files quickly and reliably
Checksums do not provide security by themselves. They cannot:
- Prove authenticity
- Detect intentional tampering (unless combined with cryptographic signatures like GPG)
How It Works #
- A hashing algorithm processes the file and produces a checksum string.
- You can later recompute the checksum and compare it to the original.
- If the values match, the file is unchanged. If they differ, the file is corrupted or altered.
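The three steps above can be sketched by hand, without the -c option. This example assumes GNU coreutils and works in a throwaway directory. The file name and the single-character edit are illustrative.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir"

# Step 1: compute and remember the original checksum.
printf 'Hello, world!\n' > file.txt
before=$(sha256sum file.txt | cut -d' ' -f1)

# Change a single character (lowercase "w" becomes uppercase "W").
printf 'Hello, World!\n' > file.txt

# Step 2: recompute. Step 3: compare.
after=$(sha256sum file.txt | cut -d' ' -f1)
if [ "$before" = "$after" ]; then
    echo "checksums match: file unchanged"
else
    echo "checksums differ: file was corrupted or altered"
fi
```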
Syntax #
sha256sum [OPTION] <file>
Option:
-c – Check a file against a previously generated SHA-256 checksum.
Example
Create and open a file:
vim file.txt
Add the following content:
Hello, world!
Generate the SHA-256 checksum and save it:
sha256sum file.txt > file.txt.sha256
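Expected contents of file.txt.sha256 (the redirect writes the checksum to the file, so nothing is printed to the terminal):
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5  file.txt
This checksum acts as a fingerprint of the content of file.txt. Any modification to the file will, with overwhelming probability, result in a different checksum.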
Verify the checksum:
sha256sum -c file.txt.sha256
Expected output:
file.txt: OK
Common Use Cases #
- Verifying downloaded files
When you download a file (like an ISO image or software package), you can compare its checksum to the one provided by the source. If they match, the file is complete and identical to what the source published.
- Validating data integrity in backups or deployments
Checksums help confirm that files stored in backups or deployed across systems haven’t changed or become corrupted over time.
- Ensuring consistent results in automation and builds
In CI pipelines and reproducible builds, checksums confirm that the inputs and outputs are exactly the same every time, helping to detect unexpected changes or errors.
For trust and authenticity, pair checksums with digital signatures (e.g. using gpg).
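When a publisher lists only a hash string on a web page, one way to check a download is to pipe that string to sha256sum -c on standard input. The sketch below assumes GNU coreutils. The file download.bin is a hypothetical stand-in for a real download, and its content ("abc") was chosen because its SHA-256 value is a well-known test vector.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-in for a downloaded file; in practice this would be the ISO or package.
printf 'abc' > download.bin

# Stand-in for the hash copied from the publisher's site.
published_hash="ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"

# sha256sum -c reads "<hash>  <file>" lines; "-" means standard input.
if echo "$published_hash  download.bin" | sha256sum -c -; then
    result="match"
else
    result="mismatch"
fi
echo "result: $result"
```

Using -c rather than comparing the strings by eye avoids missing a difference in the middle of a 64-character value.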
Core Concepts #
Key Terminology #
Checksum
A checksum is a short, fixed-length string of characters (usually hexadecimal) computed from a file's contents using a hash function. It acts like a fingerprint for that file. If even one byte changes, the checksum changes.
Hash Function
A hash function is a mathematical algorithm that takes an input (e.g. file content) and produces a fixed-size output (hash value). For checksums, it must be:
- Deterministic: same input always gives the same output
- Fast: quick to compute
- Collision-resistant: unlikely that two different inputs produce the same output
Example: SHA-256.
Hash
The hash is the actual output of the hash function, a fixed-length string used as a checksum.
Example (SHA-256 hash):
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5
Collision
A collision occurs when two different inputs produce the same hash. This undermines the trustworthiness of a hash function.
Integrity
Integrity means that the data has not been altered unintentionally or maliciously. A matching checksum confirms that the content is intact.
Authenticity
Authenticity means the data genuinely comes from the claimed source. Checksums alone cannot provide authenticity. For that, use digital signatures.
Digital Signature
A digital signature is a cryptographic mechanism that proves a checksum (or file) was created by a trusted source. It uses public-key cryptography:
- Signed with a private key
- Verified with a public key
Example: GPG-signed checksum files.
Verification
Verification is the process of checking whether a computed checksum matches a known, trusted value.
Bit Rot
Bit rot refers to the gradual corruption of data stored on disk over time. Checksums can detect bit rot by highlighting changes in the file's hash during periodic integrity checks.
Hashing Algorithms #
Algorithm | Output Size | Common Use Cases | Use This If... | Avoid If... |
---|---|---|---|---|
CRC32 | 32 bits | Networking, file system checks | You're detecting accidental corruption in safe environments | You need cryptographic integrity or tamper detection |
MD5 | 128 bits | Legacy file validation | You're working with legacy software or protocols | You require strong security or tamper resistance |
SHA-1 | 160 bits | Deprecated security, Git internals | You're interacting with Git internals (no alternative) | You need reliable integrity in hostile environments |
SHA-256 | 256 bits | Modern integrity and security checks | You need secure, collision-resistant verification | You're in a rare performance-sensitive environment |
SHA-512 | 512 bits | High-assurance integrity verification | You want stronger guarantees for small, critical files | File sizes are large and performance is a concern |
Recommendations
Use SHA-256
Recommended in most cases.
- Strong collision resistance
- Widely supported in tools (sha256sum, GPG, TLS, etc.)
- Efficient for modern CPUs
Use SHA-512 for high-security requirements
- Stronger against brute-force attacks
- Suitable for archival, backups, or cryptographic applications
Avoid MD5 and SHA-1 for any new security-sensitive work
- Both have known collision attacks
- Still useful for detecting accidental corruption in non-hostile environments
Use CRC32 only for error detection
- Designed for quick detection of transmission/storage errors
- Not suitable for verifying trust, authenticity, or against adversarial modifications
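The output sizes in the table can be checked directly. This sketch assumes the GNU coreutils tools md5sum, sha1sum, sha256sum, and sha512sum are installed. It hashes the same input with each tool and converts the hex length to bits.

```shell
# Print the digest length produced by each tool for the same input.
for tool in md5sum sha1sum sha256sum sha512sum; do
    digest=$(printf 'Hello, world!\n' | "$tool" | cut -d' ' -f1)
    # Each hexadecimal character encodes 4 bits.
    echo "$tool: ${#digest} hex characters = $(( ${#digest} * 4 )) bits"
done
```

The reported sizes are 128, 160, 256, and 512 bits respectively, matching the table.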
Limitations of Checksums #
Checksums are highly effective for detecting unintended file changes, but they have important limitations. Understanding these limitations is critical when deciding how and when to use them, especially in contexts where security or authenticity is important.
Non-Cryptographic Safety
CRC32 was designed for error detection, not security, and MD5, although designed as a cryptographic hash, has long been broken. Both can be forged or manipulated with little effort.
- Example: Two different files can be crafted to produce the same MD5 checksum (a collision).
- Impact: An attacker can tamper with a file and make the checksum appear valid.
No Protection from Intentional Tampering
Checksums alone do not verify authenticity. Anyone who can modify a file can also compute a fresh checksum for the modified version and publish it in place of the original.
- Solution: Use digital signatures (e.g. GPG) to verify both the checksum file and its source.
- Secure Workflow: Only trust checksums if they are cryptographically signed by a known and trusted key.
Vulnerable to Collisions
A collision occurs when two different inputs generate the same checksum. This is a known weakness in older algorithms like MD5 and SHA-1.
- MD5 and SHA-1 are no longer secure for verification of sensitive files.
- Use SHA-256 or higher for modern integrity checking.
No Built-in Authentication or Encryption
Checksums do not:
- Confirm the identity of the file's author or source
- Prevent an attacker from replacing both a file and its checksum
- Encrypt or conceal the file contents
Checksums = Integrity only.
For authenticity and trust, combine with digital signatures.
Filesystem and Path Dependencies
Checksum verification may fail if:
- Files are renamed or moved (unless relative paths are used carefully)
- Line endings change (especially on cross-platform backups)
- Metadata (timestamps, permissions) is changed in a way that affects certain tools
Avoid including relative or variable paths that may cause false mismatches during verification. Use consistent, plain filenames when generating checksum lists.
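Two of these failure modes can be reproduced in a few lines. The sketch below assumes GNU coreutils and uses hypothetical file names in a throwaway directory. It shows that a line-ending change alters the hash, while a rename leaves the hash intact but breaks a check that relies on the recorded name.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Same visible text, different line endings (LF vs CRLF): the hashes differ.
printf 'Hello, world!\n'   > unix.txt
printf 'Hello, world!\r\n' > windows.txt
unix_hash=$(sha256sum unix.txt | cut -d' ' -f1)
windows_hash=$(sha256sum windows.txt | cut -d' ' -f1)
echo "LF:   $unix_hash"
echo "CRLF: $windows_hash"

# Renaming leaves the content hash unchanged, but the recorded name no
# longer resolves, so "sha256sum -c" reports a failure.
sha256sum unix.txt > unix.txt.sha256
mv unix.txt renamed.txt
renamed_hash=$(sha256sum renamed.txt | cut -d' ' -f1)
if sha256sum -c unix.txt.sha256; then
    rename_check="passed"
else
    rename_check="failed"
fi
echo "check after rename: $rename_check"
```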
Summary
Limitation | Description | Mitigation |
---|---|---|
Not secure on their own | Vulnerable to forgery and spoofing | Use GPG signatures |
Collisions possible | MD5 and SHA-1 can be tricked | Use SHA-256 or SHA-512 |
No authentication | Cannot verify file origin | Use signed checksum files |
No encryption | File contents are not hidden or protected | Combine with other tools as needed |
Path/format dependencies | Changes in names or formats may break checks | Use consistent, scripted workflows |
Always use checksums as part of a broader integrity and security process, not as a standalone solution for trust or authenticity.
Workflow #
Install #
Ubuntu Linux: sha256sum, md5sum, and gpg are included by default as part of the coreutils and gnupg packages. No installation is needed.
macOS (using Homebrew):
brew install coreutils
brew install gnupg
Verify installations
Check GPG keys:
gpg --list-keys
Check versions of checksum tools:
md5sum --version
sha256sum --version
Basic Commands #
1. Generate SHA-256 Checksum
Single file (display in terminal):
sha256sum <file>
Expected output:
d9014c4624844aa5bac314773d6b689ad467fa4e1d1a50a1b8a99d5a95f72ff5 <file>
Single file (save to checksum file):
sha256sum <file> > <file>.sha256
This creates a .sha256 file with the expected format for verification.
Multiple files (save to combined checksum file):
sha256sum <file1> <file2> > SHA256SUMS
Expected contents of SHA256SUMS:
6dcd4ce23d88e2ee9568ba546c007c63e231f530f115b4a846de495d3806971f file1.txt
2c6ee24b09816a6f14f95d1698b24cfc3f438178e6a7c1f1cc55cfe547b0208a file2.txt
2. Verify SHA-256 Checksum
Single file:
sha256sum -c <file>.sha256
Output:
<file>: OK
If the file has been altered (a missing file is reported as FAILED open or read instead):
<file>: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
Multiple files:
sha256sum -c SHA256SUMS
Each file will be checked and marked OK or FAILED.
Suppress ‘OK’ Messages:
To show only problems:
sha256sum -c --quiet SHA256SUMS
If file paths or filenames have changed since the checksums were generated, verification will fail. Either move files back to the correct path or regenerate the checksum file.
3. Folder-Wide Checksum Operations (Safe, Recursive)
Generate checksums for all files (exclude unwanted files):
cd </path/to/files>
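find . -type f \
! -path './.git/*' \
! -name '*.sha256' \
! -name 'SHA256SUMS' \
! -name '.DS_Store' \
-print0 | sort -z | xargs -0 sha256sum > SHA256SUMS
Explanation:
- find . -type f: finds all regular files
- ! -path, ! -name: exclude .git, .sha256 files, .DS_Store, and the SHA256SUMS output file itself (the shell may create it before find scans the directory, in which case it would be listed with a stale hash and fail verification)
- -print0 and xargs -0: safely handle filenames with spaces or special characters
- sort -z: ensures a consistent, reproducible file order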
Verify all files in folder:
sha256sum -c SHA256SUMS
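To try the folder-wide workflow without touching real data, the following sketch builds a small sample tree in a temporary directory and then generates and verifies a checksum list. It assumes GNU findutils and coreutils. The file names are made up for the demonstration.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Sample tree, including a filename with a space and a file to be excluded.
mkdir -p docs
printf 'alpha\n' > 'docs/my notes.txt'
printf 'beta\n'  > data.csv
printf 'junk\n'  > .DS_Store

# Generate the list. SHA256SUMS itself is excluded because it may already
# exist (created by the redirect) when find scans the directory.
find . -type f \
    ! -name 'SHA256SUMS' \
    ! -name '.DS_Store' \
    -print0 | sort -z | xargs -0 sha256sum > SHA256SUMS

cat SHA256SUMS

# Verify every listed file; the exit status is 0 only if all of them match.
if sha256sum -c --quiet SHA256SUMS; then
    tree_check="ok"
else
    tree_check="failed"
fi
echo "tree check: $tree_check"
```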
4. Sign and Verify Checksum File (GPG)
Goal: Confirm that the checksum file is authentic and untampered.
Signing the checksum file with GPG:
- Authenticates the source (proves it came from you or a trusted party).
- Protects against tampering (ensures the checksum hasn’t been altered).
This adds a layer of trust before verifying the actual file’s integrity.
1. Sign the Checksum File (You)
Use your private GPG key to sign the checksum file:
gpg --armor --detach-sign <file>.sha256
You will see a prompt:
Please enter the passphrase to unlock the OpenPGP secret key:
Enter your GPG passphrase when prompted.
This creates an ASCII-armoured detached signature file:
<file>.sha256.asc
2. Share the Following with the Recipient
- The file:
<file>
- The checksum file:
<file>.sha256
- The signature file:
<file>.sha256.asc
- Your public GPG key
- Your fingerprint
Export public key to a file:
gpg --armor --export <[email protected]> > publickey.asc
- --armor: Outputs in readable ASCII format (suitable for sharing)
- [email protected]: Replace with your GPG key's user ID or email
- publickey.asc: This file contains your public key
Before sharing your public key, verify your key’s unique fingerprint:
gpg --fingerprint <[email protected]>
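This shows the key's fingerprint, a 40-character hexadecimal string printed in groups of four. The value below is only a placeholder illustrating the layout: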
ABCD 1234 EFGH 5678 IJKL 9012 MNOP 3456 QRST 7890 UVWX
Share:
- Public key file (publickey.asc):
  - Attach to an email
  - Upload to your website or Git repository
  - Share via secure file transfer
- Fingerprint:
  - Share separately (e.g., in email text, website, or README)
  - Allows recipients to verify they have the correct key
  - Protects against fake keys and impersonation
  - Builds trust and confirms your identity
Sharing both the public key and its fingerprint helps others verify your signatures securely.
3. Verify the Signature (Recipient)
The recipient must first import your public GPG key (for example with gpg --import publickey.asc) and can then verify the signature:
gpg --verify <file>.sha256.asc <file>.sha256
Expected output:
gpg: Good signature from "Your Name <[email protected]>"
- Confirms the checksum file was signed by your private key
- Validates the checksum file has not been altered
4. Verify the File Integrity (Recipient)
Once the checksum file is trusted, the recipient can verify the actual file:
sha256sum -c <file>.sha256
Expected output:
<file>: OK
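The whole sign-and-verify cycle can be rehearsed safely with a throwaway keyring. The sketch below is a minimal example under several assumptions: GnuPG 2.1 or later, a temporary GNUPGHOME so your real keys are untouched, and a hypothetical demo identity with an empty passphrase, which is acceptable only for a disposable test key.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Throwaway keyring so your real keys are not touched.
GNUPGHOME="$workdir/gnupg"
export GNUPGHOME
mkdir -m 700 "$GNUPGHOME"

# Hypothetical demo identity with an empty passphrase (never do this for a real key).
gpg --batch --quiet --pinentry-mode loopback --passphrase '' \
    --quick-generate-key 'Demo Signer <demo@example.invalid>' default default never

printf 'Hello, world!\n' > file.txt
sha256sum file.txt > file.txt.sha256

# Sender: create the detached, ASCII-armoured signature (file.txt.sha256.asc).
gpg --batch --armor --detach-sign file.txt.sha256

# Recipient: verify the signature first, then the file itself.
if gpg --verify file.txt.sha256.asc file.txt.sha256 && sha256sum -c file.txt.sha256; then
    signed_check="ok"
else
    signed_check="failed"
fi

# A checksum file altered after signing no longer verifies.
echo '# tampered' >> file.txt.sha256
if gpg --verify file.txt.sha256.asc file.txt.sha256; then
    tampered_check="ok"
else
    tampered_check="failed"
fi
echo "original: $signed_check, tampered: $tampered_check"

# Stop the agent that was started for the throwaway keyring.
gpgconf --kill gpg-agent
```

The order matters: the signature check establishes that the checksum file can be trusted, and only then does the checksum say anything about the file.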
Verifying Downloaded ISO Files #
1. Download Checksum Files
Visit https://releases.ubuntu.com/ and download the following:
- SHA256SUMS – contains SHA-256 hashes for the available images
- SHA256SUMS.gpg – GPG signature used to verify the checksum file
2. Confirm the Checksum File Is Authentic
You want to make sure that the file listing all the checksums (SHA256SUMS) hasn’t been tampered with. To do that, you use the corresponding signature file (SHA256SUMS.gpg), which proves it was created by a trusted Ubuntu source.
The SHA256SUMS.gpg file is like a digital "seal" applied by Ubuntu. When you verify this signature, you are checking that:
- The checksum list is genuine and hasn’t been altered.
- It was signed by the official Ubuntu release key.
This step ensures you're checking your ISO against a trusted and untampered checksum.
Run:
gpg --keyid-format long --verify SHA256SUMS.gpg SHA256SUMS
If you see an error like:
gpg: Can't check signature: No public key
... it means your system does not have the public key needed to verify the signature.
The output will also tell you which key is missing. For example:
using RSA key 843938DF228D22F7B3742BC0D94AA3F0EFE21092
Import the missing key from Ubuntu’s key server:
gpg --keyid-format long --keyserver hkp://keyserver.ubuntu.com --recv-keys <key_id>
Expected output:
gpg: key D94AA3F0EFE21092: public key "Ubuntu CD Image Automatic Signing Key (2012) <[email protected]>" imported
gpg: Total number processed: 1
gpg: imported: 1
Verify the key is now in your keyring:
gpg --list-keys --keyid-format long
Re-run the verification:
gpg --keyid-format long --verify SHA256SUMS.gpg SHA256SUMS
Expected output:
gpg: Good signature from "Ubuntu CD Image Automatic Signing Key (2012) <[email protected]>" [unknown]
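This confirms that the checksum file was signed by that key and has not been altered since. To be sure the key itself is genuine, compare its fingerprint with the one Ubuntu publishes.
gpg also prints a warning that the key is not certified with a trusted signature. This means you haven’t personally marked this key as trusted; the signature itself is still valid.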
3. Verify the ISO Checksum
Ensure the ISO file and both checksum files (SHA256SUMS and SHA256SUMS.gpg) are in the same directory.
Run:
sha256sum -c SHA256SUMS 2>&1 | grep OK
Explanation:
- sha256sum -c SHA256SUMS: verifies files against the checksums listed in SHA256SUMS
- 2>&1: merges error messages with normal output
- | grep OK: filters to show only files that passed verification
Expected output:
ubuntu-24.04-desktop-amd64.iso: OK
If the output does not include : OK, the ISO may be corrupted or tampered with. In that case, re-download it from a trusted source.
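An alternative to filtering with grep, assuming a recent GNU coreutils release that supports the --ignore-missing option, is to let sha256sum skip the images you did not download. The sketch below simulates this with two small stand-in files rather than real ISO images.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Stand-ins for two published images; only one is "downloaded".
printf 'desktop image\n' > desktop.iso
printf 'server image\n'  > server.iso
sha256sum desktop.iso server.iso > SHA256SUMS
rm server.iso

# --ignore-missing skips entries whose files are absent instead of failing.
if sha256sum -c --ignore-missing SHA256SUMS; then
    iso_check="ok"
else
    iso_check="failed"
fi
echo "check: $iso_check"
```

With this form the exit status reflects the result, which makes the check easier to use in scripts than matching on the text OK.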
4. Clean Up
Remove the imported key:
gpg --delete-keys <key_id>
If prompted, type y to confirm key deletion from the keyring.
Confirm the key has been removed:
gpg --list-keys --keyid-format long
Best Practices #
- Use Strong, Modern Hash Algorithms
Prefer SHA-256 or SHA-512 for generating checksums. MD5 and SHA-1 are outdated and vulnerable to collisions, so only use them for legacy or very low-risk checks.
- Always Verify Downloads
Before using any downloaded file or installer, verify its checksum if provided. Only trust checksums that come from secure sources, ideally those signed with a GPG key.
- Automate Checksum Verification
Integrate checksum checks into your installation, deployment scripts, or CI/CD pipelines to catch corruption or tampering early in the process.
- Store Checksums Alongside Files
Keep checksum files together with backups, archives, or datasets to ensure you can verify integrity whenever needed.
- Use Checksums for Integrity, Not Authenticity
Checksums detect accidental changes but do not confirm the file’s origin. For authenticity, combine checksums with digital signatures such as GPG.
- Regularly Re-Verify Long-Term Data
Schedule periodic integrity checks on archival or critical data to detect silent corruption or bit rot before it causes data loss.
- Be Careful with File Paths in Checksum Files
Avoid including relative or variable paths that may cause false mismatches during verification. Use consistent, plain filenames when generating checksum lists.
- Protect Checksum Files
Store checksum files in read-only or access-controlled locations to prevent tampering. Consider signing checksum files with GPG to ensure their authenticity.
- Use Standard Naming Conventions
Adopt clear and consistent names like SHA256SUMS or <file>.sha256 so users and scripts can easily find and use the verification data.
- Educate Your Team
Make checksum generation and verification a standard part of workflows before publishing, deploying, or archiving files. Clear procedures reduce errors and improve security.
By following these best practices, you strengthen your data integrity processes and reduce the risk of unnoticed corruption or unauthorised modification.
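As one possible way to automate verification in a deployment script or scheduled job, the sketch below wraps the check in a small function. The name verify_checksums is a hypothetical helper, not part of any tool. It relies on the --status option of GNU sha256sum, which suppresses output and reports the result only through the exit code.

```shell
# verify_checksums is a hypothetical helper: it returns non-zero and prints a
# message if any file listed in the given checksum file fails verification.
verify_checksums() {
    sums_file=$1
    # --status suppresses all output; only the exit code reports the result.
    if sha256sum -c --status "$sums_file"; then
        echo "integrity check passed: $sums_file"
    else
        echo "integrity check FAILED: $sums_file" >&2
        return 1
    fi
}

# Demo in a throwaway directory.
workdir=$(mktemp -d)
cd "$workdir"
printf 'release artefact\n' > app.tar
sha256sum app.tar > SHA256SUMS

if verify_checksums SHA256SUMS; then first_run="passed"; else first_run="failed"; fi

# Simulate silent corruption, then run the same check again.
printf 'x' >> app.tar
if verify_checksums SHA256SUMS; then second_run="passed"; else second_run="failed"; fi
echo "first run: $first_run, second run: $second_run"
```

In a pipeline, the non-zero return value can be used to stop the job before a corrupted artefact is deployed.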
Frequently Asked Questions (FAQ) #
What is the difference between a checksum and a digital signature?
A checksum verifies data integrity by confirming that the data has not changed or become corrupted. It does not prove who created the data or whether it is authentic.
A digital signature ensures both authenticity and integrity. It uses a private key to sign the data and a public key to verify it, confirming the file’s origin. For secure distribution, use digital signatures alongside checksums.
Should I use checksums for my backups?
Yes. Checksums are a best practice for backups. They let you detect when files have been corrupted or changed over time, so you can restore a good copy before the damage spreads.
Benefits include:
- Detecting silent corruption like bit rot or storage errors
- Verifying complete and correct file copies
- Confirming long-term integrity through periodic checks
- Providing confidence in disaster recovery with exact file matches
Why is MD5 still used if it is broken?
MD5 is fast and easy to implement. It’s acceptable for low-risk, non-security applications like basic integrity checks. However, avoid MD5 for anything requiring trust or authenticity because it is vulnerable to collisions.
How large are checksum files?
Checksum files are very small, typically only a few kilobytes, even for many files. Each entry contains a hash value and filename, making them efficient to store and share.
Can I trust checksum files I find online?
Only if they:
- Come from a trusted source
- Are delivered over HTTPS
- Are signed with GPG and the signature verifies
If any of these are missing, the checksum file may be compromised or fake.
Are checksums useful for small files?
Yes. Even small files can suffer corruption or tampering. Checksums are quick to compute and add minimal overhead, so always verify files regardless of size.
Troubleshooting #
Verification Failures #
Symptom:
filename: FAILED
sha256sum: WARNING: 1 computed checksum did NOT match
Possible Causes:
- File was altered, corrupted, or truncated during download or transfer
- Wrong version of the file
- Line endings changed (especially on cross-platform transfers)
- The checksum file is outdated or not intended for this file
Solutions:
- Re-download the file and the checksum
- Verify file size and date
- Use diff to examine differences
- Use a tool that preserves binary integrity during transfer (e.g. rsync)
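For binary files, cmp is often more practical than diff, so the sketch below swaps it in: it reports the position of the first differing byte and signals the result through its exit status. The two files are hypothetical stand-ins for a known-good copy and a suspect copy.

```shell
workdir=$(mktemp -d)
cd "$workdir"

# Two copies that differ in one byte, standing in for a good and a bad transfer.
printf 'Hello, world!\n' > good.bin
printf 'Hello, World!\n' > bad.bin

# cmp exits with 0 if the files are identical and 1 if they differ, printing
# the position of the first difference.
if cmp good.bin bad.bin; then
    cmp_result="identical"
else
    cmp_result="different"
fi

# Comparing sizes quickly reveals truncated downloads; here the sizes are
# equal, so the difference is in the content.
good_size=$(wc -c < good.bin)
bad_size=$(wc -c < bad.bin)
echo "result: $cmp_result (sizes: $good_size vs $bad_size bytes)"
```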
Checksum File Format Errors #
Symptom:
sha256sum: filename.sha256: no properly formatted checksum lines found
Possible Causes:
- Checksum file contains invalid formatting or extra characters
- Copy-pasted from a website without cleaning up
- Incorrect hash type (e.g. using sha256sum on an MD5 file)
Solutions:
- Ensure the file follows standard format: <hash> <file>
- Remove extra headers, blank lines, or markdown formatting
- Match the hash algorithm with the tool used (e.g. md5sum vs sha256sum)
Path Mismatches #
Symptom:
- Checksum verification fails with "No such file" or "FAILED open or read"
Cause:
- Checksum file includes relative paths that no longer match your directory structure
Solution:
- Open the checksum file and inspect filenames
- Run verification from the correct working directory