Alphanumeric Character: Definition, Crypto Use, Security
Sixty-two. That is the total number of distinct symbols available when someone says "use only alphanumeric characters." Twenty-six uppercase, twenty-six lowercase, ten digits. Strip case sensitivity and the count drops to thirty-six. Both numbers matter, because nearly every digital identifier you have ever typed — a wallet address, a Wi-Fi password, a transaction hash, a GitHub token — is built from a chosen subset of those sixty-two symbols.
The choice of which subset, and why, is the part most explainers skip. It is also the part that ties the encyclopedic definition of an alphanumeric character to the practical security of your crypto wallet. This article walks through the count, the standard behind it, the way modern cryptography picks subsets of it, and what the US national authority on passwords now says about it. Some of that guidance changed in ways most security advice has not caught up to yet.
What Is an Alphanumeric Character? Definition and Count
Start with the definition. An alphanumeric character is any letter A–Z (either case) or any decimal digit 0–9. That is all there is to it. From there the arithmetic is obvious — twenty-six letters per case plus ten digits gives sixty-two distinct characters with case sensitivity, thirty-six without. Anything else is "special." Punctuation, whitespace, math symbols, accented letters, emoji — none of those count as alphanumeric.
Engineers usually meet the definition through one of two shorthands. POSIX tools like grep, sed, and awk recognise `[:alnum:]`, which matches the sixty-two characters above. Most modern regex flavours — Python, JavaScript, Java, PCRE — use `\w` instead. The catch with `\w` is that it sneaks the underscore in. Underscore is not formally alphanumeric. Most programming syntax treats it as honorary, which is why the Stripe key prefix `sk_live_` and the AWS key prefix `AKIA` mix underscores and uppercase letters with digits without anyone blinking.
Where did the term come from? Punch cards, of all places. IBM tabulating equipment in the 1930s needed a single word for codes that mixed letters with numbers, and "alphanumeric" stuck. By the IBM 1401 in the early 1960s the word was standard vocabulary inside business computing. The distinction had teeth in practice — a field declared "alphanumeric" accepted any letter or digit; a numeric-only field rejected the alphabet outright. From there the word escaped into license plates, IBAN bank codes, telephone-keypad mnemonics, product SKUs, and a hundred other places.
The case-sensitive versus case-insensitive distinction matters more than it looks. Password entropy doubles when uppercase letters are allowed. Bitcoin's Base58 addresses keep both cases on purpose. Bech32 throws case away on purpose. Every one of those choices is a trade between expressiveness and human error. Pick wrong and people lose money on typos.

From ASCII to Unicode: A Brief Technical History
"Alphanumeric" today is the survivor of a standards war that lasted sixty years. Most users were caught somewhere in the middle and never noticed.
ASCII came first. It shipped in 1963, courtesy of the American Standards Association, and was formalised five years later as ANSI X3.4-1968. Two revisions followed — one in 1977, another in 1986. The 1968 version pegged uppercase A at the number 65, lowercase a at 97, and the digits 0–9 at 48 through 57. Open your editor right now: the byte 'A' is still 65. Nothing budged in sixty years.
For roughly four decades the ASCII alphanumeric set was the alphanumeric set. Then came the global web. Seven bits stopped being enough. The failure modes were ugly. Garbled emails. Broken databases. Japanese websites that worked perfectly in Tokyo and looked like walls of question marks on a US laptop. Unicode arrived in 1991 with an extravagant ambition: assign a unique number to every character in every script anyone had ever written down. UTF-8 followed in 1992 as the encoding that actually carried Unicode through ordinary networks. Its trick was backward-compatibility — UTF-8's first 128 code points are the original 1968 ASCII bytes, exactly. English text shipped before 1991 kept working forever.
The crossover happened in December 2007. That month, public web-crawl statistics finally showed UTF-8 overtaking ASCII as the most common encoding online. From then on "alphanumeric" stopped strictly meaning the sixty-two ASCII symbols. Unicode now catalogues alphanumeric blocks for Cyrillic, Greek, Arabic, Hebrew, and the CJK scripts. Each script comes with its own letters and its own digits.
In practice, though, software that has to cross borders still defaults to the original ASCII subset. Latin letters A–Z. Arabic numerals 0–9. Nothing else. The reason is mundane. Every keyboard produces ASCII. Every database accepts it. Every regex engine knows it. Step outside the range and you inherit a portfolio of misencoding bugs, look-alike characters, and the phishing attacks I will come back to below.
| Class | Members | Count | Regex shorthand | Example | |
|---|---|---|---|---|---|
| Alphabetic | A–Z, a–z | 52 | `[A-Za-z]` | Plain English word | |
| Numeric | 0–9 | 10 | `[0-9]` or `\d` | A year, a postcode | |
| Alphanumeric | A–Z, a–z, 0–9 | 62 | `[A-Za-z0-9]` or `[:alnum:]` | API key, SKU | |
| Special / symbolic | `!@#$%^&*()_+-=[]{};:'"\ | ,.<>/?` | ~33 (ASCII) | `[^A-Za-z0-9]` | Password modifier |
Alphanumeric Characters in Crypto: Addresses, Hashes, Seeds
Here is the bit most generic explainers skip. Cryptocurrency systems never use the full sixty-two-character alphanumeric set. They pick carefully chosen subsets. Each one is a documented engineering compromise, not an arbitrary aesthetic.
Bitcoin first. A legacy Bitcoin address (one that begins with 1 or 3) is encoded in Base58. The alphabet was designed by Satoshi Nakamoto by hand. Recipe: take the sixty-two-character alphanumeric set, delete four members. Out go the digit zero, capital O, capital I, lowercase l. Why those four? Write them on a Post-it in bad handwriting. Walk away. Come back in five minutes. Try to tell them apart. You cannot. That is the entire problem Base58 was built to solve. Fifty-eight characters remain. A typical legacy address ends up twenty-six to thirty-five symbols long — short enough to copy by hand if you really have to.
SegWit activated in August 2017. With it came a second Bitcoin address format: Bech32, defined in BIP-173. Bech32 makes different bets. Case sensitivity disappears entirely — every address is lowercase. A different four characters are dropped: the digit 1, plus b, i, o. The thirty-two remaining letters and digits carry a built-in checksum. The checksum catches almost every single-character typo automatically. Taproot, live since November 2021, refined the format into Bech32m (BIP-350) after researchers found a corner-case flaw in the original maths.
Ethereum picked a third route. Just use hex. An Ethereum address is `0x` followed by exactly forty hexadecimal characters; forty-two characters total. Hex narrows alphanumeric all the way down to sixteen members. The digits 0–9, the letters a–f, nothing else. The choice felt clean in 2015. By 2026, after years of users squinting at raw hex blobs in MetaMask, it looks ugly. EIP-55 was the fix. Selectively uppercase certain letters in a pattern derived from a Keccak-256 hash of the lowercase address. The result is free typo detection. EIP-55 catches typos with around 99.975 percent reliability. The miss rate sits at roughly 0.0247 percent. Small. Not zero.
Hashes are the simplest case. A SHA-256 hash output is 256 bits, displayed as 64 hex characters. Ethereum's Keccak-256 produces output of identical length. A Bitcoin transaction ID — a txid — is the SHA-256 hash of the transaction itself, so a txid is also 64 alphanumeric hex characters. They look intimidating on a block explorer. They are pure alphanumeric.
Seed phrases break the pattern. BIP-39 wallet recovery is the one place where crypto steps outside alphanumeric and back into purely alphabetic territory. The standard encodes 128 or 256 bits of entropy as twelve or twenty-four English words drawn from a fixed 2,048-word list. Each word is lowercase letters only — no digits, no mixed case. Why? Because the design target is a person writing words on paper at 3am after their phone dies, and digits introduce ambiguity that letters do not.
| Identifier | Character set | Length | Example (truncated) |
|---|---|---|---|
| Bitcoin legacy address | Base58 (58 chars, no 0/O/I/l) | 26–35 | `1A1zP1eP5QGefi2…` |
| Bitcoin Bech32 (SegWit) | 32 lowercase, no 1/b/i/o | ~42 | `bc1qar0srrr7xfk…` |
| Ethereum address | hex (0–9, a–f) + `0x` | 42 | `0xde0B295669a91…` |
| SHA-256 / txid | hex | 64 | `e3b0c44298fc1c1…` |
| BIP-39 word | a–z only | 3–8 per word | `abandon ability able…` |
Every subset is a piece of human-centred design hiding inside a deeply technical system.
Alphanumeric Passwords: What NIST Actually Says in 2024
Most password advice on the public web is several years out of date. Use mixed case, add numbers, include at least one special character, change it every ninety days — those rules were canonical for two decades. The United States National Institute of Standards and Technology officially walked away from them.
NIST Special Publication 800-63B, the federal authority on digital identity guidance, finalised Revision 4 in September 2024. The new guidance is striking in how much it removes. The minimum length recommendation moved up to fifteen characters for single-factor authentication. The instruction on character composition rules is phrased as a "shall not": services shall not require specific character classes. Periodic password expiration, that ninety-day rotation everyone hated, was likewise removed. In place of those rules, NIST now requires services to screen submitted passwords against a blocklist of known-compromised credentials.
The shift is justified by entropy math. A 62-character alphanumeric pool produces about 5.95 bits per character. The full 95-character printable ASCII pool — alphanumeric plus specials — produces 6.57. Adding the full set of specials gains 0.62 bits per character. Adding one more character of length gains the full 5.95. Length dominates complexity by an order of magnitude.
Verizon's 2025 Data Breach Investigations Report found that credentials are 22 percent of all confirmed breach entry points. Credential stuffing — automated reuse of leaked password lists — runs at 19 percent of authentication attempts at the median, peaking at 44 percent. Forty-nine percent of users reuse passwords across services. None of those problems is solved by requiring a capital letter.
A longer alphanumeric password with no special characters is harder to crack than a short one assembled to satisfy a complexity checklist. If your bank still forces you to compose a twelve-character password with one uppercase letter, one digit, and one special character, that policy is now formally out of step with the US federal standard.
Where Else Alphanumeric Characters Appear
Step outside crypto and passwords and alphanumeric strings still surface wherever a system needs an identifier that humans can type and computers can parse without ambiguity.
Bank codes are an easy example. An IBAN can stretch to thirty-four alphanumeric characters and always opens with a two-letter ISO country code. SWIFT/BIC codes run eight or eleven. License plates vary by country — a UK plate is nothing like a German one — but both are alphanumeric subsets of the same sixty-two-symbol pool. Vehicle Identification Numbers are exactly seventeen characters worldwide, and VINs deliberately ban the letters I, O, and Q to keep them visually distinct from digits.
API keys are everyday examples most users never bother to look at. A Stripe live key opens `sk_live_` plus an alphanumeric token. An AWS access key opens `AKIA` plus sixteen alphanumeric characters. A GitHub personal access token issued after 2021 opens `ghp_`. Those prefixes are themselves alphanumeric, picked so providers can scan public repositories and logs for leaked keys. In a lot of cases that scan catches a slip before any attacker does.
QR codes deserve a brief mention. The ISO/IEC 18004 standard defines a dedicated "alphanumeric mode" that encodes a specific 45-character set — uppercase letters, digits, space, and a handful of punctuation marks — more efficiently than the general byte mode. A QR that contains only uppercase alphanumeric content stores about 1.6 times more data per square than the same content encoded as raw bytes.
Base32, Base58, Base64: When Crypto Picks a Subset
A handful of encoding standards exist specifically to map binary into an alphanumeric subset. The reference is RFC 4648, published by the IETF in 2006. It defines three encodings.
Hex is the simplest of them. Officially Base16. Sixteen characters: 0–9, a–f. Used for Ethereum addresses, cryptographic hashes, almost any low-level debugging where you need to read raw bytes. Base32 is the more interesting one. The 32-character alphabet was picked to be case-insensitive and, in some variants, to drop the visually confusable digits 0, 1, 8 and 9. Anyone who has set up two-factor authentication and typed a secret into Google Authenticator has typed Base32 — most of the time without knowing.
Base64 is the workhorse. Sixty-two alphanumerics plus the two symbols `+` and `/`. A URL-safe variant swaps those for `-` and `_`. Base64 is what carries your email attachments, encodes data-URLs inside HTML, and packages JSON Web Tokens for OAuth.
Bitcoin's Base58 sits outside RFC 4648. Satoshi Nakamoto built it independently. The target was different — humans re-typing addresses, not bytes-per-character efficiency — and the result was a custom alphabet that nobody else uses. Base85, sometimes called Ascii85, runs in the opposite direction. It packs four bytes into five characters and shows up in PDF and PostScript files, where the extra density justifies the readability loss.

Common Pitfalls: Visual Confusion and Look-Alikes
The same reasons crypto picks subsets of alphanumeric are the reasons everybody makes mistakes. A handful of character pairs cause most of the trouble.
The classic confusables: zero and capital O. Digit one, lowercase l, capital I. Lowercase l and uppercase I. Bitcoin Base58 drops all four because of this. Other systems use different mitigations — VINs drop I, O, and Q, some financial codes drop O entirely, and you can find national license-plate rules that ban whichever letter happens to look most like a 0 in the country's font.
A trickier and more recent problem is Unicode homograph attacks. The idea was documented in a 2001 paper by Evgeniy Gabrilovich and Alex Gontmakher. A homograph swaps a visually identical character from a different script — for example, the Cyrillic 'а' (U+0430) in place of the Latin 'a' (U+0061). Register a domain name with that substitution and you can host a phishing page that looks indistinguishable from the real bank. Modern browsers display the raw Punycode representation — something like `xn--80akhbyknj4f` — whenever a domain mixes scripts. That defence catches most attacks. Not all of them.
How to Build a Strong Alphanumeric Password in 2026
Three rules. All of them derived straight from the NIST math.
One: length beats character class. Aim for sixteen characters or more. The rule holds whether the system accepts only letters and digits or the full printable ASCII set.
Two: if you have to memorise it, use a passphrase. Four random words from a large list — the Diceware list is the canonical option — outperform almost any short password a human will sit down and invent.
Three: for everything else, use a password manager. Let the manager generate long alphanumeric strings you will never have to read or type by hand. Once a manager handles the output, the readability of the output stops mattering at all.
Quick Reference: Alphanumeric Counts and Examples
Sixty-two with case sensitivity. Thirty-six without it. ASCII codes 48–57 for digits. 65–90 for uppercase. 97–122 for lowercase. Match the whole set with the regex `[A-Za-z0-9]` or the POSIX class `[:alnum:]`. That sixty-two-symbol pool is what underlies nearly every digital identifier you will touch this week. Passwords. API keys. IBANs. License plates. Transaction IDs. Every wallet address you generate.