Base64 and Unicode, explained for everyday debugging
What Base64 is, why Chinese text sometimes breaks, and how to encode or decode it without losing characters.
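Base64 looks like a secret code, but it is really just a transport format. It turns bytes into plain ASCII text so the value can safely travel through JSON, environment variables, email bodies, and logs. For URLs, use the URL-safe variant, because standard Base64 output can contain +, /, and =, which have special meaning there.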
That distinction matters: Base64 is not encryption, and it does not hide sensitive information. Anyone can decode it.
The common mistake
Most broken Base64 snippets are not broken because of Base64 itself. They break one step earlier: the text was converted to bytes with the wrong character encoding.
For English-only text, this can stay invisible for a long time. For Chinese, emoji, accents, or mixed-language content, it appears quickly.
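A minimal Python sketch of why the problem hides behind English-only text. The helper name b64_of and the choice of GBK as the "wrong" encoding are assumptions made for illustration; any legacy encoding that shares the ASCII range would behave the same way.

```python
import base64

def b64_of(text: str, encoding: str) -> str:
    """Convert text to bytes with the given character encoding, then Base64 the bytes."""
    return base64.b64encode(text.encode(encoding)).decode("ascii")

ascii_text = "Hello"
mixed_text = "你好"

# ASCII-only text maps to the same bytes in UTF-8 and GBK,
# so the Base64 output is identical and the mismatch stays invisible.
ascii_same = b64_of(ascii_text, "utf-8") == b64_of(ascii_text, "gbk")

# Chinese text maps to different bytes in each encoding,
# so the Base64 output differs as well.
mixed_same = b64_of(mixed_text, "utf-8") == b64_of(mixed_text, "gbk")

print(ascii_same)  # True
print(mixed_same)  # False
```

Two systems that disagree on the character encoding can therefore exchange English text for a long time before the first non-ASCII character exposes the mismatch.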
This text:
Hello yueyekidl 你好
must first become UTF-8 bytes. Then those bytes can be encoded as Base64:
SGVsbG8geXVleWVraWRsIOS9oOWlvQ==
Decode it as UTF-8 and the original text comes back. Decode the same bytes with a different encoding, such as Latin-1 or GBK, and you get scrambled characters.
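The round trip can be sketched in Python with only the standard library. The variable names are illustrative, and Latin-1 stands in here for "the wrong assumption"; it is a convenient choice because it accepts every byte value and so fails silently.

```python
import base64

text = "Hello yueyekidl 你好"

# Step 1: text -> UTF-8 bytes. Step 2: bytes -> Base64 (plain ASCII).
encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
print(encoded)  # SGVsbG8geXVleWVraWRsIOS9oOWlvQ==

# Decoding reverses the steps: Base64 -> bytes, then bytes -> text.
raw = base64.b64decode(encoded)
right = raw.decode("utf-8")
print(right == text)  # True

# Same bytes, wrong character encoding: no error is raised,
# but the two Chinese characters come back as six unrelated ones.
wrong = raw.decode("latin-1")
print(wrong == text)  # False
```

Note that the Base64 step is identical in both cases. Only the final bytes-to-text step differs, which is why the fault is easy to blame on Base64 by mistake.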
When Base64 is useful
Base64 is handy when a system wants a text-only value but you need to move bytes through it:
- A small payload inside a JSON response.
- A token-like value copied into a config file.
- A short binary value included in a log or support ticket.
- A string that needs to survive systems that dislike raw Unicode.
It is not a good choice for hiding secrets, for shrinking large content (Base64 output is about a third larger than its input), or for storing files in places that should really use object storage.
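Both limits are easy to confirm. The sketch below uses a made-up 3072-byte payload (an assumption for illustration, not data from any real system) to show the size growth and the fact that decoding needs no key.

```python
import base64

# Hypothetical sample payload: 3072 bytes of arbitrary binary data.
payload = bytes(range(256)) * 12

encoded = base64.b64encode(payload)

# Every 3 input bytes become 4 output characters, so the value grows.
print(len(payload), len(encoded))  # 3072 4096

# No key or password is involved: decoding is one standard-library call.
decoded = base64.b64decode(encoded)
print(decoded == payload)  # True
```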
A safer debugging habit
When a Base64 value fails, check these four things in order:
- Is the text valid Base64? Extra spaces, copied punctuation, or missing = padding can break it.
- Was the original content text or binary?
- If it was text, was it encoded as UTF-8?
- Are you decoding the Base64 to bytes first, and only then interpreting those bytes as text?
That last step is where many bugs hide. Bytes first, text second.
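The checklist can be turned into a small diagnostic helper. This is a sketch under stated assumptions: the function name inspect_base64 and the shape of the returned dictionary are invented for illustration, and the helper only tests for UTF-8. A failed UTF-8 decode suggests binary content or a different text encoding, but it cannot tell you which.

```python
import base64
import binascii

def inspect_base64(value: str) -> dict:
    """Check a value in order: valid Base64, then bytes, then UTF-8 text."""
    # Drop whitespace that often sneaks in when a value is copied and pasted.
    cleaned = "".join(value.split())

    try:
        # validate=True rejects characters outside the Base64 alphabet
        # instead of silently discarding them.
        raw = base64.b64decode(cleaned, validate=True)
    except binascii.Error as exc:
        return {"valid_base64": False, "error": str(exc)}

    # Bytes first, text second: only now try to read the bytes as UTF-8.
    try:
        text = raw.decode("utf-8")
    except UnicodeDecodeError:
        return {"valid_base64": True, "bytes": raw, "utf8_text": None}

    return {"valid_base64": True, "bytes": raw, "utf8_text": text}

print(inspect_base64("SGVsbG8geXVleWVraWRsIOS9oOWlvQ==")["utf8_text"])
print(inspect_base64("SGVsbG8!")["valid_base64"])  # False
```

Returning the raw bytes alongside the text keeps the two stages separate, so a caller can still inspect the bytes when the text step fails.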
Try it locally
Use the Base64 encoder / decoder to test a value without sending it to a server. Paste Unicode text, encode it, switch to decode, and confirm the round trip returns the same text.
If it round-trips locally but fails in another system, the bug is probably in that system's encoding assumptions.