About Charset Converter
This tool converts text and files between character encodings, with automatic source detection, BOM handling, and a choice of input/output formats.
Key Features
- Multi-encoding Conversion: UTF-8/UTF-16, the GB family (GB2312, GBK, GB18030), Big5, Shift_JIS, EUC-JP, EUC-KR, ISO-8859 and Windows code pages, and more.
- Auto Detection: Detect source encoding automatically.
- BOM Controls: Add or remove the byte order mark (BOM) as the target requires.
- Format Conversion: Text, Base64, Hex, spaced Hex, and C-style Hex formats.
- Text + File Modes: Works for both direct text and file workflows.
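
To make the output formats listed above concrete, here is a minimal Python sketch that renders the same bytes in each form. The function name `render_bytes`, the format keys, and the exact C-style layout (`0x..` values separated by commas) are assumptions for illustration; the tool's own output may differ in casing or separators.

```python
import base64


def render_bytes(data: bytes, fmt: str) -> str:
    """Render raw bytes in one output format (hypothetical helper, not the tool's API)."""
    if fmt == "base64":
        return base64.b64encode(data).decode("ascii")
    if fmt == "hex":
        return data.hex()
    if fmt == "hex-spaced":
        return data.hex(" ")  # the separator argument needs Python 3.8+
    if fmt == "hex-c":
        # Assumed C-style layout: 0x-prefixed bytes separated by commas.
        return ", ".join(f"0x{b:02x}" for b in data)
    raise ValueError(f"unknown format: {fmt}")


encoded = "é".encode("utf-8")  # two bytes: 0xC3 0xA9
for fmt in ("base64", "hex", "hex-spaced", "hex-c"):
    print(fmt, "->", render_bytes(encoded, fmt))
```

All four renderings describe the same two bytes, so switching the output format never changes the converted data itself.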
Steps
- Choose text mode or file mode.
- Set source/target encoding (or auto-detect source).
- Configure I/O format and BOM options.
- Run conversion.
- Review output and byte stats, then copy/download if needed.
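
The steps above can be sketched as a small decode-then-encode pipeline. This is an illustration under stated assumptions, not the tool's implementation: `guess_encoding`, `convert`, the candidate list, and the shape of the returned statistics are all hypothetical, and the detection shown is a naive trial-decoding heuristic rather than a statistical detector.

```python
from typing import Optional, Sequence

# Order matters: strict multi-byte encodings first, permissive 8-bit ones last.
DEFAULT_CANDIDATES = ("utf-8", "gb18030", "cp1252")


def guess_encoding(data: bytes, candidates: Sequence[str] = DEFAULT_CANDIDATES) -> str:
    """Return the first candidate that decodes the data without error."""
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    raise ValueError("no candidate encoding fits the input")


def convert(data: bytes, target: str, source: Optional[str] = None,
            add_bom: bool = False) -> dict:
    """Decode with the source encoding, re-encode to the target, report byte stats."""
    source = source or guess_encoding(data)
    text = data.decode(source)
    if text.startswith("\ufeff"):
        text = text[1:]  # drop an incoming BOM so it is not duplicated
    # Prepending U+FEFF only makes sense for UTF targets such as
    # "utf-8", "utf-16-le", or "utf-16-be".
    payload = ("\ufeff" + text) if add_bom else text
    output = payload.encode(target)
    return {
        "source": source,
        "output": output,
        "input_bytes": len(data),
        "output_bytes": len(output),
        "characters": len(text),
    }


result = convert("中文".encode("gbk"), target="utf-8")
print(result["source"], result["input_bytes"], "->", result["output_bytes"])
```

Trial decoding can only say that a candidate is possible, not that it is correct, which is why setting the source encoding explicitly remains the safer choice when it is known.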
Use Cases
- Encoding normalization in legacy migrations.
- Preparing multilingual data import/export.
- Diagnosing mojibake and byte-level encoding issues.
Supported Encoding Reference
Unicode
| Encoding | Description | Bytes per Character | Spec |
|---|---|---|---|
| UTF-8 | Variable-length Unicode encoding and the most widely used encoding on the Web. Backward compatible with ASCII. | 1-4 bytes | RFC 3629 |
| UTF-16 LE | Little-endian UTF-16, commonly used on Windows. | 2 or 4 bytes | RFC 2781 |
| UTF-16 BE | Big-endian UTF-16, common in some protocols and Java-related workflows. | 2 or 4 bytes | RFC 2781 |
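
The byte counts in the table above can be checked with Python's built-in codecs. The sketch below is illustrative only; `byte_lengths` and the sample characters are chosen here and are not part of the tool.

```python
def byte_lengths(ch: str) -> dict:
    """Return how many bytes one character occupies in each Unicode encoding."""
    return {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-16-be")}


# ASCII letter, accented Latin letter, CJK ideograph, emoji outside the BMP.
for ch in ("A", "é", "中", "😀"):
    print(repr(ch), byte_lengths(ch))

# Byte order only changes how each 16-bit unit is laid out.
print("A".encode("utf-16-le"))  # b'A\x00'
print("A".encode("utf-16-be"))  # b'\x00A'
```

Characters outside the Basic Multilingual Plane, such as the emoji, take 4 bytes in both UTF-8 and UTF-16 because UTF-16 stores them as a surrogate pair.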
Chinese Encodings
| Encoding | Description | Typical Usage | Spec |
|---|---|---|---|
| GBK | Extended GB2312 encoding with broader Simplified/Traditional Chinese coverage. | Simplified Chinese Windows, legacy websites | IANA GBK |
| GB2312 | Early Simplified Chinese national standard. | Legacy systems, email | GB 2312-1980 |
| GB18030 | Modern Chinese national standard with full Unicode coverage. | Modern Chinese systems, government documents | GB 18030-2005 |
| Big5 | Traditional Chinese encoding. | Taiwan and Hong Kong websites | IANA Charset |
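
The coverage differences between the GB encodings in the table above can be probed by attempting to encode a few characters. This is a sketch using Python's built-in codecs, whose tables may differ in edge cases from the tool's; `can_encode` is a hypothetical helper.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Report whether every character of the text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


# Simplified ideograph, Traditional ideograph, emoji.
for sample in ("中", "龍", "😀"):
    coverage = {enc: can_encode(sample, enc) for enc in ("gb2312", "gbk", "gb18030")}
    print(sample, coverage)
```

A target encoding that cannot represent a character forces the converter to fail or substitute, so GB18030 is the safest GB-family target when the input may contain arbitrary Unicode.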
Japanese Encodings
| Encoding | Description | Typical Usage | Spec |
|---|---|---|---|
| Shift_JIS | Variable-width (1-2 byte) Japanese encoding; Windows uses the closely related code page 932 variant. | Windows, legacy websites, games | IANA Charset |
| EUC-JP | Extended Unix Code for Japanese. | Unix/Linux systems, legacy websites | IANA Charset |
| ISO-2022-JP | 7-bit Japanese encoding that uses escape sequences. | Japanese email, legacy systems | RFC 1468 |
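
To illustrate how the three Japanese encodings in the table above differ at the byte level, the following sketch encodes the same two characters with Python's built-in codecs. The variable names are chosen here for illustration.

```python
text = "日本"

sjis = text.encode("shift_jis")
eucjp = text.encode("euc_jp")
jis = text.encode("iso2022_jp")

print("Shift_JIS  :", sjis.hex(" "))
print("EUC-JP     :", eucjp.hex(" "))
print("ISO-2022-JP:", jis)

# ISO-2022-JP wraps the kanji in escape sequences and stays 7-bit throughout.
ESC_TO_JISX0208 = b"\x1b$B"  # switch to the two-byte JIS X 0208 set
ESC_TO_ASCII = b"\x1b(B"     # switch back to ASCII
payload = jis[len(ESC_TO_JISX0208):-len(ESC_TO_ASCII)]

# EUC-JP carries the same JIS X 0208 code with the high bit set on each byte.
print(payload == bytes(b & 0x7F for b in eucjp))
```

Because ISO-2022-JP depends on these escape sequences, truncating or splitting a message in the middle of a shifted run corrupts everything that follows.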
Korean Encodings
| Encoding | Description | Typical Usage | Spec |
|---|---|---|---|
| EUC-KR | Extended Unix Code for Korean based on KS X 1001. | Korean websites, legacy systems | RFC 1557 |
Western Encodings
| Encoding | Description | Languages | Spec |
|---|---|---|---|
| ISO-8859-1 | Latin-1 covering common Western European characters. | English, French, German, Spanish, Portuguese, Italian | ISO/IEC 8859-1 |
| ISO-8859-15 | Latin-9 with euro sign and extra Western European characters. | Western European languages using the euro sign | ISO/IEC 8859-15 |
| Windows-1252 | Microsoft superset of Latin-1 that assigns printable characters, such as the euro sign and curly quotes, to bytes 0x80-0x9F. | Western European languages on Windows | Unicode.org |
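
The three Western encodings in the table above agree on most bytes, and the euro sign is a convenient way to see where they differ. A minimal sketch with Python's built-in codecs follows; `euro_bytes` is a hypothetical helper.

```python
from typing import Optional


def euro_bytes(encoding: str) -> Optional[bytes]:
    """Return the byte sequence for the euro sign, or None if it is not encodable."""
    try:
        return "€".encode(encoding)
    except UnicodeEncodeError:
        return None


for enc in ("latin-1", "iso8859_15", "cp1252"):
    print(enc, "euro ->", euro_bytes(enc))

# The same byte decodes to different characters depending on the encoding.
print(repr(b"\xa4".decode("latin-1")))     # '¤'
print(repr(b"\xa4".decode("iso8859_15")))  # '€'
print(repr(b"\x80".decode("cp1252")))      # '€'
```

Text labeled ISO-8859-1 that contains bytes in the 0x80-0x9F range is often Windows-1252 in practice, so trying Windows-1252 is a reasonable first step when such text shows stray control characters.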
Cyrillic Encodings
| Encoding | Description | Languages | Spec |
|---|---|---|---|
| Windows-1251 | Microsoft Cyrillic encoding. | Russian, Ukrainian, Bulgarian, Serbian | Unicode.org |
| KOI8-R | Classic 8-bit encoding designed for Russian. | Russian | RFC 1489 |
| ISO-8859-5 | ISO standard Cyrillic encoding. | Russian, Bulgarian, Macedonian, Serbian | ISO/IEC 8859-5 |
Other Encodings
| Encoding | Description | Typical Usage | Spec |
|---|---|---|---|
| ASCII | Foundational 7-bit encoding that many later encodings extend. | Basic English text, programming | RFC 20 |
| Macintosh | Legacy Mac Roman encoding from classic Mac OS. | Old Mac files, legacy Mac applications | Unicode.org |
FAQ
Why is output garbled?
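The source encoding is probably wrong, either set manually or mis-detected, so the bytes are being decoded with the wrong character map. Garbling also occurs when the selected input format (Text, Base64, Hex) does not match the data actually supplied.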
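
As a concrete illustration of the first cause, the sketch below decodes UTF-8 bytes as Windows-1252 to produce typical mojibake and then reverses the mistake. The helper names are hypothetical, and the repair only works when every mis-decoded byte survives the round trip.

```python
def misdecode(text: str, actual: str, assumed: str) -> str:
    """Simulate reading bytes written in `actual` as if they were `assumed`."""
    return text.encode(actual).decode(assumed)


def repair(garbled: str, assumed: str, actual: str) -> str:
    """Undo a wrong decode by re-encoding with `assumed` and decoding with `actual`."""
    return garbled.encode(assumed).decode(actual)


original = "café"
garbled = misdecode(original, actual="utf-8", assumed="cp1252")
print(garbled)                             # cafÃ©
print(repair(garbled, "cp1252", "utf-8"))  # café
```

Seeing two characters such as `Ã©` where one accented letter is expected is a strong hint that UTF-8 data was read with a single-byte Western encoding.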
When should I keep BOM?
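Keep the BOM when the consumer relies on it to recognize Unicode text; some editors and platforms use it for UTF detection. Remove it when the consumer would treat the leading bytes as data. In either case, follow the target platform's requirements.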
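
For reference, a BOM is just a fixed byte prefix, so detecting, stripping, or adding one is straightforward. The sketch below uses the constants from Python's `codecs` module; `split_bom` and `with_bom` are hypothetical helpers, and only the UTF-8 and UTF-16 marks are handled.

```python
import codecs
from typing import Optional, Tuple

BOMS = (
    (codecs.BOM_UTF8, "utf-8"),          # EF BB BF
    (codecs.BOM_UTF16_LE, "utf-16-le"),  # FF FE
    (codecs.BOM_UTF16_BE, "utf-16-be"),  # FE FF
)


def split_bom(data: bytes) -> Tuple[Optional[str], bytes]:
    """Return (encoding implied by the BOM or None, data without the BOM)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data


def with_bom(text: str, encoding: str) -> bytes:
    """Encode text and prepend the BOM that matches the encoding."""
    for bom, name in BOMS:
        if name == encoding:
            return bom + text.encode(encoding)
    raise ValueError(f"no BOM defined here for {encoding}")


print(split_bom(b"\xef\xbb\xbfhi"))   # ('utf-8', b'hi')
print(with_bom("hi", "utf-16-be"))    # b'\xfe\xff\x00h\x00i'
```

Note that the UTF-32 LE mark also begins with FF FE, so a detector that supports UTF-32 would need to test for it before UTF-16 LE.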