Documentation

About Charset Converter

This tool converts text and files between character encodings, with automatic source-encoding detection, BOM handling, and several input/output formats.

Key Features

  • Multi-encoding Conversion: UTF-8/UTF-16, the GB family, Big5, Shift_JIS, EUC-KR, ISO-8859 and Windows code pages, and more.
  • Auto Detection: Detects the source encoding automatically.
  • BOM Controls: Adds or removes a byte order mark (BOM) as needed.
  • Format Conversion: Reads and writes plain text, Base64, Hex, space-separated Hex, and C-style Hex.
  • Text and File Modes: Works with both directly entered text and files.
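As a rough illustration of the format options, the Python sketch below renders the same bytes in each output format. The render helper and the format names are hypothetical, and the \xNN layout used for C-style Hex is an assumption; the tool may use another C-style notation such as 0xNN.

```python
import base64


def render(data: bytes, fmt: str, encoding: str = "utf-8") -> str:
    """Render raw bytes in one of the output formats (hypothetical format names)."""
    if fmt == "text":
        return data.decode(encoding)
    if fmt == "base64":
        return base64.b64encode(data).decode("ascii")
    if fmt == "hex":
        return data.hex()
    if fmt == "spaced-hex":
        return data.hex(" ")  # one space between bytes (Python 3.8+)
    if fmt == "c-hex":
        # Assumed C-style layout: one \xNN escape per byte.
        return "".join(f"\\x{b:02x}" for b in data)
    raise ValueError(f"unknown format: {fmt}")


data = "héllo".encode("utf-8")
for fmt in ("text", "base64", "hex", "spaced-hex", "c-hex"):
    print(f"{fmt:>10}: {render(data, fmt)}")
```

All five outputs describe the same six bytes; only the notation changes, which is why the input format must match the data before any encoding conversion happens.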

Steps

  1. Choose text mode or file mode.
  2. Set the source and target encodings (or let the tool auto-detect the source).
  3. Configure the input/output formats and BOM options.
  4. Run the conversion.
  5. Review the output and byte statistics, then copy or download the result if needed.
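The core of the steps above can be sketched in Python as decode-then-encode with optional BOM handling. This is a minimal sketch, not the tool's implementation: the convert function, its parameters, and the BOMS table are hypothetical, and only UTF-8 and UTF-16 LE/BE are given BOMs here.

```python
import codecs

# Hypothetical BOM table for the Unicode encodings listed in this document.
BOMS = {
    "utf-8": codecs.BOM_UTF8,
    "utf-16-le": codecs.BOM_UTF16_LE,
    "utf-16-be": codecs.BOM_UTF16_BE,
}


def convert(data: bytes, source: str, target: str, add_bom: bool = False) -> bytes:
    # Drop a BOM that matches the source encoding so it does not end up in the text.
    src_bom = BOMS.get(source.lower())
    if src_bom and data.startswith(src_bom):
        data = data[len(src_bom):]
    text = data.decode(source)   # interpret the input with the source encoding
    out = text.encode(target)    # re-encode with the target encoding
    tgt_bom = BOMS.get(target.lower())
    if add_bom and tgt_bom:      # BOMs only make sense for Unicode targets
        out = tgt_bom + out
    return out


src = "你好".encode("gbk")
dst = convert(src, "gbk", "utf-8", add_bom=True)
print(f"{len(src)} bytes -> {len(dst)} bytes: {dst.hex(' ')}")
```

The byte counts change because GBK uses two bytes per Chinese character while UTF-8 uses three, plus three bytes for the UTF-8 BOM.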

Use Cases

  • Encoding normalization in legacy migrations.
  • Preparing multilingual data import/export.
  • Diagnosing mojibake and byte-level encoding issues.
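One common mojibake pattern can be reproduced in a few lines of Python: UTF-8 bytes decoded as Windows-1252 (Python codec name cp1252). Reversing the two steps recovers the original text. This sketch covers only that one failure; the repair works only when every garbled character maps back to a byte in the wrong encoding.

```python
original = "café"

# Simulate the mistake: UTF-8 bytes interpreted as Windows-1252.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)  # prints cafÃ©

# Undo it: recover the raw bytes with the wrong encoding, then decode them correctly.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # prints café
```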

Supported Encoding Reference

Unicode

Encoding | Description | Bytes per Character | Spec
UTF-8 | Variable-length Unicode encoding and the most widely used encoding on the Web. Backward compatible with ASCII. | 1-4 bytes | RFC 3629
UTF-16 LE | Little-endian UTF-16, commonly used on Windows. | 2 or 4 bytes | RFC 2781
UTF-16 BE | Big-endian UTF-16, common in some protocols and Java-related workflows. | 2 or 4 bytes | RFC 2781
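The byte ranges in the table can be checked with Python's built-in codecs. The sample characters are arbitrary choices covering ASCII, Latin-1, CJK, and a character outside the Basic Multilingual Plane.

```python
samples = ["A", "é", "中", "😀"]  # U+0041, U+00E9, U+4E2D, U+1F600

for ch in samples:
    sizes = {enc: len(ch.encode(enc)) for enc in ("utf-8", "utf-16-le", "utf-16-be")}
    print(f"U+{ord(ch):04X}", sizes)

# LE and BE store the same code unit with the bytes swapped.
print("A".encode("utf-16-le").hex(" "), "|", "A".encode("utf-16-be").hex(" "))
```

UTF-8 grows from 1 to 4 bytes across these samples, while UTF-16 stays at 2 bytes until the emoji, which needs a 4-byte surrogate pair.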

Chinese Encodings

Encoding | Description | Typical Usage | Spec
GBK | Extension of GB2312 with broader Simplified and Traditional Chinese coverage. | Simplified Chinese Windows, legacy websites | IANA GBK
GB2312 | Early Simplified Chinese national standard. | Legacy systems, email | GB 2312-1980
GB18030 | Modern Chinese national standard with full Unicode coverage. | Modern Chinese systems, government documents | GB 18030-2005
Big5 | Traditional Chinese encoding. | Taiwan and Hong Kong websites | IANA Charset
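The coverage differences can be probed by attempting to encode a character with each codec. This sketch uses Python's codec names (gb2312, gbk, gb18030, big5); Python's mapping tables may differ from the tool's in edge cases, and the encodable_in helper is hypothetical.

```python
CODECS = ["gb2312", "gbk", "gb18030", "big5"]


def encodable_in(ch: str) -> list[str]:
    """Return the Chinese codecs (Python names) that can represent ch."""
    ok = []
    for name in CODECS:
        try:
            ch.encode(name)
            ok.append(name)
        except UnicodeEncodeError:
            pass
    return ok


for ch in ["中", "😀"]:
    print(ch, encodable_in(ch))

# GBK extends GB2312, so characters from GB2312 keep the same bytes.
print("中".encode("gb2312") == "中".encode("gbk"))
```

Only GB18030 covers the emoji, consistent with its full Unicode coverage.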

Japanese Encodings

Encoding | Description | Typical Usage | Spec
Shift_JIS | Japanese encoding widely used on Microsoft platforms. | Windows, legacy websites, games | IANA Charset
EUC-JP | Extended Unix Code for Japanese. | Unix/Linux systems, legacy websites | IANA Charset
ISO-2022-JP | 7-bit Japanese encoding that switches character sets with escape sequences. | Japanese email, legacy systems | RFC 1468
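The three Japanese encodings represent the same character in visibly different ways. This Python sketch encodes one hiragana character with Python's shift_jis, euc_jp, and iso2022_jp codecs; the choice of character is arbitrary.

```python
text = "あ"  # HIRAGANA LETTER A, U+3042

sjis = text.encode("shift_jis")   # 82 a0
eucjp = text.encode("euc_jp")     # a4 a2
jis = text.encode("iso2022_jp")   # ESC $ B, two 7-bit bytes, then ESC ( B back to ASCII

for name, data in (("Shift_JIS", sjis), ("EUC-JP", eucjp), ("ISO-2022-JP", jis)):
    print(f"{name:>11}: {data.hex(' ')}")

# Every ISO-2022-JP byte stays within 7 bits, which is why it suited email.
print(all(b < 0x80 for b in jis))
```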

Korean Encodings

Encoding | Description | Typical Usage | Spec
EUC-KR | Extended Unix Code for Korean, based on KS X 1001. | Korean websites, legacy systems | RFC 1557
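A quick size comparison shows one practical effect of converting Korean text: EUC-KR uses two high bytes per Hangul syllable, while UTF-8 uses three. The sample word is an arbitrary choice, and euc_kr is Python's codec name.

```python
word = "한국"  # "Korea"

euc = word.encode("euc_kr")
utf8 = word.encode("utf-8")

print(len(euc), euc.hex(" "))    # two bytes per syllable
print(len(utf8), utf8.hex(" "))  # three bytes per syllable
print(euc.decode("euc_kr") == word)
```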

Western Encodings

Encoding | Description | Languages | Spec
ISO-8859-1 | Latin-1, covering common Western European characters. | English, French, German, Spanish, Portuguese, Italian | ISO/IEC 8859-1
ISO-8859-15 | Latin-9, adding the euro sign and other Western European characters. | Western European languages that use the euro sign | ISO/IEC 8859-15
Windows-1252 | Microsoft extension of Latin-1. | Western European languages on Windows | Unicode.org
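The euro sign is a compact way to see how these three encodings differ. The sketch uses Python codec names (latin-1, iso8859_15, cp1252); try_encode is a hypothetical helper that returns None when a character cannot be encoded.

```python
def try_encode(ch: str, enc: str):
    """Return the encoded bytes, or None if enc cannot represent ch."""
    try:
        return ch.encode(enc)
    except UnicodeEncodeError:
        return None


for enc in ("latin-1", "iso8859_15", "cp1252"):
    print(f"{enc:>10}: {try_encode('€', enc)}")

# The same byte means different things: 0xA4 is '¤' in Latin-1 but '€' in Latin-9.
print(b"\xa4".decode("latin-1"), b"\xa4".decode("iso8859_15"))
```

Latin-1 has no euro sign at all, Latin-9 places it at 0xA4, and Windows-1252 places it at 0x80.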

Cyrillic Encodings

Encoding | Description | Languages | Spec
Windows-1251 | Microsoft Cyrillic encoding. | Russian, Ukrainian, Bulgarian, Serbian | Unicode.org
KOI8-R | Classic 8-bit encoding designed for Russian. | Russian | RFC 1489
ISO-8859-5 | ISO standard Cyrillic encoding. | Russian, Bulgarian, Macedonian, Serbian | ISO/IEC 8859-5
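The three Cyrillic encodings all use one byte per letter but assign different byte values, so picking the wrong one silently produces the wrong text instead of an error. This Python sketch uses the codec names cp1251, koi8_r, and iso8859_5 with an arbitrary sample word.

```python
word = "Привет"  # "Hello"

encoded = {enc: word.encode(enc) for enc in ("cp1251", "koi8_r", "iso8859_5")}
for enc, data in encoded.items():
    print(f"{enc:>9}: {data.hex(' ')}")

# Reading Windows-1251 bytes as KOI8-R decodes without error but yields the wrong string.
print(encoded["cp1251"].decode("koi8_r"))
```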

Other Encodings

Encoding | Description | Typical Usage | Spec
ASCII | Foundational 7-bit encoding that UTF-8 and many legacy encodings extend. | Basic English text, programming | RFC 20
Macintosh | Mac Roman, the legacy encoding of classic Mac OS. | Old Mac files, legacy Mac applications | Unicode.org
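ASCII compatibility can be checked directly: pure ASCII text produces identical bytes in every ASCII-compatible encoding, so conversion problems only show up once non-ASCII characters appear. The encoding list below is a sample using Python codec names (mac_roman is Python's name for Mac Roman). UTF-16 is deliberately excluded because it is not byte-compatible with ASCII.

```python
ascii_text = "Hello, world!"
encodings = ["ascii", "utf-8", "cp1252", "gbk", "shift_jis", "mac_roman"]

encoded = {enc: ascii_text.encode(enc) for enc in encodings}
print(len(set(encoded.values())) == 1)  # all byte strings are identical

# Outside ASCII the encodings diverge: 'é' in Mac Roman vs Windows-1252.
print("é".encode("mac_roman").hex(), "é".encode("cp1252").hex())
```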

FAQ

Why is output garbled?

The source encoding may be set incorrectly, or the selected input format (for example, Hex or Base64) may not match the actual format of the data.
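One way to test a suspected source encoding is a strict decode, which reports the offset of the first byte that does not fit. This Python sketch is illustrative; the check_source helper is hypothetical, and the sample text is arbitrary.

```python
def check_source(data: bytes, encoding: str):
    """Return None if data decodes cleanly, else (offset, offending bytes)."""
    try:
        data.decode(encoding)  # strict error handling is Python's default
        return None
    except UnicodeDecodeError as err:
        return err.start, data[err.start:err.end]


gbk_bytes = "编码".encode("gbk")
print(check_source(gbk_bytes, "utf-8"))  # wrong guess: reports the first invalid byte
print(check_source(gbk_bytes, "gbk"))    # matching guess: None
```

A clean decode is necessary but not sufficient: single-byte encodings such as Latin-1 accept every byte value, so they never report an error even when the result is garbled.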

When should I keep BOM?

Some editors and platforms rely on a BOM to recognize UTF encodings, while others treat it as unexpected extra bytes at the start of the file. Follow the target platform's requirements.
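To check whether a file already carries a BOM, it is enough to compare its first bytes against the known BOM sequences. This Python sketch covers only the UTF-8 and UTF-16 BOMs from the Unicode table; the sniff_bom helper is hypothetical and is not the tool's detection logic.

```python
import codecs

# BOMs for the Unicode encodings listed in this document.
BOMS = [
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def sniff_bom(data: bytes):
    """Return (encoding, bom_length) if data starts with a known BOM, else (None, 0)."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name, len(bom)
    return None, 0


with_bom = codecs.BOM_UTF8 + "hi".encode("utf-8")
print(sniff_bom(with_bom))             # ('utf-8', 3)
print(with_bom.decode("utf-8-sig"))    # 'utf-8-sig' drops the BOM while decoding
print(repr(with_bom.decode("utf-8")))  # plain 'utf-8' keeps it as U+FEFF
```

A program that decodes with plain UTF-8 sees the BOM as an invisible U+FEFF character at the start of the text, which is one reason some tools reject it.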