Charset Encoding Converter

Convert text from one character encoding to another. Supported encodings include UTF-8, GBK, GB2312, Big5, Shift_JIS, the ISO-8859 family, and Windows-1252, among others. Features include automatic encoding detection, batch file conversion, and BOM handling.

Documentation

About Charset Converter

This tool converts text/files across multiple encodings, with auto-detection, BOM handling, and flexible input/output formats.

Key Features

  • Multi-encoding Conversion: UTF-8/UTF-16, the GB family (GB2312, GBK, GB18030), Big5, Shift_JIS, EUC-KR, ISO-8859 and Windows code pages, etc.
  • Auto Detection: Detect source encoding automatically.
  • BOM Controls: Add or remove BOM when needed.
  • Format Conversion: Text, Base64, Hex, spaced Hex, and C-style Hex formats.
  • Text + File Modes: Works for both direct text and file workflows.
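
The output formats listed above are all views of the same byte sequence. The following is a minimal Python sketch of that idea, not the tool's actual implementation; the function name `render_bytes` and the exact `\x..` layout assumed for "C-style Hex" are illustrative assumptions.

```python
import base64

def render_bytes(data: bytes) -> dict[str, str]:
    """Render one byte sequence in several textual formats (hypothetical helper)."""
    return {
        "base64": base64.b64encode(data).decode("ascii"),
        "hex": data.hex(),                                  # e4b8ad
        "spaced_hex": data.hex(" "),                        # e4 b8 ad (Python 3.8+)
        "c_hex": "".join(f"\\x{b:02x}" for b in data),      # \xe4\xb8\xad (assumed layout)
    }

# The character U+4E2D encoded as UTF-8 is the three bytes E4 B8 AD.
formats = render_bytes("中".encode("utf-8"))
```

Because every format is derived from the bytes rather than from the text, the same helper works for any target encoding.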

Steps

  1. Choose text mode or file mode.
  2. Set source/target encoding (or auto-detect source).
  3. Configure I/O format and BOM options.
  4. Run conversion.
  5. Review output and byte stats, then copy/download if needed.
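
The steps above can be sketched in Python as decode, re-encode, then count. This is an illustration under stated assumptions, not the tool's code: `detect_encoding` is a deliberately naive stand-in for auto-detection (a BOM check followed by trial decoding over a hypothetical candidate list), and `convert` is a hypothetical helper name.

```python
import codecs
from typing import Optional

def detect_encoding(data: bytes, candidates=("utf-8", "gbk", "shift_jis")) -> str:
    """Naive detection: BOM check, then the first candidate that decodes cleanly."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    for name in candidates:
        try:
            data.decode(name)
            return name
        except UnicodeDecodeError:
            continue
    return "latin-1"  # fallback that accepts every byte value

def convert(data: bytes, target: str, source: Optional[str] = None):
    """Decode with the source encoding, re-encode to the target, report stats."""
    source = source or detect_encoding(data)
    text = data.decode(source)
    out = text.encode(target)
    stats = {
        "input_chars": len(text),
        "input_bytes": len(data),
        "output_chars": len(text),
        "output_bytes": len(out),
    }
    return out, stats

gbk_bytes = "编码".encode("gbk")            # 2 characters, 4 bytes in GBK
utf8_bytes, stats = convert(gbk_bytes, "utf-8")
```

Trial decoding can pick the wrong encoding when several candidates accept the same bytes, which is why setting the source encoding manually remains an option.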

Use Cases

  • Encoding normalization in legacy migrations.
  • Preparing multilingual data import/export.
  • Diagnosing mojibake and byte-level encoding issues.

FAQ

Why is output garbled?

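Usually the source encoding was selected or detected incorrectly, so the bytes are decoded with the wrong character table. It can also happen when the chosen input format (Text, Base64, Hex) does not match the format of the data actually provided.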
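
A small Python sketch of how this kind of garbling arises and how it can sometimes be undone. The example text and the Windows-1252 mix-up are chosen for illustration; the repair only works when the wrong decoding lost no bytes.

```python
original = "caf\u00e9"                      # "café" with a precomposed é (U+00E9)

# UTF-8 bytes wrongly decoded as Windows-1252: each multi-byte sequence
# turns into two or more unrelated characters.
garbled = original.encode("utf-8").decode("cp1252")

# Reversing the mistake: re-encode with the wrong table, decode with the right one.
repaired = garbled.encode("cp1252").decode("utf-8")
```

Here `garbled` shows the typical "Ã©" pattern that signals UTF-8 data read as a Western European code page.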

When should I keep BOM?

Some editors and platforms rely on the BOM to recognize UTF-8 or UTF-16 text, while others treat it as stray bytes at the start of the file. Follow the target platform's requirements.
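
As an illustration of adding and removing a UTF-8 BOM, here is a short Python sketch. The helper names `add_utf8_bom` and `strip_utf8_bom` are hypothetical; `codecs.BOM_UTF8` and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

def add_utf8_bom(data: bytes) -> bytes:
    """Prepend the UTF-8 BOM (EF BB BF) unless it is already present."""
    return data if data.startswith(codecs.BOM_UTF8) else codecs.BOM_UTF8 + data

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if there is one."""
    return data[len(codecs.BOM_UTF8):] if data.startswith(codecs.BOM_UTF8) else data

with_bom = add_utf8_bom("hi".encode("utf-8"))
without_bom = strip_utf8_bom(with_bom)

# Decoding with plain "utf-8" keeps the BOM as U+FEFF; "utf-8-sig" drops it.
kept = with_bom.decode("utf-8")
dropped = with_bom.decode("utf-8-sig")
```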

Supported Encoding Reference

This tool supports 30+ character encodings, covering major languages and regions worldwide. Below is a reference for the most commonly used ones, grouped by script or region.

Unicode Encodings

| Encoding | Description | Bytes per Character | Specification |
| --- | --- | --- | --- |
| UTF-8 | Variable-length Unicode encoding, the most widely used encoding on the Web. Backward compatible with ASCII. | 1-4 bytes | RFC 3629 |
| UTF-16 LE | UTF-16 Little Endian, commonly used in Windows systems. Each character uses 2 or 4 bytes. | 2 or 4 bytes | RFC 2781 |
| UTF-16 BE | UTF-16 Big Endian, used in some network protocols and Java. Each character uses 2 or 4 bytes. | 2 or 4 bytes | RFC 2781 |
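
The byte counts in the table can be checked with a few sample characters. This Python sketch uses the standard `utf-8`, `utf-16-le`, and `utf-16-be` codecs; the sample characters are chosen for illustration.

```python
samples = [
    "A",            # U+0041, ASCII
    "\u00e9",       # U+00E9, Latin small letter e with acute
    "\u4e2d",       # U+4E2D, a CJK ideograph
    "\U0001F600",   # U+1F600, outside the Basic Multilingual Plane
]

# (bytes in UTF-8, bytes in UTF-16) for each sample character
widths = {ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))) for ch in samples}

# Byte order is the only difference between the LE and BE variants.
little = "A".encode("utf-16-le")
big = "A".encode("utf-16-be")
```

Characters outside the Basic Multilingual Plane need a surrogate pair in UTF-16, which is where the 4-byte case comes from.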

Chinese Encodings

| Encoding | Description | Usage | Specification |
| --- | --- | --- | --- |
| GBK | Extension of GB2312, supports 21,003 Chinese characters, including traditional characters. Commonly used in Simplified Chinese Windows. | Simplified Chinese Windows, old websites | IANA GBK |
| GB2312 | Original Chinese national standard (1980), supports 6,763 simplified Chinese characters and 682 symbols. | Old systems, email | GB 2312-1980 |
| GB18030 | Chinese national standard, mandatory in China. Maps every Unicode character, including minority-language scripts. | Modern Chinese systems, government documents | GB 18030-2005 |
| Big5 | Traditional Chinese encoding, mainly used in Taiwan and Hong Kong. Contains 13,053 traditional Chinese characters (13,060 with the common ETen extensions). | Taiwan, Hong Kong websites | IANA Charset |
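
The difference in coverage between the three GB encodings can be probed by attempting to encode a character and seeing whether it fails. A minimal Python sketch, assuming the standard library codecs `gb2312`, `gbk`, and `gb18030`; the helper name `can_encode` is hypothetical.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

simplified = "中"            # present in GB2312
traditional = "體"           # traditional form, added in GBK
emoji = "\U0001F600"         # only representable in GB18030

coverage = {
    enc: [can_encode(ch, enc) for ch in (simplified, traditional, emoji)]
    for enc in ("gb2312", "gbk", "gb18030")
}
```

A check like this is useful before converting to a legacy encoding, since unencodable characters would otherwise raise an error or be replaced.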

Japanese Encodings

| Encoding | Description | Usage | Specification |
| --- | --- | --- | --- |
| Shift_JIS | Japanese encoding developed by ASCII Corporation together with Microsoft; supports the JIS X 0201 and JIS X 0208 character sets. | Windows, old websites, games | IANA Charset |
| EUC-JP | Extended Unix Code for Japanese, a variable-length encoding compatible with ASCII. | Unix/Linux systems, old websites | IANA Charset |
| ISO-2022-JP | 7-bit Japanese encoding using escape sequences. Also known as JIS encoding. | Japanese email, old systems | RFC 1468 |
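
To make the escape-sequence behavior of ISO-2022-JP concrete, the following Python sketch encodes the same two characters with all three Japanese encodings. It relies on the standard library codecs `shift_jis`, `euc_jp`, and `iso2022_jp`; the helper name `encode_japanese` is hypothetical.

```python
def encode_japanese(text: str) -> dict[str, bytes]:
    """Encode text with the three Japanese encodings from the table."""
    return {name: text.encode(name) for name in ("shift_jis", "euc_jp", "iso2022_jp")}

encoded = encode_japanese("日本")
jis = encoded["iso2022_jp"]

# ISO-2022-JP switches to JIS X 0208 with ESC $ B and back to ASCII with ESC ( B,
# so every byte stays below 0x80.
is_seven_bit = all(b < 0x80 for b in jis)
```

Shift_JIS and EUC-JP need two 8-bit bytes per kanji here, while ISO-2022-JP spends extra bytes on the escape sequences in exchange for staying 7-bit clean.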

Korean Encodings

| Encoding | Description | Usage | Specification |
| --- | --- | --- | --- |
| EUC-KR | Extended Unix Code for Korean, based on the KS X 1001 standard. Supports 8,224 characters (2,350 Hangul syllables, 4,888 Hanja, and 986 symbols). | Korean websites, old systems | RFC 1557 |

Western European Encodings

| Encoding | Description | Languages | Specification |
| --- | --- | --- | --- |
| ISO-8859-1 | Also known as Latin-1, the first part of the ISO-8859 series. Defines 191 graphic characters covering Western European languages. | English, French, German, Spanish, Portuguese, Italian | ISO/IEC 8859-1 |
| ISO-8859-15 | Latin-9, a revision of Latin-1 that replaces eight rarely used symbols with the Euro sign (€) and additional French/Finnish letters. | Western European languages with Euro symbol | ISO/IEC 8859-15 |
| Windows-1252 | Microsoft's extension of Latin-1, adds typographic characters such as curly quotes and dashes. | Western European languages on Windows | Unicode.org |
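
The Euro sign is a convenient way to tell these three encodings apart. A short Python sketch using the standard library codecs `latin-1`, `iso8859_15`, and `cp1252`:

```python
euro = "\u20ac"  # the Euro sign

in_cp1252 = euro.encode("cp1252")        # one byte in the 0x80-0x9F range
in_latin9 = euro.encode("iso8859_15")    # reuses the Latin-1 currency-sign slot

try:
    euro.encode("latin-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False              # Latin-1 has no Euro sign

# The same byte means different things depending on the table used to decode it.
as_latin1 = in_latin9.decode("latin-1")
```

This is why a file saved as Latin-9 or Windows-1252 but read as Latin-1 shows a generic currency sign or a control character where the Euro sign should be.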

Cyrillic Encodings

| Encoding | Description | Languages | Specification |
| --- | --- | --- | --- |
| Windows-1251 | Microsoft's Windows Cyrillic encoding, supports Russian and other Cyrillic languages. | Russian, Ukrainian, Bulgarian, Serbian | Unicode.org |
| KOI8-R | 8-bit Cyrillic encoding designed for Russian. If the high bit of each byte is stripped, the text remains readable as a rough Latin transliteration. | Russian | RFC 1489 |
| ISO-8859-5 | ISO standard Cyrillic encoding, part of the ISO-8859 series. Supports basic Cyrillic characters. | Russian, Bulgarian, Macedonian, Serbian | ISO/IEC 8859-5 |
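
The high-bit property of KOI8-R can be demonstrated by masking each byte with 0x7F. A minimal Python sketch using the standard `koi8_r` codec; the helper name `strip_high_bit` is hypothetical.

```python
def strip_high_bit(data: bytes) -> str:
    """Clear bit 7 of every byte and read the result as ASCII."""
    return bytes(b & 0x7F for b in data).decode("ascii")

koi8 = "Привет".encode("koi8_r")
stripped = strip_high_bit(koi8)

# Letter case comes out inverted, so swapping it gives a more natural reading.
readable = stripped.swapcase()
```

The Cyrillic letters in KOI8-R are arranged to line up with similar-sounding Latin letters 128 positions lower, which is what makes the stripped text legible.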

Other Encodings

| Encoding | Description | Usage | Specification |
| --- | --- | --- | --- |
| ASCII | American Standard Code for Information Interchange, the foundation of most modern encodings. 7-bit encoding containing 128 characters. | Basic English text, programming | RFC 20 |
| Macintosh | Original character encoding designed by Apple for the classic Mac OS, also known as Mac Roman. | Old Mac files, old Mac applications | Unicode.org |