Charset Encoding Converter
Convert text between character encodings, including UTF-8, GBK, GB2312, Big5, Shift_JIS, the ISO-8859 family and Windows-1252. Features include automatic encoding detection, batch file conversion and BOM handling.
Tool Introduction
What is Character Encoding?
A character encoding maps characters to numbers (code points) and then maps those code points to bytes. Different encodings use different mapping rules, so decoding bytes with the wrong encoding produces garbled text. Choosing the correct encoding is essential for displaying and processing multilingual text correctly.
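As a small illustration of the character, code point and byte mapping described above, here is a Python sketch (Python is an assumption made for illustration; the tool itself runs in the browser). It encodes one character in three encodings and then shows how decoding with the wrong encoding garbles text.

```python
# One character has one Unicode code point but different bytes per encoding.
char = "中"
code_point = ord(char)              # 0x4E2D, written U+4E2D
utf8_bytes = char.encode("utf-8")   # e4 b8 ad
gbk_bytes = char.encode("gbk")      # d6 d0
big5_bytes = char.encode("big5")    # a4 a4

# Print only ASCII hex so the output is safe on any terminal.
print(f"U+{code_point:04X}", utf8_bytes.hex(" "), gbk_bytes.hex(" "), big5_bytes.hex(" "))

# Decoding UTF-8 bytes as Latin-1 succeeds without error but yields garbled
# text: the two bytes of "é" are shown as two separate characters.
original = "café"
garbled = original.encode("utf-8").decode("latin-1")   # "cafÃ©"
```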
Important Notice
This tool converts text from one encoding to another, for example from UTF-8 to GBK.
This tool cannot repair text that is already displayed as garbled. Garbled text means the bytes were decoded with the wrong encoding; to recover the content, re-read the original file with the correct encoding.
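The distinction between converting and repairing can be sketched in Python. The `convert` helper below is hypothetical and is not the tool's implementation; it only shows that conversion is a decode step followed by an encode step, and that a lossy wrong-encoding read cannot be undone.

```python
def convert(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes with the source encoding, then encode to the target."""
    return data.decode(source).encode(target)

utf8_data = "中文".encode("utf-8")
gbk_data = convert(utf8_data, "utf-8", "gbk")   # d6 d0 ce c4

# Reading GBK bytes as UTF-8 with replacement turns invalid bytes into
# U+FFFD, so the original bytes can no longer be rebuilt from the result.
garbled = gbk_data.decode("utf-8", errors="replace")

# Going back to the original bytes and using the right encoding works.
recovered = gbk_data.decode("gbk")
```

Re-encoding `garbled` does not give back `gbk_data`, which is why the original file is needed.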
Feature Overview
The converter handles both pasted text and uploaded files, with automatic encoding detection, batch file conversion and BOM handling. It is suited to database migration, file encoding conversion, web development and cross-platform file sharing.
Typical Scenarios
- Database Migration: When migrating data between different database systems or servers, use this tool to ensure character encoding consistency and prevent data corruption.
- File Encoding Conversion: Convert text files from one encoding to another, for example, convert GBK encoded files to UTF-8, or UTF-8 to Big5.
- Web Development: Convert old web pages to UTF-8 encoding to ensure correct display in modern browsers and different platforms.
- Cross-Platform File Sharing: Convert files exchanged between Windows (which often uses GBK on Simplified Chinese systems), macOS and Linux so that text displays correctly on every platform.
Usage Tips & Best Practices
- Auto Detect Encoding: Use the "Auto Detect" function when you are unsure of the source encoding. Detection accuracy is high for most languages.
- BOM Processing: Add a BOM (Byte Order Mark) when creating UTF-8 or UTF-16 files for Windows applications that require one.
- Batch File Conversion: Use the "File Conversion" tab to process multiple files at once.
- Data Security: All processing happens locally in the browser and no data is uploaded to a server.
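For the BOM tip, a minimal Python sketch of adding and removing a byte order mark follows. The helper names `add_utf8_bom` and `strip_utf8_bom` are made up for illustration; the `codecs` constants and the `utf-8-sig` codec are part of the Python standard library.

```python
import codecs

def add_utf8_bom(data: bytes) -> bytes:
    """Prefix UTF-8 data with a BOM unless one is already present."""
    if data.startswith(codecs.BOM_UTF8):
        return data
    return codecs.BOM_UTF8 + data

def strip_utf8_bom(data: bytes) -> bytes:
    """Remove a leading UTF-8 BOM if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

with_bom = add_utf8_bom("hello".encode("utf-8"))   # ef bb bf + "hello"

# The "utf-8-sig" codec writes the BOM on encode and drops it on decode.
same_bytes = "hello".encode("utf-8-sig")

# For UTF-16 the BOM also records byte order: ff fe means little endian.
utf16_with_bom = codecs.BOM_UTF16_LE + "hi".encode("utf-16-le")
```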
How to Use This Tool
Text Conversion Mode
- Click the "Text Conversion" tab to enter text conversion mode
- Select source encoding from the dropdown menu, or use "Auto Detect" to automatically identify encoding
- Select target encoding (default is UTF-8, the most universal encoding format)
- Select input/output format: plain text, Base64, Hex or C/C++ array format
- Enter or paste text, then click the "Convert" button. Use "Copy" to copy the result or "Download" to save it as a file
File Conversion Mode
- Click the "File Conversion" tab to enter file mode
- Drag and drop files to the upload area, or click to select files (supports multiple files)
- The system automatically detects the encoding of each file and shows it in the "Source Encoding" column, which you can change manually if needed
- Select target encoding for all files
- Click "Convert All" to convert, then click "Download All" to save converted files
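Outside the browser, the same batch workflow can be approximated with a short script. This is a sketch under stated assumptions: `convert_files` and its parameters are hypothetical names, and the per-file source encodings are supplied by the caller rather than detected.

```python
from pathlib import Path

def convert_files(paths, source_encodings, target, out_dir):
    """Convert each file to `target` and write the results into `out_dir`.

    `source_encodings` maps a file name to the encoding it is stored in,
    playing the role of the "Source Encoding" column.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for path in map(Path, paths):
        # Work on raw bytes so that line endings are left untouched.
        text = path.read_bytes().decode(source_encodings[path.name])
        out_path = out_dir / path.name
        out_path.write_bytes(text.encode(target))
        written.append(out_path)
    return written
```

Decoding is strict by default, so a wrongly declared source encoding usually raises `UnicodeDecodeError` instead of silently writing garbled output.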
Supported Input/Output Formats
- Plain Text - Regular text content, enter or paste directly
- Base64 - Base64 encoded string, commonly used for email attachments and data URLs
- Hex - Continuous hexadecimal bytes, e.g. 48656C6C6F
- Hex with Spaces - Space-separated hexadecimal bytes, e.g. 48 65 6C 6C 6F
- C/C++ Array - C/C++ style byte array format, e.g. 0x48,0x65,0x6C,0x6C,0x6F
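The formats above are different textual views of the same bytes. A minimal Python sketch, using the bytes of "Hello" from the examples, shows how each view can be produced and parsed back; the variable names are illustrative only.

```python
import base64

data = "Hello".encode("utf-8")

as_base64 = base64.b64encode(data).decode("ascii")     # SGVsbG8=
as_hex = data.hex().upper()                            # 48656C6C6F
as_hex_spaced = data.hex(" ").upper()                  # needs Python 3.8+
as_c_array = ",".join(f"0x{b:02X}" for b in data)      # 0x48,0x65,...

# Parsing each view recovers the original bytes.
from_base64 = base64.b64decode(as_base64)
from_hex = bytes.fromhex(as_hex_spaced)                # spaces are ignored
from_c_array = bytes(int(tok, 16) for tok in as_c_array.split(","))
```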
Usage Tips & Best Practices
- Enable "Show Hex" to view actual byte values, helpful for debugging encoding issues
- Characters that do not exist in the target encoding are replaced with "?" or a similar placeholder during conversion
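The placeholder behavior can be reproduced with Python's built-in codecs, shown here as an illustration rather than as the tool's own logic. The `unmappable` helper is a hypothetical name for a pre-check that lists the characters a target encoding cannot represent.

```python
def unmappable(text: str, encoding: str) -> list:
    """Return the characters of `text` that `encoding` cannot represent."""
    missing = []
    for ch in text:
        try:
            ch.encode(encoding)
        except UnicodeEncodeError:
            missing.append(ch)
    return missing

text = "naïve 中文"

# Strict encoding fails as soon as a character has no mapping ...
try:
    text.encode("latin-1")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# ... while errors="replace" substitutes "?" for each such character.
replaced = text.encode("latin-1", errors="replace")   # b"na\xefve ??"
missing = unmappable(text, "latin-1")                 # ["中", "文"]
```

Checking for unmappable characters before converting makes the data loss visible instead of discovering stray "?" characters later.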
Frequently Asked Questions
How to choose the appropriate encoding?
Choose according to the language of the text and how it will be used: GBK or UTF-8 for Simplified Chinese, Big5 for Traditional Chinese, Shift_JIS or UTF-8 for Japanese, EUC-KR or UTF-8 for Korean, and UTF-8 or ISO-8859-1 for English and Western European languages.
Is auto detect encoding accurate?
Auto detection is highly accurate for most common languages but can be wrong for text with mixed encodings or unusual characters. If you are unsure, select the encoding manually.
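To show why detection can be wrong, here is a deliberately simple Python sketch of one common heuristic: check for a BOM, then try strict decoding with a list of candidates. It is an assumption for illustration, not the detector this tool uses; `guess_encoding` and its candidate list are hypothetical.

```python
import codecs

BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def guess_encoding(data: bytes, candidates=("utf-8", "gbk", "latin-1")):
    """Return the first plausible encoding name, or None if none fits."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    for name in candidates:
        try:
            data.decode(name)      # strict decode: invalid bytes raise
        except UnicodeDecodeError:
            continue
        return name
    return None

gbk_data = "中文".encode("gbk")
right = guess_encoding(gbk_data)                        # "gbk"
wrong = guess_encoding(gbk_data, ("latin-1", "gbk"))    # "latin-1"
```

Because Latin-1 accepts every byte sequence, candidate order decides the answer here. Real detectors weigh byte statistics as well, but they face the same ambiguity on short or mixed input.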
What to do if garbled text appears after conversion?
If the output is garbled, the source encoding was probably selected incorrectly. Try the "Auto Detect" function, or try different source encodings manually.
Is my data secure?
All processing happens locally in the browser and no data is uploaded to a server.
Supported Encoding Reference
This tool supports 30+ character encodings, covering major languages and regions worldwide. The tables below describe the most commonly used ones.
Unicode Encodings
| Encoding | Description | Byte Range | Specification |
|---|---|---|---|
| UTF-8 | Variable-length Unicode encoding, the most widely used encoding on the Web. Backward compatible with ASCII. | 1-4 bytes | RFC 3629 |
| UTF-16 LE | UTF-16 Little Endian, commonly used in Windows systems. Each character uses 2 or 4 bytes. | 2/4 bytes | RFC 2781 |
| UTF-16 BE | UTF-16 Big Endian, used in some network protocols and Java. Each character uses 2 or 4 bytes. | 2/4 bytes | RFC 2781 |
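The byte ranges in the table can be checked with a few sample characters. This Python sketch is illustrative only and measures the encoded length of characters from different parts of Unicode.

```python
samples = ["A", "é", "中", "😀"]

# (UTF-8 length, UTF-16 length) in bytes for each character.
sizes = {
    ch: (len(ch.encode("utf-8")), len(ch.encode("utf-16-le")))
    for ch in samples
}
# A: (1, 2)   é: (2, 2)   中: (3, 2)   😀: (4, 4)

# Little endian and big endian differ only in the order of the two bytes
# inside each 16-bit unit.
le = "A".encode("utf-16-le")   # 41 00
be = "A".encode("utf-16-be")   # 00 41
```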
Chinese Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| GBK | Extension of GB2312, supports 21,003 Chinese characters, including traditional characters. Commonly used in Simplified Chinese Windows. | Simplified Chinese Windows, old websites | IANA GBK |
| GB2312 | Original Chinese National Standard (1980), supports 6,763 simplified Chinese characters and 682 symbols. | Old systems, email | GB 2312-1980 |
| GB18030 | Chinese national standard, mandatory in China. Covers all Unicode code points, including scripts of minority languages. | Modern Chinese systems, government documents | GB 18030-2005 (superseded by GB 18030-2022) |
| Big5 | Traditional Chinese encoding, mainly used in Taiwan and Hong Kong. Contains 13,053 traditional Chinese characters (13,060 with the widely used ETEN extension). | Taiwan, Hong Kong websites | IANA Charset |
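The growing coverage from GB2312 to GBK to GB18030 can be probed with Python's built-in codecs. This is a sketch for illustration; `can_encode` is a hypothetical helper and the three sample characters are arbitrary choices.

```python
def can_encode(text: str, encoding: str) -> bool:
    """Return True if every character of `text` exists in `encoding`."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

samples = {"simplified": "龙", "traditional": "龍", "emoji": "😀"}
encodings = ("gb2312", "gbk", "gb18030")

coverage = {
    label: [enc for enc in encodings if can_encode(ch, enc)]
    for label, ch in samples.items()
}
# simplified  -> gb2312, gbk, gb18030
# traditional -> gbk, gb18030
# emoji       -> gb18030 only (encoded as four bytes)
```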
Japanese Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| Shift_JIS | Japanese encoding developed by ASCII Corporation together with Microsoft, supports JIS X 0201 and JIS X 0208 character sets. | Windows, old websites, games | IANA Charset |
| EUC-JP | Japanese Extended Unix Encoding, variable-length encoding, compatible with ASCII. | Unix/Linux systems, old websites | IANA Charset |
| ISO-2022-JP | 7-bit Japanese encoding using escape sequences. Also known as JIS encoding. | Japanese email, old systems | RFC 1468 |
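The difference between the escape-sequence approach of ISO-2022-JP and the 8-bit approach of Shift_JIS and EUC-JP shows up directly in the bytes. The following Python sketch, for illustration only, encodes the same three characters with each codec.

```python
text = "日本語"

iso2022 = text.encode("iso2022_jp")
sjis = text.encode("shift_jis")
eucjp = text.encode("euc_jp")

# ISO-2022-JP stays 7-bit: ESC $ B switches to JIS X 0208 and
# ESC ( B switches back to ASCII at the end of the text.
is_seven_bit = all(b < 0x80 for b in iso2022)

# Shift_JIS and EUC-JP use two bytes per kanji, with high-bit bytes
# and no escape sequences.
print(iso2022.hex(" "))
print(sjis.hex(" "))
print(eucjp.hex(" "))
```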
Korean Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| EUC-KR | Korean Extended Unix Encoding, based on the KS X 1001 standard. Supports 8,224 characters, including 2,350 Hangul syllables and 4,888 Hanja (Chinese characters). | Korean websites, old systems | RFC 1557 |
Western European Encodings
| Encoding | Description | Languages | Specification |
|---|---|---|---|
| ISO-8859-1 | Also known as Latin-1, the first part of the ISO-8859 series. Covers 191 Western European language characters. | English, French, German, Spanish, Portuguese, Italian | ISO/IEC 8859-1 |
| ISO-8859-15 | Latin-9, a revision of Latin-1 that replaces eight rarely used symbols with the Euro symbol (€) and additional letters needed for French and Finnish. | Western European languages with Euro symbol | ISO/IEC 8859-15 |
| Windows-1252 | Microsoft's extension of Latin-1, adds typographic characters such as curly quotes and dashes. | Western European languages on Windows | Unicode.org |
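The three Western European encodings agree on most bytes and differ on a handful, which is a frequent source of stray symbols. A short Python sketch, included as an illustration, compares how each one treats the Euro symbol and a curly quote.

```python
euro = "€"

euro_cp1252 = euro.encode("cp1252")        # 0x80
euro_latin9 = euro.encode("iso-8859-15")   # 0xA4

# ISO-8859-1 has no Euro symbol at all.
try:
    euro.encode("iso-8859-1")
    latin1_has_euro = True
except UnicodeEncodeError:
    latin1_has_euro = False

# Byte 0x93 is a left curly quote in Windows-1252 but an invisible
# control character in ISO-8859-1.
quote_cp1252 = b"\x93".decode("cp1252")        # U+201C
quote_latin1 = b"\x93".decode("iso-8859-1")    # U+0093

# Byte 0xA4 is the Euro symbol in Latin-9 but the generic currency
# sign in Latin-1.
a4_latin1 = b"\xa4".decode("iso-8859-1")       # U+00A4
```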
Cyrillic Encodings
| Encoding | Description | Languages | Specification |
|---|---|---|---|
| Windows-1251 | Microsoft's Windows Cyrillic encoding, supports Russian and other Cyrillic languages. | Russian, Ukrainian, Bulgarian, Serbian | Unicode.org |
| KOI8-R | 8-bit Cyrillic encoding designed for Russian. Letters are arranged so that text remains roughly readable as a Latin transliteration even with the high bit removed. | Russian | RFC 1489 |
| ISO-8859-5 | ISO standard Cyrillic encoding, part of the ISO-8859 series. Supports basic Cyrillic characters. | Russian, Bulgarian, Macedonian, Serbian | ISO/IEC 8859-5 |
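The KOI8-R property mentioned in the table can be demonstrated by clearing the top bit of every byte. This Python sketch is illustrative; it also shows that the three Cyrillic encodings use different bytes for the same text.

```python
text = "Привет"

koi8 = text.encode("koi8-r")
cp1251 = text.encode("cp1251")
iso5 = text.encode("iso-8859-5")

# Clearing the high bit of each KOI8-R byte leaves ASCII letters that
# transliterate the Russian text, with upper and lower case swapped.
stripped = bytes(b & 0x7F for b in koi8).decode("ascii")   # "pRIWET"

# All three encodings use one byte per letter, but the byte values differ,
# so the source encoding must be known to decode correctly.
all_different = len({koi8, cp1251, iso5}) == 3
```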
Other Encodings
| Encoding | Description | Usage | Specification |
|---|---|---|---|
| ASCII | American Standard Code for Information Interchange, the foundation of most modern encodings. 7-bit encoding containing 128 characters. | Basic English text, programming | RFC 20 |
| Macintosh | Original character encoding designed by Apple for Mac OS Classic, also known as Mac Roman. | Old Mac files, old Mac applications | Unicode.org |