Character Encodings

An encoding specifies a way of storing characters on disk. jEdit can use any encoding supported by the Java platform. The current buffer's encoding is shown in the status bar.

The default encoding, used to load and save files for which no other encoding is specified, can be set in the Loading and Saving pane of the Utilities>Global Options dialog box.

Unless you change the default encoding, jEdit will use your operating system's native default: MacRoman on Mac OS, Cp1252 on Windows, and 8859_1 on Unix.
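As a rough illustration of where such a native default comes from, the following sketch asks the Java runtime which charset it selected. The class and method names are hypothetical and are not part of jEdit. Note that recent Java releases may report UTF-8 regardless of the operating system, so the output will not necessarily match the names listed above.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of jEdit: reports the JVM's default charset.
final class DefaultEncodingDemo {
    // Returns the canonical name of the charset the JVM uses by default.
    static String platformDefault() {
        return Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println("Default charset: " + platformDefault());
        // The file.encoding system property usually reflects the same choice.
        System.out.println("file.encoding:   " + System.getProperty("file.encoding"));
    }
}
```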

To open a file stored using an encoding other than the default, select the encoding from the Commands>Encoding menu of the file system browser before opening the file.

The encoding to use when saving a specific buffer can be set in the Utilities>Buffer Options dialog box.

If a file is opened without an explicitly specified encoding and it appears in the recent file list, jEdit uses the encoding last used with that file; otherwise, it uses the default encoding.

Unfortunately, there is no way to obtain a list of all supported encodings using the Java APIs, so jEdit only lists a few of the most common encodings; however, any other supported encoding name can be typed in.

Commonly Used Encodings

The most frequently used character encoding is ASCII, or “American Standard Code for Information Interchange”. ASCII encodes the Latin letters used in English, in addition to digits and a range of punctuation characters. The ASCII character set consists of 128 characters (codes 0 to 127), and it is unsuitable for anything but English text (and other file types which only use English characters, like most program source). jEdit will load and save files as ASCII if the ASCII encoding is used.
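To see why ASCII is unsuitable for non-English text, the sketch below encodes two strings as US-ASCII using the standard Java API and decodes them again. The class and method names are hypothetical. Characters outside the ASCII range cannot be represented, and `String.getBytes` substitutes a question mark for each of them.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: shows what is lost when text is saved as ASCII.
final class AsciiDemo {
    // Encodes the text as US-ASCII and decodes the resulting bytes again.
    static String roundTripAscii(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
        return new String(bytes, StandardCharsets.US_ASCII);
    }

    public static void main(String[] args) {
        // Pure English text survives unchanged.
        System.out.println(roundTripAscii("plain text"));
        // \u00ef and \u00e9 (i with diaeresis, e with acute) are not ASCII.
        System.out.println(roundTripAscii("na\u00efve caf\u00e9")); // prints "na?ve caf?"
    }
}
```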

Because ASCII is unsuitable for international use, most operating systems use an 8-bit extension of ASCII, with the first 128 characters remaining the same, and the rest used to encode accents, umlauts, and various less frequently used typographical marks. The three major operating systems each extend ASCII in a different way. Files written by Macintosh programs can be read using the MacRoman encoding; Windows text files are usually stored as Cp1252. In the Unix world, the 8859_1 character encoding has found widespread usage.
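The practical consequence is that the same byte can decode to different characters depending on the encoding chosen. The following sketch, with hypothetical class and method names, decodes one byte from the upper half of the 8-bit range using the three encodings named above. Not every Java runtime ships all three, so the code checks availability first and makes no claim about what a given installation will print.

```java
import java.nio.charset.Charset;

// Hypothetical demo class: decodes one byte under several 8-bit encodings.
final class EightBitDemo {
    // Decodes a single byte with the named charset, or returns null when
    // this Java runtime does not provide that charset.
    static String decodeByte(int value, String charsetName) {
        if (!Charset.isSupported(charsetName)) {
            return null;
        }
        return new String(new byte[] { (byte) value }, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        int value = 0x80; // first byte value beyond the ASCII range
        for (String name : new String[] { "8859_1", "Cp1252", "MacRoman" }) {
            String decoded = decodeByte(value, name);
            String shown = (decoded == null)
                    ? "not available in this runtime"
                    : String.format("U+%04X", (int) decoded.charAt(0));
            System.out.println(name + ": " + shown);
        }
    }
}
```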

On Windows, various other encodings, which are known as code pages and are identified by number, are used to store non-English text. The corresponding Java encoding name is Cp followed by the code page number.
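The naming rule can be sketched as follows. The helper names are hypothetical, and whether a particular code page is actually available depends on the Java runtime in use, so the example only reports what it finds.

```java
import java.nio.charset.Charset;

// Hypothetical helper class: maps Windows code page numbers to Java names.
final class CodePageDemo {
    // Builds the Java encoding name: "Cp" followed by the code page number.
    static String javaNameFor(int codePage) {
        return "Cp" + codePage;
    }

    // Reports whether this Java runtime provides the given code page.
    static boolean isAvailable(int codePage) {
        return Charset.isSupported(javaNameFor(codePage));
    }

    public static void main(String[] args) {
        for (int codePage : new int[] { 1250, 1251, 1252 }) {
            System.out.println(javaNameFor(codePage) + " available: " + isAvailable(codePage));
        }
    }
}
```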

Many common cross-platform international character sets are also supported: KOI8_R for Russian text, Big5 and GBK for Chinese, and SJIS for Japanese.

16-bit Unicode files are automatically detected as such when opened, regardless of the encoding specified by the user. The closely related UTF8 encoding, which uses a variable number of bytes per character, is also supported; however, UTF8 files are not auto-detected.
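This manual does not describe how the detection works. One common technique, shown here purely as an assumption and not as jEdit's actual implementation, is to look for a byte order mark (the bytes FE FF or FF FE) at the start of the file. The sketch also counts how many bytes UTF8 needs for different characters, which is what makes it a variable-length encoding. All class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical sketch: byte-order-mark sniffing and UTF-8 byte lengths.
final class UnicodeSniffer {
    // Returns "UTF-16BE" or "UTF-16LE" if the data starts with the matching
    // byte order mark, or null if no 16-bit Unicode mark is present.
    static String detectUtf16(byte[] head) {
        if (head.length >= 2) {
            int first = head[0] & 0xFF;
            int second = head[1] & 0xFF;
            if (first == 0xFE && second == 0xFF) {
                return "UTF-16BE";
            }
            if (first == 0xFF && second == 0xFE) {
                return "UTF-16LE";
            }
        }
        return null;
    }

    // Returns the number of bytes UTF-8 uses to store the given text.
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        byte[] sample = { (byte) 0xFF, (byte) 0xFE, 0x41, 0x00 };
        System.out.println(detectUtf16(sample));    // prints "UTF-16LE"
        System.out.println(utf8Length("A"));        // prints 1
        System.out.println(utf8Length("\u00e9"));   // prints 2 (e with acute)
        System.out.println(utf8Length("\u20ac"));   // prints 3 (euro sign)
    }
}
```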
