One day, I wanted to make a script to convert some text files containing
romaji to EUC-JP, and I wasn't happy with the tools available (even though
everyone and their brother has made such a script). Unfortunately, the only
way I could find a "listing" of what values corresponded to what kana was to
read someone's code, and finding a concise description of EUC-JP is nigh on
impossible. Because of that, I made this table.
The EUC-JP characters below are two bytes long. The columns represent
the first byte, and the rows the second byte. These characters are all
non-kanji. Note that both bytes are between a1 and
fe (inclusive). There are also some undefined characters
within this range (they appear as empty cells in the table below).
For more information on EUC-JP, see below.
Note that column a1 contains punctuation, a4 hiragana,
and a5 katakana. Equivalent kana have the same second byte. Except
for some katakana-only characters, the kana are grouped by the small version,
the normal version, and then the ゛ (dakuten) and ゜ (handakuten) versions (where they exist).
Also note that a1 a1 is defined; it's a (full-width) space.
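
As a small illustration of the layout described above (hiragana in column a4, katakana in column a5, matching second bytes), here is a minimal Python sketch. The function name `hiragana_to_katakana` is made up for this example, and it assumes the input consists solely of two-byte characters. Python's built-in `euc_jp` codec is used only to display the results.

```python
def hiragana_to_katakana(euc: bytes) -> bytes:
    """Swap first byte a4 -> a5 in a run of two-byte EUC-JP characters.

    Assumption: the input holds only two-byte characters (no ASCII),
    so every even offset is a first byte.
    """
    out = bytearray(euc)
    for i in range(0, len(out) - 1, 2):
        if out[i] == 0xA4:      # hiragana column
            out[i] = 0xA5       # same second byte, katakana column
    return bytes(out)


ka = bytes([0xA4, 0xAB])        # hiragana "ka"
ga = bytes([0xA4, 0xAC])        # its dakuten version directly follows it
print(ka.decode("euc_jp"), ga.decode("euc_jp"))
print(hiragana_to_katakana(ka + ga).decode("euc_jp"))
```

Because only the first byte changes, the conversion needs no lookup table, at least for the kana that exist in both columns.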

| | a1 | a2 | a3 | a4 | a5 | a6 | a7 | a8 |
|---|---|---|---|---|---|---|---|---|
| a1 | 　 | ◆ | | ぁ | ァ | Α | А | ─ |
| a2 | 、 | □ | | あ | ア | Β | Б | │ |
| a3 | 。 | ■ | | ぃ | ィ | Γ | В | ┌ |
| a4 | ， | △ | | い | イ | Δ | Г | ┐ |
| a5 | ． | ▲ | | ぅ | ゥ | Ε | Д | ┘ |
| a6 | ・ | ▽ | | う | ウ | Ζ | Е | └ |
| a7 | ： | ▼ | | ぇ | ェ | Η | Ё | ├ |
| a8 | ； | ※ | | え | エ | Θ | Ж | ┬ |
| a9 | ？ | 〒 | | ぉ | ォ | Ι | З | ┤ |
| aa | ！ | → | | お | オ | Κ | И | ┴ |
| ab | ゛ | ← | | か | カ | Λ | Й | ┼ |
| ac | ゜ | ↑ | | が | ガ | Μ | К | ━ |
| ad | ´ | ↓ | | き | キ | Ν | Л | ┃ |
| ae | ｀ | 〓 | | ぎ | ギ | Ξ | М | ┏ |
| af | ¨ | | | く | ク | Ο | Н | ┓ |
| b0 | ＾ | | ０ | ぐ | グ | Π | О | ┛ |
| b1 | ￣ | | １ | け | ケ | Ρ | П | ┗ |
| b2 | ＿ | | ２ | げ | ゲ | Σ | Р | ┣ |
| b3 | ヽ | | ３ | こ | コ | Τ | С | ┳ |
| b4 | ヾ | | ４ | ご | ゴ | Υ | Т | ┫ |
| b5 | ゝ | | ５ | さ | サ | Φ | У | ┻ |
| b6 | ゞ | | ６ | ざ | ザ | Χ | Ф | ╋ |
| b7 | 〃 | | ７ | し | シ | Ψ | Х | ┠ |
| b8 | 仝 | | ８ | じ | ジ | Ω | Ц | ┯ |
| b9 | 々 | | ９ | す | ス | | Ч | ┨ |
| ba | 〆 | ∈ | | ず | ズ | | Ш | ┷ |
| bb | 〇 | ∋ | | せ | セ | | Щ | ┿ |
| bc | ー | ⊆ | | ぜ | ゼ | | Ъ | ┝ |
| bd | ― | ⊇ | | そ | ソ | | Ы | ┰ |
| be | ‐ | ⊂ | | ぞ | ゾ | | Ь | ┥ |
| bf | ／ | ⊃ | | た | タ | | Э | ┸ |
| c0 | ＼ | ∪ | | だ | ダ | | Ю | ╂ |
| c1 | ～ | ∩ | Ａ | ち | チ | α | Я | |
| c2 | ‖ | | Ｂ | ぢ | ヂ | β | | |
| c3 | ｜ | | Ｃ | っ | ッ | γ | | |
| c4 | … | | Ｄ | つ | ツ | δ | | |
| c5 | ‥ | | Ｅ | づ | ヅ | ε | | |
| c6 | ‘ | | Ｆ | て | テ | ζ | | |
| c7 | ’ | | Ｇ | で | デ | η | | |
| c8 | “ | | Ｈ | と | ト | θ | | |
| c9 | ” | | Ｉ | ど | ド | ι | | |
| ca | （ | ∧ | Ｊ | な | ナ | κ | | |
| cb | ） | ∨ | Ｋ | に | ニ | λ | | |
| cc | 〔 | ¬ | Ｌ | ぬ | ヌ | μ | | |
| cd | 〕 | ⇒ | Ｍ | ね | ネ | ν | | |
| ce | ［ | ⇔ | Ｎ | の | ノ | ξ | | |
| cf | ］ | ∀ | Ｏ | は | ハ | ο | | |
| d0 | ｛ | ∃ | Ｐ | ば | バ | π | | |
| d1 | ｝ | | Ｑ | ぱ | パ | ρ | а | |
| d2 | 〈 | | Ｒ | ひ | ヒ | σ | б | |
| d3 | 〉 | | Ｓ | び | ビ | τ | в | |
| d4 | 《 | | Ｔ | ぴ | ピ | υ | г | |
| d5 | 》 | | Ｕ | ふ | フ | φ | д | |
| d6 | 「 | | Ｖ | ぶ | ブ | χ | е | |
| d7 | 」 | | Ｗ | ぷ | プ | ψ | ё | |
| d8 | 『 | | Ｘ | へ | ヘ | ω | ж | |
| d9 | 』 | | Ｙ | べ | ベ | | з | |
| da | 【 | | Ｚ | ぺ | ペ | | и | |
| db | 】 | | | ほ | ホ | | й | |
| dc | ＋ | ∠ | | ぼ | ボ | | к | |
| dd | － | ⊥ | | ぽ | ポ | | л | |
| de | ± | ⌒ | | ま | マ | | м | |
| df | × | ∂ | | み | ミ | | н | |
| e0 | ÷ | ∇ | | む | ム | | о | |
| e1 | ＝ | ≡ | ａ | め | メ | | п | |
| e2 | ≠ | ≒ | ｂ | も | モ | | р | |
| e3 | ＜ | ≪ | ｃ | ゃ | ャ | | с | |
| e4 | ＞ | ≫ | ｄ | や | ヤ | | т | |
| e5 | ≦ | √ | ｅ | ゅ | ュ | | у | |
| e6 | ≧ | ∽ | ｆ | ゆ | ユ | | ф | |
| e7 | ∞ | ∝ | ｇ | ょ | ョ | | х | |
| e8 | ∴ | ∵ | ｈ | よ | ヨ | | ц | |
| e9 | ♂ | ∫ | ｉ | ら | ラ | | ч | |
| ea | ♀ | ∬ | ｊ | り | リ | | ш | |
| eb | ° | | ｋ | る | ル | | щ | |
| ec | ′ | | ｌ | れ | レ | | ъ | |
| ed | ″ | | ｍ | ろ | ロ | | ы | |
| ee | ℃ | | ｎ | ゎ | ヮ | | ь | |
| ef | ￥ | | ｏ | わ | ワ | | э | |
| f0 | ＄ | | ｐ | ゐ | ヰ | | ю | |
| f1 | ￠ | | ｑ | ゑ | ヱ | | я | |
| f2 | ￡ | Å | ｒ | を | ヲ | | | |
| f3 | ％ | ‰ | ｓ | ん | ン | | | |
| f4 | ＃ | ♯ | ｔ | | ヴ | | | |
| f5 | ＆ | ♭ | ｕ | | ヵ | | | |
| f6 | ＊ | ♪ | ｖ | | ヶ | | | |
| f7 | ＠ | † | ｗ | | | | | |
| f8 | § | ‡ | ｘ | | | | | |
| f9 | ☆ | ¶ | ｙ | | | | | |
| fa | ★ | | ｚ | | | | | |
| fb | ○ | | | | | | | |
| fc | ● | | | | | | | |
| fd | ◎ | | | | | | | |
| fe | ◇ | ◯ | | | | | | |

The Japanese (JIS X 0208) characters of EUC-JP are two bytes long. Both bytes must be
in the range a1 through fe (that is, greater than a0 and less than ff). There are combinations
within these ranges which are undefined (the empty cells
in the table above). Note that both bytes have the most
significant bit (MSB) set; if a byte does not have the MSB set, it
is interpreted as a normal ASCII character.
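
The MSB rule above is enough to split a simple byte stream into characters. The following is a rough Python sketch, not a complete decoder: `split_characters` is a name invented for this example, and it deliberately handles only ASCII and the two-byte a1 through fe range, rejecting everything else.

```python
def split_characters(data: bytes) -> list[bytes]:
    """Split EUC-JP bytes into one-byte ASCII and two-byte characters.

    Simplification: the other code sets (introduced by special prefix
    bytes) are not handled and raise ValueError.
    """
    chars = []
    i = 0
    while i < len(data):
        b = data[i]
        if b & 0x80 == 0:
            # MSB clear: a normal ASCII character.
            chars.append(data[i:i + 1])
            i += 1
        elif (0xA1 <= b <= 0xFE and i + 1 < len(data)
              and 0xA1 <= data[i + 1] <= 0xFE):
            # MSB set on both bytes: one two-byte character.
            chars.append(data[i:i + 2])
            i += 2
        else:
            raise ValueError(f"unexpected byte {b:#04x} at offset {i}")
    return chars


print(split_characters(b"a" + bytes([0xA4, 0xA2]) + b"b"))
```

Note that this only checks the byte ranges; it does not know which combinations inside those ranges are undefined.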
After the range shown above, there is a long section of undefined
characters, followed by kanji starting at b0 a1.
There is a gap from cf d4 to cf fe.
The last kanji is at f4 a6. f5 a1
through fe fe are intended for "user" defined characters.
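
The regions just described can be written down as a small classifier. This is only a sketch of the ranges stated in the text; the function name `classify` and the region labels are my own, and treating f4 a7 through f4 fe as undefined is an assumption drawn from "the last kanji is at f4 a6".

```python
def classify(first: int, second: int) -> str:
    """Name the region of the two-byte range that (first, second) falls in."""
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        return "not a two-byte character"
    code = (first, second)
    if first <= 0xA8:
        return "non-kanji"        # the table above; a cell may still be undefined
    if first <= 0xAF:
        return "undefined"        # the long undefined section before the kanji
    if (0xCF, 0xD4) <= code <= (0xCF, 0xFE):
        return "undefined"        # the gap inside the kanji
    if code <= (0xF4, 0xA6):
        return "kanji"
    if first == 0xF4:
        return "undefined"        # assumption: nothing after the last kanji
    return "user-defined"         # f5 a1 through fe fe


for pair in [(0xA4, 0xA2), (0xB0, 0xA1), (0xCF, 0xD4), (0xF5, 0xA1)]:
    print(f"{pair[0]:02x} {pair[1]:02x}: {classify(*pair)}")
```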
I'm glossing over a lot of other details (e.g., "shifting" into extra character sets).
But good luck finding coherent documentation. Here are a few places to start:
ibiblio ISO 2022,
ibiblio EUC,
ibiblio JIS,
HP.
The 94x94 code they speak of is the same as the
a1 through fe range; row 01
corresponds to column a1 above, and so on. A byte in this range is sometimes
referred to as "GR", while bytes from 21 to 7e
are called "GL". (These can mean different things in different character sets,
but chances are, you're just trying to understand EUC-JP -- not every other ISO 2022 character set
in existence).
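
To make the 94x94 correspondence concrete, here is a short Python sketch that converts between a (row, cell) pair and the two EUC-JP bytes, and between the GR and GL forms of a byte pair. The helper names are invented for this example; the arithmetic is just "add a0" and "clear the MSB".

```python
def kuten_to_euc(row: int, cell: int) -> bytes:
    """Map a 94x94 position (both 1..94) to the bytes a1..fe."""
    if not (1 <= row <= 94 and 1 <= cell <= 94):
        raise ValueError("row and cell must be in 1..94")
    return bytes([row + 0xA0, cell + 0xA0])


def euc_to_kuten(euc: bytes) -> tuple[int, int]:
    """Inverse of kuten_to_euc for a single two-byte character."""
    first, second = euc
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        raise ValueError("both bytes must be in a1..fe")
    return first - 0xA0, second - 0xA0


def gr_to_gl(euc: bytes) -> bytes:
    """Clear the MSB of each byte: a1..fe (GR) becomes 21..7e (GL)."""
    return bytes(b & 0x7F for b in euc)


euc = kuten_to_euc(4, 2)            # row 04 is column a4 (hiragana)
print(euc.hex(), euc_to_kuten(euc), gr_to_gl(euc).hex())
```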
A better summary can be found
here, although
note that it's a bit dated. Section 2.2 especially is useful for getting a grasp on EUC-JP. Note that
code set 3 (aka G3) is no longer all user defined; it contains JIS X 0212-1990.
From what I can tell, JIS X 0212-1990 contains mostly rare kanji. And by the way, no one will actually
tell you what is in any of these character sets, presumably because they are owned and sold by the
Japanese Standards Association. I guess the
idea is EUC-JP interpreters will just want to know how to take a set of bytes and figure out the
character set and offset therein, and only font creators and the like will actually care what the
resulting characters are.
Here's a summary table of EUC-JP. If a byte sequence isn't listed, it's undefined (or at least,
some of the bytes are undefined). As is noted above, even these ranges have undefined characters
in them as well.

| First Byte | Second Byte | Third Byte | Code Set | Description |
|---|---|---|---|---|
| 00 - 7F | | | US-ASCII | ASCII (half-width) characters |
| 8E | A1 - DF | | JIS X 0201-1976 | Half-width katakana, etc. |
| 8F | A1 - FE | A1 - FE | JIS X 0212-1990 | Rare kanji (esp. b0 a1 through ed e3) |
| A0 | | | (ISO 2022) | Non-breaking space |
| A1 - FE | A1 - FE | | JIS X 0208-1990 | Standard punctuation, kana, and kanji |
| FF | | | (ISO 2022) | Delete? Undefined? |
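
The summary table translates fairly directly into a tokenizer that labels each character with its code set. The Python sketch below is one possible reading of the table, not a reference implementation: `tokenize` is a made-up name, and rejecting a0, ff, and every other unlisted first byte is an assumption of this sketch.

```python
def tokenize(data: bytes) -> list[tuple[str, bytes]]:
    """Split EUC-JP bytes into (code set, raw bytes) pairs."""
    tokens = []
    i = 0
    while i < len(data):
        b = data[i]
        # For each lead byte: code set, total length, and the upper
        # limit allowed for each trailing byte (lower limit is a1).
        if b <= 0x7F:
            name, length, limits = "US-ASCII", 1, ()
        elif b == 0x8E:
            name, length, limits = "JIS X 0201", 2, (0xDF,)
        elif b == 0x8F:
            name, length, limits = "JIS X 0212", 3, (0xFE, 0xFE)
        elif 0xA1 <= b <= 0xFE:
            name, length, limits = "JIS X 0208", 2, (0xFE,)
        else:
            # Assumption: anything not listed as a first byte is an error.
            raise ValueError(f"unexpected first byte {b:#04x} at offset {i}")
        trail = data[i + 1:i + length]
        if len(trail) != len(limits) or any(
            not 0xA1 <= t <= hi for t, hi in zip(trail, limits)
        ):
            raise ValueError(f"bad or truncated sequence at offset {i}")
        tokens.append((name, data[i:i + length]))
        i += length
    return tokens


sample = b"A" + b"\x8e\xb1" + b"\xb0\xa1" + b"\x8f\xb0\xa1"
for name, raw in tokenize(sample):
    print(name, raw.hex())
```

As with the table, this only identifies the code set and the bytes within it; whether a given position in that code set is actually defined is a separate question.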