EUC-JP Table (non-kanji characters)


One day, I wanted to make a script to convert some text files containing romaji to EUC-JP, and I wasn't happy with the tools available (even though everyone and their brother has made such a script). Unfortunately, the only way I could find a "listing" of what values corresponded to what kana was to read someone's code, and finding a concise description of EUC-JP is nigh on impossible. Because of that, I made this table.


The EUC-JP characters below are two bytes long. The columns represent the first byte, and the rows the second byte. These characters are all non-kanji. Note that both bytes are between a1 and fe (inclusive). Some code points within this range are undefined; their cells in the table hold no character. For more information on EUC-JP, see the notes after the table.


Note that column a1 contains punctuation, a4 hiragana, and a5 katakana. Equivalent kana have the same second byte. Except for some katakana-only characters, the characters are grouped by the little version, the normal version, and then the ゛ (dakuten) and ゜ (handakuten) versions (as they exist). Also note that a1 a1 is defined; it's a full-width space.
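Because equivalent kana share a second byte, converting hiragana to katakana in EUC-JP amounts to changing the first byte from a4 to a5. The following is a minimal sketch of that idea, not a full converter; the function name `hiragana_to_katakana` is made up for illustration, and the handling of the 8e and 8f lead bytes assumes the sequence lengths given in the summary table at the end of this page.

```python
def hiragana_to_katakana(data: bytes) -> bytes:
    """Rewrite EUC-JP hiragana (first byte a4) as katakana (first byte a5)."""
    out = bytearray()
    i = 0
    while i < len(data):
        b = data[i]
        if b == 0x8F:
            length = 3            # 8f introduces a three-byte sequence
        elif b == 0x8E or 0xA1 <= b <= 0xFE:
            length = 2            # 8e and a1..fe introduce two-byte sequences
        else:
            length = 1            # ASCII and anything else: copy one byte
        chunk = bytearray(data[i:i + length])
        # Only a complete two-byte character in column a4 is hiragana.
        if len(chunk) == 2 and chunk[0] == 0xA4:
            chunk[0] = 0xA5       # same second byte, katakana column
        out += chunk
        i += length
    return bytes(out)
```

Stepping through the data one whole character at a time matters here: a second byte of a4 inside some other character must not be mistaken for the start of a hiragana.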


(Columns: first byte. Rows: second byte. -- marks an undefined code point.)

    a1  a2  a3  a4  a5  a6  a7  a8
a1  　  ◆  --  ぁ  ァ  Α  А  ─
a2  、  □  --  あ  ア  Β  Б  │
a3  。  ■  --  ぃ  ィ  Γ  В  ┌
a4  ，  △  --  い  イ  Δ  Г  ┐
a5  ．  ▲  --  ぅ  ゥ  Ε  Д  ┘
a6  ・  ▽  --  う  ウ  Ζ  Е  └
a7  ：  ▼  --  ぇ  ェ  Η  Ё  ├
a8  ；  ※  --  え  エ  Θ  Ж  ┬
a9  ？  〒  --  ぉ  ォ  Ι  З  ┤
aa  ！  →  --  お  オ  Κ  И  ┴
ab  ゛  ←  --  か  カ  Λ  Й  ┼
ac  ゜  ↑  --  が  ガ  Μ  К  ━
ad  ´  ↓  --  き  キ  Ν  Л  ┃
ae  ｀  〓  --  ぎ  ギ  Ξ  М  ┏
af  ¨  --  --  く  ク  Ο  Н  ┓
b0  ＾  --  ０  ぐ  グ  Π  О  ┛
b1  ￣  --  １  け  ケ  Ρ  П  ┗
b2  ＿  --  ２  げ  ゲ  Σ  Р  ┣
b3  ヽ  --  ３  こ  コ  Τ  С  ┳
b4  ヾ  --  ４  ご  ゴ  Υ  Т  ┫
b5  ゝ  --  ５  さ  サ  Φ  У  ┻
b6  ゞ  --  ６  ざ  ザ  Χ  Ф  ╋
b7  〃  --  ７  し  シ  Ψ  Х  ┠
b8  仝  --  ８  じ  ジ  Ω  Ц  ┯
b9  々  --  ９  す  ス  --  Ч  ┨
ba  〆  ∈  --  ず  ズ  --  Ш  ┷
bb  〇  ∋  --  せ  セ  --  Щ  ┿
bc  ー  ⊆  --  ぜ  ゼ  --  Ъ  ┝
bd  ―  ⊇  --  そ  ソ  --  Ы  ┰
be  ‐  ⊂  --  ぞ  ゾ  --  Ь  ┥
bf  ／  ⊃  --  た  タ  --  Э  ┸
c0  ＼  ∪  --  だ  ダ  --  Ю  ╂
c1  〜  ∩  Ａ  ち  チ  α  Я  --
c2  ‖  --  Ｂ  ぢ  ヂ  β  --  --
c3  ｜  --  Ｃ  っ  ッ  γ  --  --
c4  …  --  Ｄ  つ  ツ  δ  --  --
c5  ‥  --  Ｅ  づ  ヅ  ε  --  --
c6  ‘  --  Ｆ  て  テ  ζ  --  --
c7  ’  --  Ｇ  で  デ  η  --  --
c8  “  --  Ｈ  と  ト  θ  --  --
c9  ”  --  Ｉ  ど  ド  ι  --  --
ca  （  ∧  Ｊ  な  ナ  κ  --  --
cb  ）  ∨  Ｋ  に  ニ  λ  --  --
cc  〔  ¬  Ｌ  ぬ  ヌ  μ  --  --
cd  〕  ⇒  Ｍ  ね  ネ  ν  --  --
ce  ［  ⇔  Ｎ  の  ノ  ξ  --  --
cf  ］  ∀  Ｏ  は  ハ  ο  --  --
d0  ｛  ∃  Ｐ  ば  バ  π  --  --
d1  ｝  --  Ｑ  ぱ  パ  ρ  а  --
d2  〈  --  Ｒ  ひ  ヒ  σ  б  --
d3  〉  --  Ｓ  び  ビ  τ  в  --
d4  《  --  Ｔ  ぴ  ピ  υ  г  --
d5  》  --  Ｕ  ふ  フ  φ  д  --
d6  「  --  Ｖ  ぶ  ブ  χ  е  --
d7  」  --  Ｗ  ぷ  プ  ψ  ё  --
d8  『  --  Ｘ  へ  ヘ  ω  ж  --
d9  』  --  Ｙ  べ  ベ  --  з  --
da  【  --  Ｚ  ぺ  ペ  --  и  --
db  】  --  --  ほ  ホ  --  й  --
dc  ＋  ∠  --  ぼ  ボ  --  к  --
dd  −  ⊥  --  ぽ  ポ  --  л  --
de  ±  ⌒  --  ま  マ  --  м  --
df  ×  ∂  --  み  ミ  --  н  --
e0  ÷  ∇  --  む  ム  --  о  --
e1  ＝  ≡  ａ  め  メ  --  п  --
e2  ≠  ≒  ｂ  も  モ  --  р  --
e3  ＜  ≪  ｃ  ゃ  ャ  --  с  --
e4  ＞  ≫  ｄ  や  ヤ  --  т  --
e5  ≦  √  ｅ  ゅ  ュ  --  у  --
e6  ≧  ∽  ｆ  ゆ  ユ  --  ф  --
e7  ∞  ∝  ｇ  ょ  ョ  --  х  --
e8  ∴  ∵  ｈ  よ  ヨ  --  ц  --
e9  ♂  ∫  ｉ  ら  ラ  --  ч  --
ea  ♀  ∬  ｊ  り  リ  --  ш  --
eb  °  --  ｋ  る  ル  --  щ  --
ec  ′  --  ｌ  れ  レ  --  ъ  --
ed  ″  --  ｍ  ろ  ロ  --  ы  --
ee  ℃  --  ｎ  ゎ  ヮ  --  ь  --
ef  ￥  --  ｏ  わ  ワ  --  э  --
f0  ＄  --  ｐ  ゐ  ヰ  --  ю  --
f1  ¢  --  ｑ  ゑ  ヱ  --  я  --
f2  £  Å  ｒ  を  ヲ  --  --  --
f3  ％  ‰  ｓ  ん  ン  --  --  --
f4  ＃  ♯  ｔ  --  ヴ  --  --  --
f5  ＆  ♭  ｕ  --  ヵ  --  --  --
f6  ＊  ♪  ｖ  --  ヶ  --  --  --
f7  ＠  †  ｗ  --  --  --  --  --
f8  §  ‡  ｘ  --  --  --  --  --
f9  ☆  ¶  ｙ  --  --  --  --  --
fa  ★  --  ｚ  --  --  --  --  --
fb  ○  --  --  --  --  --  --  --
fc  ●  --  --  --  --  --  --  --
fd  ◎  --  --  --  --  --  --  --
fe  ◇  ◯  --  --  --  --  --  --
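
For anyone who wants to regenerate the grid above rather than trust a copy of it, here is a rough sketch using Python's built-in `euc_jp` codec. It assumes that codec agrees with JIS X 0208-1990 for columns a1 through a8; a few symbols (the backslash and the wave dash, for example) are mapped to different Unicode characters by different implementations, so treat the output as a cross-check. The names `cell`, `table_rows`, and `format_table` are made up for this example.

```python
def cell(first: int, second: int):
    """Return the character for a two-byte EUC-JP code, or None if undefined."""
    try:
        return bytes([first, second]).decode("euc_jp")
    except UnicodeDecodeError:
        return None


def table_rows(first_bytes=range(0xA1, 0xA9)):
    """Yield (second_byte, cells) with one cell per first byte (column)."""
    for second in range(0xA1, 0xFF):          # a1..fe inclusive
        yield second, [cell(first, second) for first in first_bytes]


def format_table() -> str:
    """Build the grid as text, marking undefined cells with '--'."""
    lines = ["    " + "  ".join(f"{b:02x}" for b in range(0xA1, 0xA9))]
    for second, cells in table_rows():
        shown = [c if c is not None else "--" for c in cells]
        lines.append(f"{second:02x}  " + "  ".join(shown))
    return "\n".join(lines)
```

Calling `print(format_table())` on a terminal that can display Japanese should give 94 rows, one per second byte.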



The Japanese characters of EUC-JP are two bytes long. Both bytes must be greater than a0, but not ff (that is, a1 through fe). There are combinations within these ranges which are undefined (the cells with no character in the table above). Note that both bytes have the most significant bit (MSB) set; if a byte does not have the MSB set, it is interpreted as a normal ASCII character.
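
The byte rules just described fit in a couple of one-line checks. This is only a sketch of the range test (it says nothing about whether a particular pair is actually assigned a character), and the function names are invented for the example.

```python
def is_ascii(byte: int) -> bool:
    """True when the most significant bit is clear (00..7f)."""
    return (byte & 0x80) == 0


def is_two_byte_pair(first: int, second: int) -> bool:
    """True when both bytes are greater than a0 and not ff (a1..fe)."""
    return 0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE
```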


After the range shown above, there is a long section of undefined characters, followed by kanji starting at b0 a1. There is a gap from cf d4 to cf fe. The last kanji is at f4 a6. f5 a1 through fe fe are intended for "user" defined characters.
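
The layout in the previous paragraph can be written down as a small classifier. This is a sketch based only on the boundaries named there; the function name and the labels are made up, and treating f4 a7 through f4 fe as undefined is an assumption, since the text only says the last kanji is at f4 a6.

```python
def classify_two_byte(first: int, second: int) -> str:
    """Name the region a two-byte EUC-JP code falls in."""
    if not (0xA1 <= first <= 0xFE and 0xA1 <= second <= 0xFE):
        return "not a two-byte character"
    code = (first, second)            # tuples compare first byte, then second
    if first <= 0xA8:
        return "non-kanji"            # the columns shown in the table
    if first <= 0xAF:
        return "undefined"            # long empty stretch before the kanji
    if (0xCF, 0xD4) <= code <= (0xCF, 0xFE):
        return "undefined"            # gap from cf d4 to cf fe
    if code <= (0xF4, 0xA6):
        return "kanji"                # b0 a1 through f4 a6
    if first == 0xF4:
        return "undefined"            # assumption: rest of f4 is unassigned
    return "user-defined"             # f5 a1 through fe fe
```

Individual cells inside the "non-kanji" region can still be undefined, as the table shows; this function only identifies the broad region.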


I'm glossing over a lot of other details (e.g., "shifting" into extra character sets). But good luck finding coherent documentation. Here are a few places to start: ibiblio ISO 2022, ibiblio EUC, ibiblio JIS, HP. The 94x94 code they speak of is the same as the a1 through fe range; row 01 corresponds to column a1 above, and so on. A byte in this range is sometimes referred to as "GR", while bytes from 21 to 7e are called "GL". (These can mean different things in different character sets, but chances are, you're just trying to understand EUC-JP -- not every other ISO 2022 character set in existence).
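
To make the 94x94 correspondence concrete: subtracting a0 from each byte gives the row and cell numbers (1 through 94), and clearing the top bit turns a GR byte into its GL counterpart. A small sketch, with function names chosen just for this example:

```python
def to_row_cell(first: int, second: int) -> tuple:
    """Convert a two-byte EUC-JP code (a1..fe each) to 94x94 (row, cell)."""
    return first - 0xA0, second - 0xA0


def gr_to_gl(byte: int) -> int:
    """Map a GR byte (a1..fe) to the matching GL byte (21..7e)."""
    return byte & 0x7F
```

For instance, a4 a2 is row 4, cell 2, and its GL form is 24 22.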


A better summary can be found here, although note that it's a bit dated. Section 2.2 is especially useful for getting a grasp on EUC-JP. Note that code set 3 (aka G3) is no longer all user defined; it contains JIS X 0212-1990. From what I can tell, JIS X 0212-1990 contains mostly rare kanji. And by the way, no one will actually tell you what is in any of these character sets, presumably because they are owned and sold by the Japanese Standards Association. I guess the idea is that EUC-JP interpreters will just want to know how to take a set of bytes and figure out the character set and offset therein, and only font creators and the like will actually care what the resulting characters are.


Here's a summary table of EUC-JP. If a byte sequence isn't listed, it's undefined (or at least, some of the bytes are undefined). As noted above, even the listed ranges contain undefined characters.


First Byte  Second Byte  Third Byte  Code Set         Description
00 - 7F                              US-ASCII         ASCII (half-width) characters
8E          A1 - DF                  JIS X 0201-1976  Half-width katakana, etc.
8F          A1 - FE      A1 - FE     JIS X 0212-1990  Rare kanji (esp. b0 a1 through ed e3)
A0                                   (ISO 2022)       Non-breaking space
A1 - FE     A1 - FE                  JIS X 0208-1990  Standard punctuation, kana, and kanji
FF                                   (ISO 2022)       Delete? Undefined?
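
The summary table translates fairly directly into a routine that walks a byte string and reports which code set each character belongs to. The following is a sketch under that reading of the table, not a validating decoder: the name `split_euc_jp` and the label strings are invented, lone a0 and ff bytes are simply reported as "undefined", and a sequence whose trailing bytes are missing or out of range is reported one byte at a time.

```python
def split_euc_jp(data: bytes):
    """Yield (code_set, raw_bytes) for each character in an EUC-JP byte string."""
    i = 0
    while i < len(data):
        b = data[i]
        # Pick the code set, the sequence length, and the highest
        # value allowed for trailing bytes from the lead byte alone.
        if b <= 0x7F:
            name, length, high = "US-ASCII", 1, None
        elif b == 0x8E:
            name, length, high = "JIS X 0201", 2, 0xDF
        elif b == 0x8F:
            name, length, high = "JIS X 0212", 3, 0xFE
        elif 0xA1 <= b <= 0xFE:
            name, length, high = "JIS X 0208", 2, 0xFE
        else:
            name, length, high = "undefined", 1, None
        chunk = data[i:i + length]
        complete = len(chunk) == length
        in_range = all(0xA1 <= t <= high for t in chunk[1:])
        if not (complete and in_range):
            # Truncated or malformed: report just the lead byte and move on.
            name, chunk = "undefined", data[i:i + 1]
        yield name, chunk
        i += len(chunk)
```

Used on the bytes 41 a4 a2 8e b1, this yields one US-ASCII character, one JIS X 0208 character, and one JIS X 0201 character, in that order.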