| original: | Using Unicode in Erlang |
|---|---|
| source: | STDLIB User's Guide Version 1.19.1 |
| url: | http://www.erlang.org/documentation/doc-5.10.1/lib/stdlib-1.19.1/doc/html/unicode_usage.html |
Implementing support for Unicode character sets is an ongoing process. The Erlang Enhancement Proposal (EEP) 10 outlined the basics of Unicode support and also specified a default encoding in binaries that all Unicode-aware modules should handle in the future.
The functionality described in EEP 10 was implemented in Erlang/OTP R13A, but that was by no means the end of it. In R14B01, support for Unicode file names was added, although it was not complete and was disabled by default on platforms that give no guarantee about the file name encoding. R16A brought support for UTF-8 encoded source code, along with enhancements to many applications to support Unicode encoded file names as well as UTF-8 encoded files in several circumstances. Most notable are the support for UTF-8 in files read by file:consult/1, release handler support for UTF-8, and more support for Unicode character sets in the I/O system.
In R17, the default encoding of Erlang source code will be switched to UTF-8, and R18 will support atoms in the full Unicode range, which means function and module names can use any Unicode characters.
This guide outlines the current Unicode support and gives a couple of recipes for working with Unicode data.
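Experience with the Unicode support in Erlang has made it painfully clear that understanding Unicode characters and encodings is not as easy as one would expect. The complexity of the field, as well as the implications of the standard, require a thorough understanding of concepts rarely thought of before.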
Furthermore, the Erlang implementation requires understanding of concepts that were never an issue for many (Erlang) programmers. To understand and use Unicode characters, you need to study the subject thoroughly, even if you are an experienced programmer.
As an example, consider the issue of converting between uppercase and lowercase letters. Reading the standard makes you realize that, to begin with, there is not a simple one-to-one mapping in all scripts. For example, in German the lowercase letter "ß" (sharp s) has the uppercase equivalent "SS". In Greek, "Σ" has two different lowercase forms: "ς" in word-final position and "σ" elsewhere. In Turkish, both dotted and dotless "i" exist in lowercase and uppercase forms. Cyrillic "I" usually has no lowercase form. And of course there are languages that have no concept of uppercase (or lowercase) at all. So a conversion function needs to know not only one character at a time, but possibly the whole sentence, maybe the natural language the translation should be in, and it must also take into account differences in input and output string length and so on. At the time of writing, there is no Unicode to_upper/to_lower functionality in Erlang/OTP, but there are publicly available libraries that address these issues.
Another example is accented characters, where the same glyph has two different representations. Let us look at the Swedish "ö". There is a code point for it in the Unicode standard, but you can also write it as "o" followed by U+0308 (Combining Diaeresis, with the simplified meaning that the preceding letter should have a "¨" above it). They have exactly the same glyph. For most purposes they are the same, but they have completely different representations. For example, MacOS X converts all file names to use Combining Diaeresis, while most other programs (including Erlang) try to hide that by doing the opposite, for example when listing directories. However it is done, it is usually important to normalize such characters to avoid utter confusion.
The list of examples could probably be made as long as the Unicode standard itself. The point is that one needs a kind of knowledge that was never needed when programs only took one or two languages into account. The complexity of human languages and scripts has certainly made constructing a universal standard a challenge. Supporting Unicode properly in your program will require effort.
Unicode is a standard defining code points (numbers) for all known scripts, living or dead. In principle, every known symbol used in any language has a Unicode code point.
Unicode code points are defined and published by the Unicode Consortium, a non-profit organization.
Support for Unicode is increasing throughout the world of computing, as the benefits of one common character set are overwhelming when programs are used in a global environment.
Along with the base of the standard, the code points for all the scripts, there are several encoding standards available.
It is vital to understand the difference between encodings and Unicode characters. Unicode characters are code points according to the Unicode standard, while encodings are ways to represent such code points. An encoding is just a standard for representation: UTF-8, for example, can be used to represent a very limited part of the Unicode character set (such as ISO-Latin-1) or the full Unicode range. It is just an encoding format.
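To make the distinction between a code point and its encodings concrete, here is a minimal sketch that asks unicode:characters_to_binary/3 for the bytes of a single code point in three encodings. The module name encodings_demo is illustrative and not part of OTP.

```erlang
-module(encodings_demo).
-export([encodings/1]).

%% Return the byte sequences that represent one Unicode code point in
%% UTF-8, UTF-16 and UTF-32. In the unicode module, utf16 and utf32
%% without an explicit byte order mean big-endian.
encodings(CodePoint) ->
    [{Enc, unicode:characters_to_binary([CodePoint], unicode, Enc)}
     || Enc <- [utf8, utf16, utf32]].
```

For "ö" (code point 246) this gives two bytes in UTF-8, two in UTF-16 and four in UTF-32, while the code point itself is the same integer in every case.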
As long as all character sets were limited to 256 characters, each character could be stored in one single byte, so there was more or less only one practical encoding for the characters. Encoding each character in one byte was so common that the encoding was not even named. Now that the Unicode system gives us many more than 256 characters, we need a common way to represent them. The common ways of representing the code points are the encodings. This means a whole new concept for the programmer, the concept of character representation, which was previously a non-issue.
Different operating systems and tools support different encodings. For example, Linux and MacOS X have chosen the UTF-8 encoding, which is backward compatible with 7-bit ASCII and therefore affects programs written in plain English the least. Windows, on the other hand, supports a limited version of UTF-16, namely all the code planes where the characters can be stored in one single 16-bit entity, which includes most living languages.
The most widely spread encodings are:
Bytewise representation
This is not a proper Unicode representation, but the representation used for characters before the Unicode standard. It can still be used to represent character code points in the Unicode standard with numbers below 256, which corresponds exactly to the ISO-Latin-1 character set. In Erlang, this is commonly denoted latin1 encoding, which is slightly misleading, as ISO-Latin-1 is a character code range, not an encoding.
UTF-8
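Each character is stored in one to four bytes depending on its code point. The encoding is backward compatible with the bytewise representation of 7-bit ASCII, as all 7-bit characters are stored in one single byte as is. Characters beyond code point 127 are stored in more bytes, with the most significant bit of the first byte indicating a multi-byte character. For details on the encoding, the RFC (RFC 3629) is publicly available. Note that UTF-8 is not compatible with bytewise representation for code points between 128 and 255, so an ISO-Latin-1 bytewise representation is generally not compatible with UTF-8.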
UTF-16
This encoding has many similarities to UTF-8, but the basic unit is a 16-bit number. This means that all characters occupy at least two bytes, and some high numbers four bytes. Some programs, libraries, and operating systems that claim to use UTF-16 only allow characters that can be stored in one 16-bit entity, which is usually sufficient to handle living languages. As the basic unit is more than one byte, byte-order issues occur, which is why UTF-16 exists in both a big-endian and a little-endian variant. In Erlang, the full UTF-16 range is supported where applicable, such as in the unicode module and in the bit syntax.
UTF-32
The most straightforward representation. Each character is stored in one single 32-bit number. There is no need for escapes or any variable number of entities for one character; all Unicode code points can be stored in one single 32-bit entity. As with UTF-16, there are byte-order issues: UTF-32 can be both big-endian and little-endian.
UCS-4
Basically the same as UTF-32, but without some Unicode semantics. It is defined by ISO/IEC 10646 and has little use as a separate encoding standard. For all normal (and possibly abnormal) usages, UTF-32 and UCS-4 are interchangeable.
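The byte layouts in the list above can be checked directly. The following sketch contains a hand-written UTF-8 encoder (for illustration only; real code should use the bit syntax type utf8 or the unicode module) and two helpers that use the bit syntax types utf16 and utf32 to show both byte orders. The module name encoding_layouts is hypothetical.

```erlang
-module(encoding_layouts).
-export([utf8/1, utf16_both/1, utf32_both/1]).

%% Hand-written UTF-8 encoder following the 1-to-4-byte layout.
%% Surrogates (16#D800-16#DFFF), negative numbers and values above
%% 16#10FFFF match no clause and crash with function_clause.
utf8(CP) when CP >= 0, CP =< 16#7F ->
    <<CP>>;                                          % 0xxxxxxx
utf8(CP) when CP >= 16#80, CP =< 16#7FF ->
    <<2#110:3, (CP bsr 6):5, 2#10:2, CP:6>>;         % 110xxxxx 10xxxxxx
utf8(CP) when CP >= 16#800, CP =< 16#FFFF,
              (CP < 16#D800 orelse CP > 16#DFFF) ->
    <<2#1110:4, (CP bsr 12):4,                       % 1110xxxx
      2#10:2, (CP bsr 6):6, 2#10:2, CP:6>>;          % 10xxxxxx 10xxxxxx
utf8(CP) when CP >= 16#10000, CP =< 16#10FFFF ->
    <<2#11110:5, (CP bsr 18):3,                      % 11110xxx
      2#10:2, (CP bsr 12):6, 2#10:2, (CP bsr 6):6,   % 10xxxxxx 10xxxxxx
      2#10:2, CP:6>>.                                % 10xxxxxx

%% The same code point in big- and little-endian UTF-16 and UTF-32.
%% Integer segments keep only their low bits, which is what the
%% masks in the layout above rely on.
utf16_both(CP) -> {<<CP/utf16-big>>, <<CP/utf16-little>>}.
utf32_both(CP) -> {<<CP/utf32-big>>, <<CP/utf32-little>>}.
```

For a code point above 16#FFFF, utf16_both/1 shows the surrogate pair: the two 16-bit units keep their order, and only the bytes inside each unit are swapped in the little-endian form.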
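Certain ranges of numbers are left unused in the Unicode standard, and certain ranges are even deemed invalid. The most notable invalid range is 16#D800-16#DFFF, as the UTF-16 encoding does not allow encoding of these numbers; they are reserved for the surrogate pairs that UTF-16 uses to encode code points above 16#FFFF. It can be speculated that the UTF-16 encoding standard was, from the beginning, expected to be able to hold all Unicode characters in one 16-bit entity, but then had to be extended, leaving a hole in the Unicode range to cope with backward compatibility.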
Additionally, the code point 16#FEFF is used for byte order marks (BOMs), and use of that character in other contexts is not encouraged. It actually is valid, though, as the character "ZWNBS" (Zero Width Non Breaking Space). BOMs are used to identify encodings and byte order for programs where such parameters are not known in advance. Byte order marks are more seldom used than one could expect, but their use might become more widely spread, as they give programs the means to make educated guesses about the Unicode format of a certain file.
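A BOM can be detected and stripped with unicode:bom_to_encoding/1 and produced with unicode:encoding_to_bom/1. The sketch below assumes that data without a BOM is UTF-8; that fallback is a choice made for this example only (bom_to_encoding/1 itself reports latin1 when it finds no BOM). The module name bom_demo is illustrative.

```erlang
-module(bom_demo).
-export([add_bom/2, decode/1]).

%% Prefix an already encoded binary with the BOM for its encoding,
%% for example add_bom(utf8, Bin) or add_bom({utf16, big}, Bin).
add_bom(Encoding, Bin) ->
    <<(unicode:encoding_to_bom(Encoding))/binary, Bin/binary>>.

%% Detect and strip a BOM, then decode the rest into code points.
decode(Bin) ->
    {Enc, BomLen} = unicode:bom_to_encoding(Bin),
    <<_:BomLen/binary, Body/binary>> = Bin,
    InEnc = case Enc of
                latin1 -> utf8;   % no BOM found: assume UTF-8 (sketch choice)
                Other  -> Other
            end,
    unicode:characters_to_list(Body, InEnc).
```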
To support Unicode in Erlang, problems in several areas have been addressed. Each area is described briefly in this section and more thoroughly further down in this document:
衚çŸ
To handle Unicode characters in Erlang, a common representation is needed both in lists and in binaries. EEP (10) and the subsequent initial implementation in R13A settled a standard representation of Unicode characters in Erlang.
Manipulation
The Unicode characters need to be processed by Erlang programs, which is why library functions need to be able to handle them. In some cases, functionality has been added to existing interfaces (for example, the string module can now handle lists with arbitrary code points); in other cases, new functionality or options have been added (as in the io module, the file handling, the unicode module, and the bit syntax). Today most modules in Kernel and STDLIB, as well as the VM, are Unicode-aware.
File I/O
I/O is by far the most problematic area for Unicode. A file is an entity where bytes are stored, and the lore of programming has been to treat characters and bytes as interchangeable. With Unicode characters, you need to decide on an encoding as soon as you want to store the data in a file. In Erlang, you can open a text file with an encoding option, so that you can read characters from it rather than bytes, but you can also open a file for bytewise I/O. The Erlang I/O system has been designed (or at least used) in a way where you expect any I/O server to be able to cope with any string data, but that is no longer the case when you work with Unicode characters. Having to know the capabilities of the device where your data ends up is something new to the Erlang programmer. Furthermore, ports in Erlang are byte-oriented, so an arbitrary string of (Unicode) characters cannot be sent to a port without first converting it to an encoding of choice.
Terminal I/O
Terminal I/O is slightly easier than file I/O. The output is meant for human reading and is usually Erlang syntax (for example, in the shell). There is a syntactic representation of any Unicode character that does not display the glyph (written instead as \x{HHH}), so Unicode data can usually be displayed even if the terminal as such does not support the whole Unicode range.
File names
File names can be stored as Unicode strings in different ways, depending on the underlying operating system and file system. This can be handled fairly easily by a program. The problems arise when the file system is not consistent in its encodings, as on Linux, for example. Linux allows files to be named with any sequence of bytes, leaving it to each program to interpret those bytes. On systems where these "transparent" file names are used, Erlang has to be informed about the file name encoding by a startup flag. The default is bytewise interpretation, which is usually wrong, but allows for interpretation of all file names. The concept of "raw file names" can be used to handle wrongly encoded file names if Unicode file name translation (+fnu) is enabled on platforms where it is not the default.
Source code encoding
When it comes to Erlang source code, there is support for both UTF-8 encoding and bytewise encoding. The default in R16B is bytewise (or latin1) encoding. You can control the encoding with a comment like the following at the beginning of the file:
%% -*- coding: utf-8 -*-
This of course requires your editor to support UTF-8 as well. The same comment is also interpreted by functions such as file:consult/1 and by the release handler, so you can have all text files in your source directories in UTF-8 encoding.
The language
Having the source code in UTF-8 also allows you to write string literals containing Unicode characters with code points > 255, although atoms, module names, and function names are restricted to the ISO-Latin-1 range until the R18 release. Binary literals using the /utf8 type can also be expressed using Unicode characters > 255. Module names using characters other than 7-bit ASCII can cause trouble on operating systems with inconsistent file naming schemes, and can also hurt portability, so they are not really recommended. EEP 40 suggests that the language should also allow Unicode characters > 255 in variable names. Whether to implement that EEP is yet to be decided.
In Erlang, strings are actually lists of integers. Up until R13, a string was defined to be encoded in the ISO-Latin-1 (ISO 8859-1) character set, which is, code point by code point, a sub-range of the Unicode character set.
The standard list encoding for strings was therefore easily extended to cope with the whole Unicode range: a Unicode string in Erlang is simply a list containing integers, each integer being a valid Unicode code point and representing one character in the Unicode character set.
Erlang strings in ISO-Latin-1 are a subset of Unicode strings.
Only if a string contains code points < 256 can it be directly converted to a binary using, for example, erlang:iolist_to_binary/1, or sent directly to a port. If the string contains Unicode characters > 255, an encoding has to be decided upon, and the string should be converted to a binary in the preferred encoding using unicode:characters_to_binary/{1,2,3}. Strings are not generally lists of bytes, as they were before R13; they are lists of characters. Characters are not generally bytes; they are Unicode code points.
Binaries are more troublesome. For performance reasons, programs often store textual data in binaries instead of lists, mainly because they are more compact (one byte per character instead of two words per character, as is the case with lists). Using erlang:list_to_binary/1, an ISO-Latin-1 Erlang string can be converted into a binary, effectively using bytewise encoding: one byte per character. This is very convenient for those limited Erlang strings, but cannot be done for arbitrary Unicode lists.
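The difference between bytewise conversion and UTF-8 encoding shows up in a small sketch (the module name string_bin is hypothetical): erlang:list_to_binary/1 raises badarg as soon as an element is above 255, while unicode:characters_to_binary/1 encodes the whole Unicode range as UTF-8.

```erlang
-module(string_bin).
-export([bytewise/1, utf8/1]).

%% list_to_binary/1 stores one byte per element, so it only works
%% when every code point is below 256 (ISO-Latin-1).
bytewise(String) ->
    try list_to_binary(String) of
        Bin -> {ok, Bin}
    catch
        error:badarg -> {error, not_latin1}
    end.

%% unicode:characters_to_binary/1 encodes any Unicode string as UTF-8.
utf8(String) ->
    unicode:characters_to_binary(String).
```

Note that both functions accept [246] ("ö"), but produce different binaries: one byte bytewise, two bytes in UTF-8.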
As the UTF-8 encoding is widely spread and provides backward compatibility in the 7-bit ASCII range, it has been selected as the standard encoding for Unicode characters in binaries for Erlang.
The standard binary encoding is used whenever a library function in Erlang has to cope with Unicode data in binaries, but it is of course not enforced when communicating externally. Functions and bit syntax exist to encode and decode UTF-8, UTF-16, and UTF-32 in binaries. Library functions dealing with binaries and Unicode in general, however, only deal with the default encoding.
Character data can be combined from several sources, sometimes available as a mix of strings and binaries. Erlang has long had the concept of iodata or iolists, where binaries and lists can be combined to represent a sequence of bytes. In the same way, the Unicode-aware modules often allow combinations of binaries and lists, where the binaries have characters encoded in UTF-8 and the lists contain such binaries or numbers representing Unicode code points:
unicode_binary() = binary() with characters encoded in UTF-8 coding standard
chardata() = charlist() | unicode_binary()
charlist() = maybe_improper_list(char() | unicode_binary() | charlist(),
unicode_binary() | nil())
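As a sketch of the chardata() type above (the module name chardata_demo is illustrative), the following builds one UTF-8 binary and one list of code points from a mix of a plain list, a UTF-8 binary, and a nested list containing a UTF-8 binary for a code point above 255 (U+263A, a smiling face).

```erlang
-module(chardata_demo).
-export([example_binary/0, example_list/0]).

%% "caf" + e-acute (as a UTF-8 binary) + space + U+263A, as chardata().
parts() ->
    ["caf", <<233/utf8>>, [$\s, <<16#263A/utf8>>]].

%% Flatten the chardata() into one UTF-8 binary.
example_binary() ->
    unicode:characters_to_binary(parts()).

%% Flatten the same chardata() into a list of code points.
example_list() ->
    unicode:characters_to_list(parts()).
```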
The unicode module in STDLIB even supports similar mixes with binaries containing encodings other than UTF-8, but that is a special case to allow for conversions to and from external data:
external_unicode_binary() = binary() with characters coded in
a user specified Unicode encoding other than UTF-8 (UTF-16 or UTF-32)
external_chardata() = external_charlist() | external_unicode_binary()
external_charlist() = maybe_improper_list(char() |
external_unicode_binary() |
external_charlist(),
external_unicode_binary() | nil())
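For external_chardata(), the binaries are in the encoding passed as the second argument of unicode:characters_to_binary/3, while integers in the lists are always code points. A minimal sketch, with an illustrative module name:

```erlang
-module(external_demo).
-export([utf16le_to_utf8/1, utf8_to_utf32be/1]).

%% Convert external data in little-endian UTF-16 (binaries, possibly
%% mixed with code point integers) to the standard UTF-8 encoding.
utf16le_to_utf8(Data) ->
    unicode:characters_to_binary(Data, {utf16, little}, utf8).

%% The other direction: from UTF-8 to an external encoding.
utf8_to_utf32be(Data) ->
    unicode:characters_to_binary(Data, utf8, {utf32, big}).
```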
As of R16, Erlang source files can be written in either UTF-8 or bytewise encoding (also known as latin1 encoding). Details on how to state the encoding of an Erlang source file can be found in epp(3). Strings and comments can be written using Unicode, but functions still have to be named using characters from the ISO-Latin-1 character set, and atoms are restricted to the same ISO-Latin-1 range. These restrictions in the language are of course independent of the encoding of the source file. R18 is expected to handle functions named in Unicode as well as Unicode atoms.
The bit syntax contains types for handling binary data in the three main encodings. The types are named utf8, utf16, and utf32, respectively. The utf16 and utf32 types can be in a big-endian or little-endian variant:
<<Ch/utf8,_/binary>> = Bin1,
<<Ch/utf16-little,_/binary>> = Bin2,
Bin3 = <<$H/utf32-little, $e/utf32-little, $l/utf32-little, $l/utf32-little,
$o/utf32-little>>,
For convenience, literal strings can be encoded with a Unicode encoding in binaries using the following (or similar) syntax:
Bin4 = <<"Hello"/utf16>>,
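The utf8 type can also be used when matching, to walk through a UTF-8 binary one character at a time. The recursive decoder below is a sketch for illustration (unicode:characters_to_list/1 does the same job in practice), and the module name bitsyntax_demo is hypothetical.

```erlang
-module(bitsyntax_demo).
-export([utf8_to_list/1, hello_utf16/0]).

%% Decode a UTF-8 binary into a list of code points with the utf8 type.
%% An invalid UTF-8 sequence matches neither clause (function_clause).
utf8_to_list(<<Ch/utf8, Rest/binary>>) -> [Ch | utf8_to_list(Rest)];
utf8_to_list(<<>>) -> [].

%% The literal-string form: each character becomes one UTF-16 unit,
%% big-endian by default.
hello_utf16() -> <<"Hello"/utf16>>.
```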
For source code, there is an extension to the \OOO (backslash followed by three octal digits) and \xHH (backslash followed by x, followed by two hexadecimal characters) syntax, namely \x{H ...} (backslash followed by x, followed by a left curly bracket, any number of hexadecimal digits, and a terminating right curly bracket). This allows characters of any code point to be entered literally in a string, even when the encoding of the source file is bytewise (latin1).
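A small sketch of these escapes (the module name escape_demo is illustrative): the string below spells the Cyrillic word "Юникод" using only ASCII in the source file, and the character literal gives the code point of the Cyrillic letter с.

```erlang
-module(escape_demo).
-export([unicode_word/0, cyrillic_es/0]).

%% Six Cyrillic letters written with \x{...} escapes, so the source
%% file itself can stay ASCII (or latin1).
unicode_word() -> "\x{42E}\x{43D}\x{438}\x{43A}\x{43E}\x{434}".

%% The same escape in a character literal: U+0441, Cyrillic small es.
cyrillic_es() -> $\x{441}.
```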
In the shell, if using a Unicode input device, or in source code stored in UTF-8, $ can be followed directly by a Unicode character, producing an integer. In the following example, the code point of a Cyrillic с is output:
7> $с.
1089
In certain output functions and in the output of return values in the shell, Erlang tries to detect string data in lists and binaries heuristically. Typically you will see heuristic detection in a situation like this:
1> [97,98,99].
"abc"
2> <<97,98,99>>.
<<"abc">>
3> <<195,165,195,164,195,182>>.
<<"åäö"/utf8>>
Here the shell detects lists containing printable characters, or binaries containing printable characters in either bytewise or UTF-8 encoding.
The question here is: what is a printable character? One view is that anything the Unicode standard considers printable should also be printable according to the heuristic detection. The result would be that almost any list of integers would be deemed a string, resulting in all sorts of characters being printed, maybe even characters your terminal does not have in its font set (resulting in some generic output you probably will not appreciate). Another way is to stay backward compatible, so that only the ISO-Latin-1 character set is used to detect a string. A third way would be to let the user decide exactly which Unicode ranges are to be viewed as characters.
Since R16B, you can select either the ISO-Latin-1 range or the whole Unicode range by supplying the startup flag +pc latin1 or +pc unicode, respectively. For backward compatibility, the default is latin1. This only controls how heuristic string detection is done. In the future, more ranges are expected to be added, so that the heuristics can be tailored to the language and region relevant to the user.
Let us look at an example with the two different startup options:
$ erl +pc latin1
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> [1024].
[1024]
2> [1070,1085,1080,1082,1086,1076].
[1070,1085,1080,1082,1086,1076]
3> [229,228,246].
"åäö"
4> <<208,174,208,189,208,184,208,186,208,190,208,180>>.
<<208,174,208,189,208,184,208,186,208,190,208,180>>
5> <<229/utf8,228/utf8,246/utf8>>.
<<"åäö"/utf8>>
$ erl +pc unicode
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> [1024].
"Ѐ"
2> [1070,1085,1080,1082,1086,1076].
"Юникод"
3> [229,228,246].
"åäö"
4> <<208,174,208,189,208,184,208,186,208,190,208,180>>.
<<"Юникод"/utf8>>
5> <<229/utf8,228/utf8,246/utf8>>.
<<"åäö"/utf8>>
In the examples, we can see that the default Erlang shell interprets only characters from the ISO-Latin-1 range as printable, and detects only lists or binaries with those "printable" characters as containing string data. The valid UTF-8 binary containing "Юникод" is not printed as a string.
When started with all Unicode characters printable (+pc unicode), on the other hand, the shell outputs anything containing printable Unicode data (in binaries, either UTF-8 or bytewise encoded) as string data.
These heuristics are also used by io(_lib):format/2 and friends when the t modifier is used in conjunction with ~p or ~P:
$ erl +pc latin1
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> io:format("~tp~n",[{<<"åäö">>, <<"åäö"/utf8>>, <<208,174,208,189,208,184,208,186,208,190,208,180>>}]).
{<<"åäö">>,<<"åäö"/utf8>>,<<208,174,208,189,208,184,208,186,208,190,208,180>>}
ok
$ erl +pc unicode
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> io:format("~tp~n",[{<<"åäö">>, <<"åäö"/utf8>>, <<208,174,208,189,208,184,208,186,208,190,208,180>>}]).
{<<"åäö">>,<<"åäö"/utf8>>,<<"Юникод"/utf8>>}
ok
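Unlike ~tp, the ~ts control sequence does not depend on the +pc setting: it always treats its argument as a string, because the programmer has explicitly asked for string output. The sketch below (hypothetical module name) uses io_lib:format/2 so that the result can be inspected instead of printed; ~w is included for contrast, since it never applies string heuristics.

```erlang
-module(format_demo).
-export([as_chars/1, as_term/1]).

%% ~ts: output the list as characters, whatever +pc says.
%% io_lib:format/2 returns a deep list, so flatten it.
as_chars(String) ->
    lists:flatten(io_lib:format("~ts", [String])).

%% ~w: output the plain term, so the list stays a list of integers.
as_term(String) ->
    lists:flatten(io_lib:format("~w", [String])).
```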
The interactive Erlang shell, when started in a terminal or started using the werl command on Windows, can support Unicode input and output.
On Windows, proper operation requires that a suitable font is installed and selected for the Erlang application to use. If no suitable font is available on your system, try installing the DejaVu fonts (dejavu-fonts.org), which are freely available, and then select that font in the Erlang shell application.
On Unix-like operating systems, the terminal should be able to handle UTF-8 on input and output (modern versions of XTerm, KDE Konsole, and the GNOME terminal do, for example), and your locale settings have to be proper. As an example, my LANG environment variable is set as follows:
$ echo $LANG
en_US.UTF-8
Actually, most systems handle the LC_CTYPE variable before LANG, so if that is set, it has to be set to UTF-8:
$ echo $LC_CTYPE
en_US.UTF-8
The LANG or LC_CTYPE setting should be consistent with what the terminal is capable of. There is no portable way for Erlang to ask the actual terminal about its UTF-8 capability, so we have to rely on the language and character type settings.
To investigate what Erlang thinks about the terminal, the io:getopts() call can be used when the shell is started:
$ LC_CTYPE=en_US.ISO-8859-1 erl
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> lists:keyfind(encoding, 1, io:getopts()).
{encoding,latin1}
2> q().
ok
$ LC_CTYPE=en_US.UTF-8 erl
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> lists:keyfind(encoding, 1, io:getopts()).
{encoding,unicode}
2>
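The same check can be wrapped in a function. This is a sketch with an illustrative module name; it also covers the case where io:getopts/0 returns an {error, _} tuple, which can happen for I/O servers that do not support options.

```erlang
-module(term_encoding).
-export([encoding/0]).

%% Ask the I/O server behind standard_io which encoding it uses.
%% Returns latin1 or unicode, or unknown if no answer is available.
encoding() ->
    case io:getopts() of
        Opts when is_list(Opts) ->
            case lists:keyfind(encoding, 1, Opts) of
                {encoding, Enc} -> Enc;
                false -> unknown
            end;
        {error, _} ->
            unknown
    end.
```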
When (finally?) everything is in order with the locale settings, fonts, and the terminal emulator, you have probably also discovered a way to input characters in the script you desire. For testing, the simplest way is to add some keyboard mappings for other languages, usually done with some applet in your desktop environment. In my KDE environment, I start the KDE Control Center (Personal Settings), select "Regional and Accessibility", and then "Keyboard Layout". On Windows XP, I start Control Panel -> Regional and Language Options, select the Language tab, and click the Details... button in the square named "Text services and input languages". Your environment probably provides similar means of changing the keyboard layout. If you are not used to this, make sure you have a way to switch back and forth between keyboards easily; entering commands using a Cyrillic character set is, for example, not easily done in the Erlang shell.
Now you are set up for some Unicode input and output. The simplest thing to do is of course to enter a string in the shell:
$ erl
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> lists:keyfind(encoding, 1, io:getopts()).
{encoding,unicode}
2> "Юникод".
"Юникод"
3> io:format("~ts~n", [v(2)]).
Юникод
ok
4>
While strings can be input as Unicode characters, the language elements are still limited to the ISO-Latin-1 character set. Only character constants and strings are allowed to go beyond that range:
$ erl
Erlang R16B (erts-5.10.1) [source] [async-threads:0] [hipe] [kernel-poll:false]
Eshell V5.10.1 (abort with ^G)
1> $ξ.
958
2> Юникод.
* 1: illegal character
2>
Most modern operating systems support Unicode file names in some way or another. There are several different ways to do this, and Erlang by default treats the different approaches differently:
匷å¶ç Unicode ãã¡ã€ã«ããŒãã³ã°
Windows and, for most common uses, MacOS X enforce Unicode support for file names. All files created in the file system have names that can be consistently interpreted. In MacOS X, all file names are retrieved in UTF-8 encoding, while Windows has selected an approach where each system call handling file names has a special Unicode-aware variant, giving much the same effect. There are no file names on these systems that are not Unicode file names, which is why the default behavior of the Erlang VM is to work in "Unicode file name translation mode". This means that a file name can be given as a Unicode list, and that list is automatically translated to the proper name encoding for the underlying operating system and file system.
Doing, for example, a file:list_dir/1 on one of these systems can return Unicode lists with code points beyond 255, depending on the content of the actual file system. As the feature is fairly new, you may still stumble upon non-core applications that cannot handle being given file names containing characters with code points larger than 255, but the core Erlang system should have no problems with Unicode file names.
Transparent file naming
Most Unix operating systems have adopted a simpler approach, namely that Unicode file naming is not enforced, but a matter of convention. Those systems usually use UTF-8 encoding for Unicode file names, but do not enforce it. On such a system, a file name containing characters with code points between 128 and 255 can be named either in plain ISO-Latin-1 or using UTF-8 encoding. As no consistency is enforced, the Erlang VM cannot do a consistent translation of all file names. If the VM selected an encoding automatically based on heuristics, there could be unexpected behavior on these systems, so by default Erlang starts in "latin1" file name mode on such file systems, which means that file names are bytewise encoded. This allows for list representation of all file names in the system, but, for example, a file named "Östersund.txt" can appear in file:list_dir/1 either as "Östersund.txt" (if the program that created the file encoded the name in bytewise ISO-Latin-1) or, more probably, as [195,150,115,116,101,114,115,117,110,100,46,116,120,116], which is a list containing UTF-8 bytes and not what you would want. If you, on the other hand, use Unicode file name translation on such a system, non-UTF-8 file names are simply ignored by functions like file:list_dir/1. They can be retrieved with file:list_dir_all/1, but wrongly encoded file names appear as "raw file names".
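A sketch (illustrative module name; file:list_dir_all/1 requires R16B or later) that separates correctly encoded names, which come back as lists, from raw file names, which come back as binaries:

```erlang
-module(raw_names).
-export([split_names/1]).

%% Returns {ok, {ProperNames, RawNames}}: lists are names that decoded
%% cleanly in the current file name mode, binaries are raw file names.
split_names(Dir) ->
    case file:list_dir_all(Dir) of
        {ok, Names} -> {ok, lists:partition(fun is_list/1, Names)};
        {error, _} = Error -> Error
    end.
```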
Unicode file naming support was introduced in R14B01. A VM operating in Unicode file name translation mode can work with files having names in any language or character set (as long as it is supported by the underlying operating system and file system). The Unicode character list is used to denote file or directory names, and if the file system content is listed, you also get Unicode lists as return values. The support lies in the Kernel and STDLIB modules, which is why most applications (that do not explicitly require the file names to be in the ISO-Latin-1 range) benefit from the Unicode support without change.
On operating systems with mandatory Unicode file names, this means that you more easily conform to the file names of other (non-Erlang) applications, and you can also process file names that, at least on Windows, were completely inaccessible (because their names could not be represented in ISO-Latin-1). Also, you avoid creating incomprehensible file names on MacOS X, as the VFS layer of the OS accepts all your file names as UTF-8 and does not rewrite them.
For most systems, turning on Unicode file name translation is no problem even if the system uses transparent file naming. Very few systems have mixed file name encodings. A consistently UTF-8-named system works perfectly in Unicode file name mode. It was still, however, considered experimental in R14B01, and is still not the default on such systems. Unicode file name translation is turned on with the +fnu switch to the erl program. On Linux, a VM started without explicitly stating the file name translation mode defaults to latin1 as the native file name encoding. On Windows and MacOS X, the default behavior is that of Unicode file name translation, which is why file:native_name_encoding/0 returns utf8 by default on those systems (the fact that Windows does not actually use UTF-8 on the file system level can safely be ignored by the Erlang programmer).
The default behavior can, as stated earlier, be changed using the +fnu or +fnl options to the VM; see the erl program.
If the VM is started in Unicode file name translation mode, file:native_name_encoding/0 returns the atom utf8.
The +fnu switch can be followed by w, i, or e to control how wrongly encoded file names are to be reported. w means that a warning is sent to the error_logger whenever a wrongly encoded file name is "skipped" in directory listings, i means that those wrongly encoded file names are silently ignored, and e means that the API function returns an error whenever a wrongly encoded file (or directory) name is encountered. w is the default. Note that file:read_link/1 always returns an error if the link points to an invalid file name.
In Unicode file name mode, file names given to the BIF open_port/2 with the option {spawn_executable,...} are also interpreted as Unicode. So is the parameter list given in the args option when using spawn_executable. The UTF-8 translation of arguments can be avoided using binaries; see the discussion about raw file names below.
It is worth noting that the file encoding options given when opening a file have nothing to do with the file name encoding convention. You can very well open files containing data encoded in UTF-8 but having file names in bytewise (latin1) encoding, or vice versa.
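A sketch of the content side (illustrative module name): the {encoding, utf8} option applies to the characters written and read through the io module, and has no bearing on how the file name itself is encoded, which still follows the VM's file name mode.

```erlang
-module(utf8_file).
-export([write/2, read/1]).

%% Write a Unicode string to a file whose content is UTF-8 encoded.
write(Path, String) ->
    {ok, F} = file:open(Path, [write, {encoding, utf8}]),
    try io:put_chars(F, String)
    after file:close(F)
    end.

%% Read the whole file back as a list of code points.
read(Path) ->
    {ok, F} = file:open(Path, [read, {encoding, utf8}]),
    try read_lines(F, [])
    after file:close(F)
    end.

%% io:get_line/2 on a file opened with an encoding returns code points;
%% an {error, _} result is not handled in this sketch.
read_lines(F, Acc) ->
    case io:get_line(F, '') of
        eof -> lists:append(lists:reverse(Acc));
        Line when is_list(Line) -> read_lines(F, [Line | Acc])
    end.
```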
Note
Erlang drivers and NIF shared objects still cannot be named with names containing code points beyond 127. This is a known limitation to be removed in a future release. Erlang modules, however, can be, but it is definitely not a good idea and is still considered experimental.
Raw file names were introduced together with Unicode file name support in erts-5.8.2 (R14B01). The reason "raw file names" were introduced in the system was to be able to consistently represent file names given in different encodings on the same system. Having the VM automatically translate a file name that is not in UTF-8 to a list of Unicode characters might seem practical, but this would open up for both duplicate file names and other inconsistent behavior. Consider a directory containing a file named "björn" in ISO-Latin-1, while the Erlang VM is operating in Unicode file name mode (and therefore expecting UTF-8 file naming). The ISO-Latin-1 name is not valid UTF-8, and one could be tempted to think that automatic conversion in, for example, file:list_dir/1 is a good idea. But what would happen if we later tried to open the file and had the name as a Unicode list (magically converted from the ISO-Latin-1 file name)? The VM converts the given file name to UTF-8, as this is the encoding expected. Effectively, this means trying to open the file named <<"björn"/utf8>>. This file does not exist, and even if it existed, it would not be the same file as the one that was listed. We could even create two files named "björn", one named in the UTF-8 encoding and one not. If file:list_dir/1 automatically converted the ISO-Latin-1 file name to a list, we would get two identical file names as the result. To avoid this, we must differentiate between file names that are properly encoded according to the Unicode file naming convention (that is, UTF-8) and file names that are invalid under that encoding. With the common file:list_dir/1 function, wrongly encoded file names are simply ignored in Unicode file name translation mode, but with file:list_dir_all/1, file names with invalid encoding are returned as "raw" file names, that is, as binaries.
The Erlang file module accepts raw file names as input. open_port({spawn_executable,...} ...) also accepts them. As mentioned earlier, the arguments given in the option list to open_port({spawn_executable,...} ...) undergo the same conversion as the file names, meaning that the executable is also provided with arguments in UTF-8. This translation is avoided, consistently with how file names are treated, by giving the argument as a binary.
Forcing Unicode file name translation mode on systems where it is not the default was considered experimental in R14B01, because the initial implementation did not ignore wrongly encoded file names, so raw file names could spread unexpectedly throughout the system. Since R16B, wrongly encoded file names are only retrieved by special functions (such as file:list_dir_all/1), so the impact on existing code is much lower, which is why this mode is now supported.
Unicode file name translation is expected to become the default in future releases.
Even if you are operating without Unicode file name translation done automatically by the VM, you can access and create files with names in UTF-8 encoding by using raw file names encoded as UTF-8. Enforcing the UTF-8 encoding regardless of the mode the Erlang VM is started in can, in some circumstances, be a good idea, as the convention of using UTF-8 file names is spreading.
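A sketch of that approach (illustrative module name): converting the name to a UTF-8 binary makes it a raw file name, so the VM passes the bytes through unchanged in either file name mode. The directory argument is assumed to be plain ASCII, so converting it to UTF-8 does not change its bytes.

```erlang
-module(utf8_raw_name).
-export([mode/0, create/2]).

%% latin1 or utf8, depending on +fnl/+fnu and the platform default.
mode() -> file:native_name_encoding().

%% Create Dir/NameChars with the name stored as UTF-8 bytes, whatever
%% mode() returns, and return the raw (binary) path.
create(Dir, NameChars) ->
    Raw = unicode:characters_to_binary(NameChars),
    Path = filename:join(unicode:characters_to_binary(Dir), Raw),
    ok = file:write_file(Path, <<"hello">>),
    Path.
```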
The MacOS X VFS layer enforces UTF-8 file names in a quite aggressive way. Older versions did this by simply refusing to create file names that do not conform to UTF-8, while newer versions replace offending bytes with the sequence "%HH", where HH is the original character in hexadecimal notation. As Unicode translation is enabled by default on MacOS X, the only way to come up against this is to either start the VM with the +fnl flag or use a raw file name in bytewise (latin1) encoding. If you use a raw file name in bytewise encoding containing characters between 127 and 255 to create a file, the file cannot be opened using the same name as the one used to create it. There is no remedy for this behavior, other than keeping the file names in the right encoding.
MacOS X also reorganizes the names of files so that accents and similar marks are represented with "combining characters"; that is, the character ö is represented as the code points [111,776], where 111 is the character o and 776 is the special accent character "Combining Diaeresis". This way of normalizing Unicode is otherwise very seldom used, and Erlang normalizes those file names in the opposite way upon retrieval, so that file names using combining accents are not passed up to the Erlang application. In Erlang, the file name "björn" is retrieved as [98,106,246,114,110], not as [98,106,111,776,114,110], even though the file system might think differently. The normalization into combining accents is redone when actually accessing files, so this can usually be ignored by the Erlang programmer.
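On newer releases, the two forms can also be converted explicitly. The sketch below assumes OTP 20 or later, where unicode:characters_to_nfc_list/1 and unicode:characters_to_nfd_list/1 were added; they are not available in R16B, the release this guide describes. The module name is illustrative.

```erlang
-module(norm_demo).
-export([nfc/1, nfd/1]).

%% Requires OTP 20 or later.
%% NFC composes "o" followed by U+0308 into the single code point 246.
nfc(Chars) -> unicode:characters_to_nfc_list(Chars).

%% NFD decomposes 246 into [111, 776], the combining form described above.
nfd(Chars) -> unicode:characters_to_nfd_list(Chars).
```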
Environment variables and their interpretation are handled in much the same way as file names. If Unicode file names are enabled, environment variables as well as parameters to the Erlang VM are expected to be in Unicode.
If Unicode file names are enabled, the calls to os:getenv/0, os:getenv/1, and os:putenv/2 handle Unicode strings. On Unix-like platforms, the built-in functions translate environment variables in UTF-8 to and from Unicode strings, possibly with code points > 255. On Windows, the Unicode versions of the environment system API are used, also allowing for code points > 255.
On Unix-like operating systems, parameters are expected to be UTF-8 without translation if Unicode file names are enabled.
Most of the modules in Erlang/OTP are of course Unicode-unaware in the sense that they have no notion of Unicode and really should not have. Typically they handle non-textual or byte-oriented data (such as gen_tcp).
Modules that actually handle textual data (such as io_lib and string) are sometimes subject to conversion or extension to be able to handle Unicode characters.
幞ããªããšã«ãã»ãšãã©ã®ããã¹ãããŒã¿ã¯ãªã¹ãã«æ ŒçŽãããŠãããç¯å²ã®ãã§ãã¯ã ããããªã®ã§ãstring ã®ãããªã¢ãžã¥ãŒã«ã¯ã¡ãã£ãšããå€æãæ¡åŒµãå¿ èŠãšããã ãã§ ååã«æ©èœããŸãã
Some modules are, however, changed to be explicitly Unicode-aware. These modules include:
unicode
The unicode module is clearly Unicode-aware. It contains functions for conversion between different Unicode formats as well as some utilities for identifying byte order marks (BOMs). Few programs handling Unicode data will survive without this module.
io
The io module has been extended along with the actual I/O protocol to handle Unicode data. This means that several functions require binaries to be in UTF-8, and there are modifiers to formatting control sequences to allow for output of Unicode strings.
file, group, user
I/O servers throughout the system are able to handle Unicode data and have options for converting data upon actual output or input to or from the device. As shown earlier, the shell has support for Unicode terminals, and the file module allows for translation to and from various Unicode formats on disk.
The actual reading and writing of files with Unicode data is, however, not best done with the file module, as its interface is byte-oriented. A file opened with a Unicode encoding (like UTF-8) is best read or written using the io module.
re
The re module allows matching Unicode strings as a special option. As the library is actually centered on matching in binaries, the Unicode support is UTF-8-centered.
wx
The wx graphical library has extensive support for Unicode text.
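A sketch of the re module with the unicode option (the module name re_demo is illustrative). Without further options such as ucp, character classes like \s and \w only cover ASCII, so this example splits on ASCII white space and matches Cyrillic letters through explicit \x{...} escapes.

```erlang
-module(re_demo).
-export([words/1, find/2]).

%% Split a Unicode subject (list of code points or UTF-8 binary) on
%% ASCII white space; parts come back as lists of code points.
words(Subject) ->
    re:split(Subject, "\\s+", [unicode, {return, list}, trim]).

%% Return the first match of Pattern as a list of code points, or nomatch.
find(Subject, Pattern) ->
    case re:run(Subject, Pattern, [unicode, {capture, first, list}]) of
        {match, [Match]} -> Match;
        nomatch -> nomatch
    end.
```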
The string module works perfectly for Unicode strings as well as for ISO-Latin-1 strings, with the exception of the language-dependent functions to_upper and to_lower, which are only correct for the ISO-Latin-1 character set. Actually, they can never function correctly for Unicode characters in their current form, as there are language and locale issues as well as multi-character mappings to consider when converting text between cases. Converting case in an international environment is a big subject not yet addressed in OTP.
The LANG and LC_CTYPE environment variables
The +pc { unicode | latin1 } flag to erl(1)
The +fn {l | a | u } [{ w | i | e }] flag to erl(1)
epp:default_encoding/0
io:setopts/{1,2} and the -oldshell/-noshell flags.
Byte Order Marks
Formatted I/O
Heuristic Identification of UTF-8
Lists of UTF-8 Bytes
Double UTF-8 Encoding