Description |
At present the UNOA standard is used for the bulk of the EDI messages (only the invoice message uses a newer version, UNOC with syntax level 4). In "free-text fields" such as alternative product names, order numbers and company names, this often causes problems because they contain characters that are not allowed. This leads to rejection of the message and, in the past, even to application crashes.
The change request: convert all EDI messages that still use the UNOA set to the UNOB character set.
Addition: UNOB may be too limited for a new requirement that German texts must be supported. |
Solution |
19-3-2012:
Officially, according to EDIFACT: agree per trading partner which character set is supported if a level higher than UNOA/UNOB is wanted. By default EDIFACT is not suited to character sets above UNOB. For the German language at least UNOC is needed (as in the Florecom INVOIC message) with syntax level 4. More information below:
In syntax version 4 character sets level A, B, C, D, E, F, G, H, I, J, K, X and Y are supported. Also supported are the code extension techniques covered by ISO 2022 (with certain restrictions on its use within an interchange), and the partial use of the techniques covered by ISO/IEC 10646-1.
Within EANCOM® and Florecom the use of character set level A is recommended. Any user wishing to use a character set level other than A should first obtain agreement from the intended trading partner in order to ensure correct processing by the receiving application.
Character sets level C to K
Character sets level C to K (ISO 8859 - 1,2,5,7,3,4,6,8,9 8-bit single byte coded graphic character sets) cover the Latin 1 - 5, Cyrillic, Arabic, Greek and Hebrew alphabets. It is important to note that EANCOM® and Florecom users often need, in addition to the recommended character set level A, the following sub-set of supplementary characters taken from ISO 8859 - 1:
Number sign / Commercial at / Left square bracket / Reverse solidus / Right square bracket / Circumflex accent
Grave accent / Left curly bracket / Vertical line / Right curly bracket
# @ [ \ ] ^ ` { | }
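The difference between level A, level B and the supplementary characters can be checked mechanically. The sketch below is only an illustration and not part of the EANCOM or Florecom specifications: the name `disallowed_chars` is hypothetical, and the level A and B repertoires are an assumed transcription of ISO 9735 that should be verified against the standard before use. The control characters that level B uses as separators are left out.

```python
import string

# Punctuation shared by level A and level B (assumed transcription of
# ISO 9735; verify against the standard before relying on it).
_COMMON = set(" .,-()/='+:?!\"%&*;<>")

# Level A (UNOA): upper-case letters, digits and the common punctuation.
UNOA = set(string.ascii_uppercase) | set(string.digits) | _COMMON
# Level B (UNOB): level A plus lower-case letters.
UNOB = UNOA | set(string.ascii_lowercase)
# Supplementary ISO 8859-1 characters listed in the text above.
SUPPLEMENTARY = set("#@[\\]^`{|}")


def disallowed_chars(text, repertoire):
    """Return the characters of `text` that are not in `repertoire`,
    in order of first appearance and without duplicates."""
    seen = []
    for ch in text:
        if ch not in repertoire and ch not in seen:
            seen.append(ch)
    return seen


if __name__ == "__main__":
    # Hypothetical free-text product name.
    name = "Rosa 'Red Naomi!' #12 (Müller)"
    print("not in UNOA:", disallowed_chars(name, UNOA))
    print("not in UNOB:", disallowed_chars(name, UNOB))
    print("not in UNOB + supplementary:",
          disallowed_chars(name, UNOB | SUPPLEMENTARY))
```

A check like this only tells whether a character belongs to the repertoire. The service characters `'`, `+`, `:` and `?` are part of level A but still have to be escaped with the release character when they occur inside data.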
Character set level X
Character repertoire resulting from the application of the code extension technique as defined by ISO 2022, utilising the escape sequence technique in accordance with ISO 2375. For more details see ISO/IEC International Register of Coded Character Sets to be used with Escape Sequences.
Character set level Y
Character repertoire taken from ISO 10646-1 octet without the application of a code extension technique. See the appropriate details in ISO 10646-1.
Syntax identifier   ISO standard   Languages
UNOA                646
UNOB                646
UNOC                8859-1         Danish, Dutch, English, Faeroese, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, Swedish
UNOD                8859-2         Albanian, Czech, English, Hungarian, Polish, Romanian, Serbo-Croatian, Slovak, Slovene
UNOE                8859-5         Bulgarian, Byelorussian, English, Macedonian, Russian, Serbo-Croatian, Ukrainian
UNOF                8859-7         Greek
UNOG                8859-3         Maltese
UNOH                8859-4         Estonian, Latvian, Lithuanian, Greenlandic, Lappish
UNOI                8859-6         Arabic
UNOJ                8859-8         Hebrew, Yiddish
UNOK                8859-9         Turkish
UNOX                2022 / 2375    Character sets level C to K supported languages, Asian (e.g. Japanese, traditional Chinese language, ...) and other languages that are based on character sets compliant with ISO 2022 and ISO 2375.
UNOY                10646-1        Aimed to cover all written languages of the world.
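The table above can be turned into a quick check of which level a given text would need. The following is a minimal sketch, assuming that Python's built-in ISO 8859 codecs are close enough to the editions referenced by the standard; the function name `levels_that_can_encode` and the sample texts are made up for illustration.

```python
# Syntax identifier -> Python codec name. The ISO 8859 part numbers are
# taken from the table above; the codec names are Python's standard ones.
ISO_8859_LEVELS = {
    "UNOC": "iso8859_1",
    "UNOD": "iso8859_2",
    "UNOE": "iso8859_5",
    "UNOF": "iso8859_7",
    "UNOG": "iso8859_3",
    "UNOH": "iso8859_4",
    "UNOI": "iso8859_6",
    "UNOJ": "iso8859_8",
    "UNOK": "iso8859_9",
}


def levels_that_can_encode(text):
    """Return the syntax identifiers (levels C to K) whose character set
    can represent every character of `text`."""
    usable = []
    for identifier, codec in ISO_8859_LEVELS.items():
        try:
            text.encode(codec)
        except UnicodeEncodeError:
            continue  # at least one character is missing from this set
        usable.append(identifier)
    return usable


if __name__ == "__main__":
    # Hypothetical company names.
    print(levels_that_can_encode("Blumengroßhandel Müller"))
    print(levels_that_can_encode("Łódź Kwiaty"))
```

For German text the result includes UNOC, which matches the language list in the table. A text that no level from C to K can encode would need UNOX or UNOY, or a different agreement with the trading partner.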
---------------------------
German characters such as the umlaut are diacritical marks and are not part of ISO 646 (character sets UNOA and UNOB), which EDIFACT (ISO 9735) prescribes by default. The use of other character sets must be 'specifically agreed between the interchanging partners'. These other character sets also imply a different syntax level (3 or 4). For German characters, level 3 with the UNOC character set suffices. For the INVOIC message we use level 4 with UNOC.
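The character set and the syntax level are declared together in the first composite of the UNB interchange header, for example `UNOC:3`. A minimal sketch of reading that declaration follows. It assumes the default separators (`+` and `:`) and no release characters in the header, and the function name and the sender, recipient and reference values in the sample are hypothetical.

```python
def parse_syntax_declaration(unb_segment):
    """Return (syntax_identifier, syntax_version) from a UNB segment.

    Assumes default separators and a header without release characters.
    """
    elements = unb_segment.strip().rstrip("'").split("+")
    if elements[0] != "UNB" or len(elements) < 2:
        raise ValueError("not a UNB segment")
    components = elements[1].split(":")
    if len(components) < 2:
        raise ValueError("syntax identifier or version is missing")
    return components[0], int(components[1])


if __name__ == "__main__":
    # Made-up interchange header declaring UNOC with syntax version 3.
    header = "UNB+UNOC:3+SENDER+RECEIVER+120319:1200+REF0001'"
    identifier, version = parse_syntax_declaration(header)
    print(identifier, version)
```

A receiving application could use the returned pair to decide which decoder to apply before it validates the free-text fields.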
http://www.unece.org/trade/untdid/texts/d422_d.htm#p5
5. CHARACTER SETS
For the characters in the sets below, the 7-bit codes in the basic ISO 646 standard shall be used, unless the corresponding 8-bit codes in ISO 6937 and 8859 or other bit codes are specifically agreed between the interchanging partners. See clause 4.
UNOC ISO 8859-1: 1987, Information processing - 8-bit single byte coded graphic character sets - Part 1: Latin alphabet No. 1. This standard supports the following languages: Danish, Dutch, English, Faroese, Finnish, French, German, Icelandic, Irish, Italian, Norwegian, Portuguese, Spanish, Swedish
Version 3
Version 3 is represented by Version 2 plus Amendment 1, published in 1992. Amendment 1 extended the supported character sets from character set A (ISO 646 with the exception of lower case letters and certain graphic characters) and B (ISO 646 with the exception of certain graphic characters) to the character sets C through F (covering Latin, Cyrillic and Greek alphabets).
Version 4
Version 4 represents a significant revision to the syntax rules and supersedes the earlier publications. It is not fully upward compatible with Version 3 (e.g. a single set of default service characters is defined in Version 4, whereas the level A and B character sets in earlier versions each specified separate service characters). |