H.248 and UTF-8 strings.

Wed Jun 22 22:16:22 EDT 2005

Hello Sasha,

This problem was raised recently on the Megaco list. I've had some
off-line discussion with Angelo (the person who raised the problem), and
currently there are 2 proposed solutions (I hope he will send this to
the Megaco list):

1) The simple one: code the UTF-8 string in "Octet Mode". This is a BAD
solution from the transmission-efficiency point of view because it
"halves the TX band": to TX one UTF-8 char (max 4 octets) I must TX
max 2 x 4 = 8 ASCII chars.
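
To make the cost of option 1 concrete, here is a minimal Python sketch.
It assumes "Octet Mode" means two ASCII hex digits per octet, which is
what the 2 x 4 = 8 arithmetic implies; the function name is made up for
illustration and is not taken from H.248.

```python
def octet_mode_encode(text: str) -> str:
    """Hex-encode the UTF-8 octets of text, two ASCII hex digits per octet.

    Assumption: this is what "Octet Mode" means in the text encoding.
    """
    return text.encode("utf-8").hex().upper()


sample = "\u00e9\u20ac"                 # e-acute (2 octets) + euro sign (3 octets)
utf8_octets = sample.encode("utf-8")    # 5 octets in total
on_the_wire = octet_mode_encode(sample)

# The hex form always costs exactly twice the UTF-8 length.
print(len(utf8_octets), len(on_the_wire), on_the_wire)
```

The doubling applies to plain ASCII text as well, not only to the
multi-octet chars.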

2) The complicated one: allow the ABNF quoted form of VALUE to TX ALL
octets 0x01-0xFF, except 0x22, which must be ESCAPED with "\", as
already done for the ABNF Local and Remote Descriptors (see SDP). Note
that '\0' (0x00) is NOT ALLOWED in this new quoted string form, as in
the present one, but this is not a problem: in UTF-8 the char '\0'
(0x00) is the same as in ASCII (string terminator) and is NOT used to
code "non-ASCII" UTF-8 chars, i.e. all those chars > 0x7F that require
more than one octet to be encoded (from 2 to 4 octets). In fact, the
octets needed to code a non-ASCII UTF-8 char are all above 0x7F (they
have the MSBit = 1).
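
A rough Python sketch of option 2, for illustration only. Both function
names are hypothetical. Only the 0x22 escape comes from the proposal;
how a literal "\" in the value should be treated is not settled above,
so the sketch passes it through unchanged.

```python
def quote_value(text: str) -> bytes:
    """Wrap the UTF-8 octets of text in DQUOTEs, escaping 0x22 with a backslash."""
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        # 0x00 stays forbidden, as in the present quoted form.
        raise ValueError("0x00 is not allowed in the quoted form")
    # Assumption: a literal backslash is left as-is (not specified in the proposal).
    return b'"' + raw.replace(b'"', b'\\"') + b'"'


def multibyte_octets_all_high(text: str) -> bool:
    """Return True if every octet of every multi-octet char has MSBit = 1."""
    for ch in text:
        octets = ch.encode("utf-8")
        if len(octets) > 1 and any(b < 0x80 for b in octets):
            return False
    return True


print(quote_value('say "h\u00e9"'))
print(multibyte_octets_all_high("\u00e9\u20ac\U0001F600"))
```

The second function checks the property the argument relies on: no
octet of a multi-octet UTF-8 char can be mistaken for 0x00, for DQUOTE,
or for any other ASCII char.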

Regards, Christian

Sasha Ruditsky wrote:

> Hi 
> 
> I'm trying to understand how H.248 supports UTF-8 string properties.
> According to H.248 the string property is encoded as UTF-8 string.
> 
> UTF-8 encoding is defined by the following table:
> 
> Scalar Value                  1st Byte  2nd Byte  3rd Byte  4th Byte
> 00000000 0xxxxxxx             0xxxxxxx
> 00000yyy yyxxxxxx             110yyyyy  10xxxxxx
> zzzzyyyy yyxxxxxx             1110zzzz  10yyyyyy  10xxxxxx
> 000uuuuu zzzzyyyy yyxxxxxx    11110uuu  10uuzzzz  10yyyyyy  10xxxxxx
> 
> 
> I.e. all the character codes between x80 and xf7 need to be supported.
> 
> According to H.248 Annex B.2:
> 
> The ABNF in this section uses the VALUE construct (or lists of VALUE
> constructs) to encode various package element values (properties, signal
> parameters, etc.).
> 
> The VALUE is defined as follows:
> 
>   VALUE                = quotedString / 1*(SafeChar)
>   SafeChar             = DIGIT / ALPHA / "+" / "-" / "&" /
>                           "!" / "_" / "/" / "\'" / "?" / "@" /
>                           "^" / "`" / "~" / "*" / "$" / "\" /
>                           "(" / ")" / "%" / "|" / "."
>   ALPHA                = %x41-5A / %x61-7A ; A-Z / a-z
>   DIGIT                = %x30-39         ; 0-9
>   quotedString         = DQUOTE *(SafeChar / RestChar/ WSP) DQUOTE
>   RestChar             = ";" / "[" / "]" / "{" / "}" / ":" / "," / "#" /
>                           "<" / ">" / "="
>   WSP                  = SP / HTAB ; white space
>   SP                   = %x20        ; space
>   HTAB                 = %x09        ; horizontal tab
>   DQUOTE               = %x22            ; " (Double Quote)
> 
> 
> So I believe this excludes the x80-xff characters.
> 
> So the question is how to use the text encoding defined in Annex B to
> encode UTF-8 strings?
> 
> Thanks,
> Sasha
> 