H.248 and UTF-8 strings.

Thu Jun 23 20:30:22 EDT 2005

Hello Sasha,

You're quite welcome to bring a contribution on this issue to the July
Meeting (addressed to Q.3, same procedure as any question), although I hope
there will be some agreement on the solution on the Megaco list. Those who
can remember that far back know that I wasn't really a proponent of the
text encoding, so I can't remember why these "extra chars" were excluded in
the first place. I've added Tom to see if he can remember. In terms of a
solution, I would support adding this to VALUE only, as indicated below.
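
For anyone who wants to experiment, the difference between the current VALUE
rule and the VALUE-only extension can be sketched roughly as follows. This is
only an illustration under assumptions: the helper names (value_pattern,
is_value, CURRENT_VALUE, PROPOSED_VALUE) are made up for this sketch, the
"\'" entry in SafeChar is read as a plain apostrophe, and the grammar is
matched against the UTF-8 octets of the string rather than its characters.

```python
import re

# SafeChar from H.248 Annex B.2 as a bytes character class.
# Assumption: the "\'" alternative is treated as a single apostrophe.
SAFE = rb"0-9A-Za-z+\-&!_/'?@^`~*$\\()%|."
REST = rb";\[\]{}:,#<>="      # RestChar, only legal inside quotedString
WSP = rb" \t"                 # SP / HTAB, only legal inside quotedString
EXTRA = rb"\x80-\xf7"         # proposed addition: %x80-F7

def value_pattern(allow_utf8: bool) -> "re.Pattern[bytes]":
    """Build a matcher for VALUE = quotedString / 1*(SafeChar [/ %x80-F7])."""
    extra = EXTRA if allow_utf8 else b""
    unquoted = b"[" + SAFE + extra + b"]+"
    quoted = b'"[' + SAFE + extra + REST + WSP + b']*"'
    return re.compile(b"(?:" + quoted + b"|" + unquoted + b")")

CURRENT_VALUE = value_pattern(False)   # Annex B.2 as quoted in this thread
PROPOSED_VALUE = value_pattern(True)   # VALUE-only extension from this thread

def is_value(pattern, text: str) -> bool:
    """Check the UTF-8 octets of text against the given VALUE matcher."""
    return pattern.fullmatch(text.encode("utf-8")) is not None
```

With this sketch, is_value(CURRENT_VALUE, "café") is rejected because of the
two octets above 0x7F, while is_value(PROPOSED_VALUE, "café") is accepted.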

Regards, Christian

Sasha Ruditsky wrote:

> Hi Christian
> 
> As you correctly stated, all the "extra chars" are in the region between
> 0x80 and 0xff (0xf7 to be precise).
> I am not aware of any special meaning of the characters in this
> region, so I cannot understand why these "extra chars" need
> to be escaped or quoted in any way.
> 
> The naive question is what is wrong with extending SafeChar to contain
> this region?
> I.e.
>    SafeChar             = DIGIT / ALPHA / "+" / "-" / "&" /
>                           "!" / "_" / "/" / "\'" / "?" / "@" /
>                           "^" / "`" / "~" / "*" / "$" / "\" /
>                           "(" / ")" / "%" / "|" / "." / %x80-F7 
> 
> Or, if for some reason the UTF-8 strings may be used only in VALUE,
> then:
> 
>   VALUE                = quotedString / 1*(SafeChar / %x80-F7)
>   quotedString         = DQUOTE *(SafeChar / %x80-F7 / RestChar / WSP)
>                          DQUOTE
> 
> 
> 
> And...
> 
> Assuming that a way to fix this is found, can it be fixed in version 3?
> If yes, then what is the procedure?
> Do you want me to bring a relevant contribution to the July meeting?
> 
> Thanks,
> Sasha
> 
> 
> 
> 
> 
> -----Original Message-----
> From: Christian Groves [mailto:christian.groves at ericsson.com] 
> Sent: Wednesday, June 22, 2005 8:16 PM
> To: Sasha Ruditsky
> Cc: itu-sg16 at external.cisco.com; Angelo.Contardi at italtel.it
> Subject: Re: H.248 and UTF-8 strings.
> 
> Hello Sasha,
> 
> This problem was raised recently on the Megaco list. I've had some
> off-line discussion with Angelo (the person who raised the problem) and
> currently there are 2 proposed solutions (I hope he will send this to the
> Megaco list):
> 
> 1) The simple one: code the UTF-8 string in "Octet Mode". This is a BAD
> solution from the transmission efficiency point of view because it
> "halves the TX band": to TX one UTF-8 char (max 4 octets) I must TX
> max 2 x 4 = 8 ASCII chars.
> 
> 2) The complicated one: allow the ABNF quoted form of VALUE to TX ALL
> octets 0x01-0xFF, except 0x22, which should be ESCAPED with "\", as
> already done for the ABNF Local and Remote Descriptors (see SDP). Note that
> '\0' (0x00) is NOT ALLOWED in this new quoted string form, as in the
> present one, but this is not a problem because in UTF-8 the char '\0'
> (0x00) is the same as in ASCII (string terminator) and is NOT used to
> code "non ASCII" UTF-8 chars, i.e. all those chars > 0x7F that require
> more than one octet to be encoded (from 2 to 4 octets). In fact
> the "extra chars" needed to code a UTF-8 char are all above 0x7F (have
> the MSBit = 1).
> 
> Regards, Christian
> 
> Sasha Ruditsky wrote:
> 
> 
>>Hi
>>
>>I'm trying to understand how H.248 supports UTF-8 string properties.
>>According to H.248, a string property is encoded as a UTF-8 string.
>>
>>UTF-8 encoding is defined by the following table:
>>
>>Scalar Value                  1st Byte  2nd Byte  3rd Byte  4th Byte
>>00000000 0xxxxxxx             0xxxxxxx
>>00000yyy yyxxxxxx             110yyyyy  10xxxxxx
>>zzzzyyyy yyxxxxxx             1110zzzz  10yyyyyy  10xxxxxx
>>000uuuuu zzzzyyyy yyxxxxxx    11110uuu  10uuzzzz  10yyyyyy  10xxxxxx
>>
>>
>>I.e. all the character codes between x80 and xf7 need to be supported.
>>
>>According to H.248 Annex B.2:
>>
>>The ABNF in this section uses the VALUE construct (or lists of VALUE
>>constructs) to encode various package element values (properties, 
>>signal parameters, etc.).
>>
>>The VALUE is defined as follows:
>>
>>  VALUE                = quotedString / 1*(SafeChar)
>>  SafeChar             = DIGIT / ALPHA / "+" / "-" / "&" /
>>                          "!" / "_" / "/" / "\'" / "?" / "@" /
>>                          "^" / "`" / "~" / "*" / "$" / "\" /
>>                          "(" / ")" / "%" / "|" / "."
>>  ALPHA                = %x41-5A / %x61-7A ; A-Z / a-z
>>  DIGIT                = %x30-39         ; 0-9
>>  quotedString         = DQUOTE *(SafeChar / RestChar/ WSP) DQUOTE
>>  RestChar             = ";" / "[" / "]" / "{" / "}" / ":" / "," / "#" /
>>                          "<" / ">" / "="
>>  WSP                  = SP / HTAB ; white space
>>  SP                   = %x20        ; space
>>  HTAB                 = %x09        ; horizontal tab
>>  DQUOTE               = %x22            ; " (Double Quote)
>>
>>
>>So I believe this excludes the x80-xff characters.
>>
>>So the question is how to use the text encoding defined in Annex B to
>>encode UTF-8 strings?
>>
>>Thanks,
>>Sasha
>>
> 
> 
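
The two numeric claims in this thread (that every octet of a non-ASCII UTF-8
character lies in 0x80-0xF7, and that octet mode doubles the number of
transmitted characters) can be checked with a small sketch like the one
below. It assumes that "octet mode" means two hex digits per octet, as the
2 x 4 = 8 figure suggests; the function name utf8_stats is made up for this
illustration.

```python
def utf8_stats(text: str) -> dict:
    """Summarise how a string looks once encoded as UTF-8 octets."""
    raw = text.encode("utf-8")
    high = [b for b in raw if b >= 0x80]   # octets with the MSBit set
    return {
        "octets": len(raw),                # size when sent as raw UTF-8
        "hex_chars": len(raw.hex()),       # size when sent as hex digits
        "all_high_in_80_f7": all(0x80 <= b <= 0xF7 for b in high),
        "has_nul": 0 in raw,               # 0x00 only appears for U+0000
    }
```

For example, utf8_stats("\u20ac") (the euro sign) reports 3 octets and 6 hex
characters, and a 4-octet character such as U+1F600 reports 8 hex characters.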