H.248 and UTF-8 strings.
Christian Groves
christian.groves at ERICSSON.COM
Thu Jun 23 20:30:22 EDT 2005
Hello Sasha,
You're quite welcome to bring in a contribution to the July Meeting on this
issue (addressed to Q.3 - same procedure as any question), although I hope that
there will be some agreement on the solution on the Megaco list. Those that can
remember that far back know that I wasn't really a proponent of the text
encoding so I can't remember why these "extra chars" were excluded in the first
place. I've added Tom to see if he can remember. In terms of solution I would
support adding this to VALUE only as indicated below.
Regards, Christian
Sasha Ruditsky wrote:
> Hi Christian
>
> As you correctly stated all the "extra chars" are the region between
> 0x80 and 0xff (0xf7 to be precise).
> I am not aware of any special meaning of the characters from this
> region, so as a result I cannot understand why these "extra chars" need
> to be escaped or quoted in any way.
>
> The naive question is what is wrong with extending SafeChar to contain
> this region?
> I.e.
> SafeChar = DIGIT / ALPHA / "+" / "-" / "&" /
> "!" / "_" / "/" / "\'" / "?" / "@" /
> "^" / "`" / "~" / "*" / "$" / "\" /
> "(" / ")" / "%" / "|" / "." / %x80-F7
>
> Or, if for some reason the UTF-8 strings may be used only in VALUE,
> then:
>
> VALUE = quotedString / 1*(SafeChar / %x80-F7)
> quotedString = DQUOTE *(SafeChar / %x80-F7 / RestChar / WSP) DQUOTE
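
The VALUE-only variant above can be checked mechanically. The following is
a minimal Python sketch, not a conformant H.248 parser: it transcribes the
quoted ABNF into byte-level regular expressions. The names SAFE, REST, EXTRA,
PROPOSED_VALUE, CURRENT_VALUE and is_value are hypothetical, and it assumes
that "\'" in SafeChar stands for a plain apostrophe.

```python
import re

# Byte classes transcribed from the ABNF quoted in this thread.
# Assumption: "\'" in SafeChar is read as a plain apostrophe.
SAFE = rb"0-9A-Za-z+\-&!_/'?@^`~*$\\()%|."
REST = rb";\[\]{}:,#<>="
EXTRA = rb"\x80-\xf7"  # the proposed addition, %x80-F7

# Proposed: VALUE = quotedString / 1*(SafeChar / %x80-F7)
#           quotedString = DQUOTE *(SafeChar / %x80-F7 / RestChar / WSP) DQUOTE
PROPOSED_VALUE = re.compile(
    rb'"[' + SAFE + EXTRA + REST + rb' \t]*"|[' + SAFE + EXTRA + rb']+'
)

# Current Annex B.2 text: the same rules without %x80-F7.
CURRENT_VALUE = re.compile(
    rb'"[' + SAFE + REST + rb' \t]*"|[' + SAFE + rb']+'
)


def is_value(data: bytes, pattern=PROPOSED_VALUE) -> bool:
    """Return True if the whole byte string matches the VALUE rule."""
    return pattern.fullmatch(data) is not None
```

With this sketch, is_value("café".encode("utf-8")) is accepted under the
proposed rule, while the same bytes checked against CURRENT_VALUE are
rejected because 0xC3 and 0xA9 fall outside SafeChar.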
>
>
>
> And...
>
> Assuming that the way to fix this is found, can it be fixed in ver 3?
> If yes, then what is the procedure?
> Do you want me to bring a relevant contribution to the July meeting?
>
> Thanks,
> Sasha
>
>
>
>
>
> -----Original Message-----
> From: Christian Groves [mailto:christian.groves at ericsson.com]
> Sent: Wednesday, June 22, 2005 8:16 PM
> To: Sasha Ruditsky
> Cc: itu-sg16 at external.cisco.com; Angelo.Contardi at italtel.it
> Subject: Re: H.248 and UTF-8 strings.
>
> Hello Sasha,
>
> This problem was raised recently on the Megaco list. I've had some
> off-line discussion with Angelo (the person who raised the problem) and
> currently there are 2 proposed solutions (I hope he will send this to the
> Megaco list):
>
> 1) The simple one: code the UTF-8 string in "Octet Mode". This is a BAD
> solution from the transmission efficiency point of view because it
> "halves the TX band": to TX one UTF-8 char (max 4 octets) I must TX
> max 2 x 4 = 8 ASCII chars.
>
> 2) The complicated one: allow the ABNF quoted form of VALUE to TX ALL
> octets 0x01-0xFF, except 0x22, which should be ESCAPED with "\", as
> already done for the ABNF Local and Remote Descriptors (see SDP). Note that
> '\0' (0x00) is NOT ALLOWED in this new quoted string form, as in the
> present one, but this is not a problem because in UTF-8 the char '\0'
> (0x00) is the same as in ASCII (string terminator) and is NOT used to
> code "non-ASCII" UTF-8 chars, i.e. all those chars > 0x7F that require more
> than one octet to be encoded (from 2 to 4 octets). In fact
> the "extra" octets needed to code a UTF-8 char are all above 0x7F (have
> the MSBit = 1).
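
To make the trade-off between the two options concrete, here is a rough
Python sketch. It assumes "Octet Mode" means two hex digits per octet, which
is what the 2 x 4 = 8 arithmetic suggests. The function names octet_mode,
quoted_mode and unquote are hypothetical, and escaping the backslash itself
is an added assumption to keep the sketch reversible; the proposal as quoted
only mentions escaping 0x22.

```python
def octet_mode(text: str) -> str:
    """Option 1 (assumed form): each UTF-8 octet becomes two hex digits."""
    return text.encode("utf-8").hex().upper()


def quoted_mode(text: str) -> bytes:
    """Option 2: raw UTF-8 octets inside double quotes, with '"' escaped."""
    raw = text.encode("utf-8")
    if b"\x00" in raw:
        raise ValueError("0x00 is not allowed in the quoted form")
    # Assumption: the backslash is escaped too, so decoding is unambiguous.
    escaped = raw.replace(b"\\", b"\\\\").replace(b'"', b'\\"')
    return b'"' + escaped + b'"'


def unquote(value: bytes) -> str:
    """Reverse quoted_mode: drop the outer quotes and the escape octets."""
    body = value[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x5C:  # backslash: take the next octet literally
            i += 1
        out.append(body[i])
        i += 1
    return out.decode("utf-8")
```

In this sketch the octet-mode form is always exactly twice the UTF-8 length,
whereas the quoted form costs two quote octets plus one extra octet per
escaped character.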
>
> Regards, Christian
>
> Sasha Ruditsky wrote:
>
>
>>Hi
>>
>>I'm trying to understand how H.248 supports UTF-8 string properties.
>>According to H.248 the string property is encoded as a UTF-8 string.
>>
>>UTF-8 encoding is defined by the following table:
>>
>>Scalar Value                  1st Byte  2nd Byte  3rd Byte  4th Byte
>>00000000 0xxxxxxx             0xxxxxxx
>>00000yyy yyxxxxxx             110yyyyy  10xxxxxx
>>zzzzyyyy yyxxxxxx             1110zzzz  10yyyyyy  10xxxxxx
>>000uuuuu zzzzyyyy yyxxxxxx    11110uuu  10uuzzzz  10yyyyyy  10xxxxxx
>>
>>
>>I.e. all the character codes between x80 and xf7 need to be supported.
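
The table above can be turned into a few lines of Python to see which octet
values actually occur. This is only an illustrative sketch: the name
utf8_encode is hypothetical, and it does not reject surrogate code points,
which a strict encoder would.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one scalar value following the bit layout in the table."""
    if cp < 0 or cp > 0x10FFFF:
        raise ValueError("scalar value out of range")
    if cp < 0x80:        # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:       # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp < 0x10000:     # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([
            0xE0 | (cp >> 12),
            0x80 | ((cp >> 6) & 0x3F),
            0x80 | (cp & 0x3F),
        ])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([
        0xF0 | (cp >> 18),
        0x80 | ((cp >> 12) & 0x3F),
        0x80 | ((cp >> 6) & 0x3F),
        0x80 | (cp & 0x3F),
    ])
```

For any scalar value above 0x7F, every octet produced has the top bit set,
which is why none of them match SafeChar as currently defined.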
>>
>>According to H.248 Annex B.2:
>>
>>The ABNF in this section uses the VALUE construct (or lists of VALUE
>>constructs) to encode various package element values (properties,
>>signal parameters, etc.).
>>
>>The VALUE is defined as follows:
>>
>> VALUE = quotedString / 1*(SafeChar)
>> SafeChar = DIGIT / ALPHA / "+" / "-" / "&" /
>> "!" / "_" / "/" / "\'" / "?" / "@" /
>> "^" / "`" / "~" / "*" / "$" / "\" /
>> "(" / ")" / "%" / "|" / "."
>> ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
>> DIGIT = %x30-39 ; 0-9
>> quotedString = DQUOTE *(SafeChar / RestChar / WSP) DQUOTE
>> RestChar = ";" / "[" / "]" / "{" / "}" / ":" / "," / "#" /
>> "<" / ">" / "="
>> WSP = SP / HTAB ; white space
>> SP = %x20 ; space
>> HTAB = %x09 ; horizontal tab
>> DQUOTE = %x22 ; " (Double Quote)
>>
>>
>>So I believe this excludes the x80-xff characters.
>>
>>So the question is how to use the text encoding defined in Annex B to
>>encode UTF-8 strings?
>>
>>Thanks,
>>Sasha
>>
>
>