H.248 and UTF-8 strings.

Fri Jun 24 09:53:41 EDT 2005

Hi Tom

Then I am flummoxed by the line:
     OCTET          =  %x00-FF
which appears several lines before section 6.2 in the same RFC 2234.

In addition, Megaco already has the following definition:
 nonEscapeChar        = ( "\}" / %x01-7C / %x7E-FF )

How, then, are these encoded?
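A minimal Python sketch makes the point concrete. It checks whether a byte string is a sequence of nonEscapeChar elements, with "}" allowed only as the two-octet escape "\}". It suggests that this existing rule already accepts every octet a UTF-8 encoder produces for non-ASCII text. The function name fits_non_escape_char is hypothetical, and the code is only an illustration, not something taken from H.248.

```python
def fits_non_escape_char(data: bytes) -> bool:
    """Return True if data is a sequence of
    nonEscapeChar = ( "\\}" / %x01-7C / %x7E-FF )
    (hypothetical helper, for illustration only)."""
    i = 0
    while i < len(data):
        b = data[i]
        # The two-octet escape "\}" (0x5C 0x7D) is the only way to carry "}".
        if b == 0x5C and i + 1 < len(data) and data[i + 1] == 0x7D:
            i += 2
        elif 0x01 <= b <= 0x7C or 0x7E <= b <= 0xFF:
            i += 1
        else:
            # 0x00, or a bare "}" (0x7D)
            return False
    return True


print(fits_non_escape_char("Grüße, 東京".encode("utf-8")))  # True
print(fits_non_escape_char(b"a}b"))                          # False: bare "}"
print(fits_non_escape_char(b"a\\}b"))                        # True: "}" escaped
```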

Regards,
Sasha

-----Original Message-----
From: Tom Taylor [mailto:taylor at nortel.com] 
Sent: Friday, June 24, 2005 9:30 AM
To: Christian Groves
Cc: Sasha Ruditsky; itu-sg16 at external.cisco.com;
Angelo.Contardi at ITALTEL.IT
Subject: Re: H.248 and UTF-8 strings.

I'd guess the restriction came from RFC 2234 (ABNF). See section 6.2.

Christian Groves wrote:
> Hello Sasha,
> 
> You're quite welcome to bring a contribution on this issue to the July
> meeting (addressed to Q.3; same procedure as for any Question),
> although I hope that there will be some agreement on the solution on
> the Megaco list. Those who can remember that far back know that I
> wasn't really a proponent of the text encoding, so I can't remember why
> these "extra chars" were excluded in the first place. I've added Tom
> to see if he can remember. In terms of a solution, I would support
> adding this to VALUE only, as indicated below.
> 
> Regards, Christian
> 
> Sasha Ruditsky wrote:
> 
>> Hi Christian
>>
>> As you correctly stated all the "extra chars" are the region between 
>> 0x80 and 0xff (0xf7 to be precise).
>> I am not aware of any special meaning for the characters in this
>> region, so I cannot understand why these "extra chars" need to be
>> escaped or quoted in any way.
>>
>> The naive question is: what is wrong with extending SafeChar to
>> cover this region?
>> I.e.
>>    SafeChar             = DIGIT / ALPHA / "+" / "-" / "&" /
>>                           "!" / "_" / "/" / "\'" / "?" / "@" /
>>                           "^" / "`" / "~" / "*" / "$" / "\" /
>>                           "(" / ")" / "%" / "|" / "." / %x80-F7
>>
>> Or, if for some reason UTF-8 strings may be used only in VALUE,
>> then:
>>
>>   VALUE                = quotedString / 1*(SafeChar / %x80-F7)
>>   quotedString         = DQUOTE *(SafeChar / %x80-F7 / RestChar /
>>                          WSP) DQUOTE
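The VALUE-only variant of the proposal can be approximated with a short Python matcher. This is a sketch under stated assumptions: it works on raw octets, it reads "\'" in SafeChar as the single apostrophe (0x27), and the names SAFE_CHAR, REST_CHAR, EXTRA and is_proposed_value are illustrative, not defined by H.248.

```python
# Octet sets from the Annex B.2 ABNF; "\'" is assumed to mean the apostrophe.
SAFE_CHAR = set(b"0123456789"
                b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
                b"+-&!_/'?@^`~*$\\()%|.")
REST_CHAR = set(b";[]{}:,#<>=")
WSP = {0x20, 0x09}
# The proposed extension: %x80-F7.
EXTRA = set(range(0x80, 0xF8))


def is_proposed_value(data: bytes) -> bool:
    """Match the proposed rule (hypothetical helper):
    VALUE        = quotedString / 1*(SafeChar / %x80-F7)
    quotedString = DQUOTE *(SafeChar / %x80-F7 / RestChar / WSP) DQUOTE
    """
    if len(data) >= 2 and data[0] == 0x22 and data[-1] == 0x22:
        # Quoted form: the body may also contain RestChar and white space.
        body = data[1:-1]
        return all(b in SAFE_CHAR or b in EXTRA or b in REST_CHAR or b in WSP
                   for b in body)
    # Unquoted form: one or more SafeChar or %x80-F7 octets.
    return len(data) > 0 and all(b in SAFE_CHAR or b in EXTRA for b in data)


print(is_proposed_value("Grüße".encode("utf-8")))           # True
print(is_proposed_value('"Grüße, 東京"'.encode("utf-8")))   # True
print(is_proposed_value(b"a b"))                            # False: space outside quotes
```

Under this rule a UTF-8 token such as Grüße is accepted unquoted, while white space and RestChar still force the quoted form, exactly as for ASCII values today.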
>>
>>
>>
>> And...
>>
>> Assuming a way to fix this is found, can it be fixed in version 3?
>> If so, what is the procedure?
>> Do you want me to bring a relevant contribution to the July meeting?
>>
>> Thanks,
>> Sasha
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Christian Groves [mailto:christian.groves at ericsson.com]
>> Sent: Wednesday, June 22, 2005 8:16 PM
>> To: Sasha Ruditsky
>> Cc: itu-sg16 at external.cisco.com; Angelo.Contardi at italtel.it
>> Subject: Re: H.248 and UTF-8 strings.
>>
>> Hello Sasha,
>>
>> This problem was raised recently on the Megaco list. I've had some
>> off-line discussion with Angelo (the person who raised the problem),
>> and there are currently two proposed solutions (I hope he will send
>> them to the Megaco list):
>>
>> 1) The simple one: code the UTF-8 string in "Octet Mode". This is a
>> BAD solution from the point of view of transmission efficiency,
>> because it halves the available bandwidth: to transmit one UTF-8
>> character (at most 4 octets), I must transmit up to 2 x 4 = 8 ASCII
>> characters.
>>
>> 2) The complicated one: allow the ABNF quoted form of VALUE to carry
>> ALL octets 0x01-0xFF, except 0x22, which must be ESCAPED with "\", as
>> is already done in the ABNF for the Local and Remote Descriptors (see
>> SDP). Note that '\0' (0x00) is NOT ALLOWED in this new quoted-string
>> form, just as in the present one. This is not a problem, because in
>> UTF-8 the character '\0' (0x00) is the same as in ASCII (the string
>> terminator) and is NOT used to code "non-ASCII" UTF-8 characters,
>> i.e. the characters above 0x7F, each of which takes 2 to 4 octets to
>> encode. In fact, the "extra chars" (octets) needed to code such a
>> UTF-8 character are all above 0x7F (they have the MSB = 1).
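The size difference between the two proposals can be sketched in Python. Two assumptions here go beyond the text: "Octet Mode" is modelled as two hex digits per octet, and the quoted form also escapes the backslash itself, since escaping only 0x22 would make a string that ends in "\" ambiguous to decode. The function names octet_mode, quoted_escaped and unquote are hypothetical.

```python
def octet_mode(data: bytes) -> str:
    # Assumption: "Octet Mode" sends each octet as two hex digits.
    return data.hex().upper()


def quoted_escaped(data: bytes) -> bytes:
    # Proposal 2: raw octets 0x01-0xFF between double quotes.
    # 0x22 (") is prefixed with a backslash, as proposed; prefixing the
    # backslash (0x5C) too is an extra assumption for unambiguous decoding.
    if 0x00 in data:
        raise ValueError("0x00 is not allowed in the quoted form")
    out = bytearray(b'"')
    for b in data:
        if b in (0x22, 0x5C):
            out.append(0x5C)
        out.append(b)
    out += b'"'
    return bytes(out)


def unquote(wire: bytes) -> bytes:
    # Reverse of quoted_escaped: drop the DQUOTEs and each escaping backslash.
    body = wire[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        if body[i] == 0x5C:
            i += 1  # the escaped octet follows the backslash
        out.append(body[i])
        i += 1
    return bytes(out)


for text in ("東京", "\U0001D11E"):
    data = text.encode("utf-8")
    print(len(data), len(octet_mode(data)), len(quoted_escaped(data)))
# 6 12 8
# 4 8 6
```

The 4-octet character U+1D11E shows the "2 x 4 = 8" figure from solution 1, while solution 2 adds only the two surrounding quotes plus one octet per escaped character.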
>>
>> Regards, Christian
>>
>> Sasha Ruditsky wrote:
>>
>>
>>> Hi
>>>
>>> I'm trying to understand how H.248 supports UTF-8 string properties.
>>> According to H.248, string properties are encoded as UTF-8 strings.
>>>
>>> UTF-8 encoding is defined by the following table:
>>>
>>> Scalar Value                 1st Byte  2nd Byte  3rd Byte  4th Byte
>>> 00000000 0xxxxxxx            0xxxxxxx
>>> 00000yyy yyxxxxxx            110yyyyy  10xxxxxx
>>> zzzzyyyy yyxxxxxx            1110zzzz  10yyyyyy  10xxxxxx
>>> 000uuuuu zzzzyyyy yyxxxxxx   11110uuu  10uuzzzz  10yyyyyy  10xxxxxx
>>>
>>>
>>> I.e. all the octet values between 0x80 and 0xF7 need to be supported.
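The table can be turned into a small encoder to check the claim. This Python sketch follows the four rows directly and compares the result with Python's built-in str.encode("utf-8"). The range checks (maximum U+10FFFF, no surrogates) come from the current UTF-8 definition in RFC 3629 rather than from the table; with that cap the highest lead octet actually produced is 0xF4, so %x80-F7 is a slight superset of what is needed. The name utf8_encode is illustrative.

```python
def utf8_encode(cp: int) -> bytes:
    """Encode one Unicode scalar value following the four table rows."""
    # Limits from RFC 3629, not from the table itself.
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp < 0x80:                          # 0xxxxxxx
        return bytes([cp])
    if cp < 0x800:                         # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6),
                      0x80 | (cp & 0x3F)])
    if cp < 0x10000:                       # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),       # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in ("A", "é", "東", "\U0001D11E"):
    assert utf8_encode(ord(ch)) == ch.encode("utf-8")
print(utf8_encode(0xE9).hex())  # c3a9
```

Every octet of a multi-octet sequence has its top bit set, which is why none of them can be matched by an ASCII-only rule.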
>>>
>>> According to H.248 Annex B.2:
>>>
>>> The ABNF in this section uses the VALUE construct (or lists of VALUE
>>> constructs) to encode various package element values (properties, 
>>> signal parameters, etc.).
>>>
>>> The VALUE is defined as follows:
>>>
>>>  VALUE                = quotedString / 1*(SafeChar)
>>>  SafeChar             = DIGIT / ALPHA / "+" / "-" / "&" /
>>>                          "!" / "_" / "/" / "\'" / "?" / "@" /
>>>                          "^" / "`" / "~" / "*" / "$" / "\" /
>>>                          "(" / ")" / "%" / "|" / "."
>>>  ALPHA                = %x41-5A / %x61-7A ; A-Z / a-z
>>>  DIGIT                = %x30-39         ; 0-9
>>>  quotedString         = DQUOTE *(SafeChar / RestChar/ WSP) DQUOTE
>>>  RestChar             = ";" / "[" / "]" / "{" / "}" / ":" / "," / "#" /
>>>                          "<" / ">" / "="
>>>  WSP                  = SP / HTAB ; white space
>>>  SP                   = %x20        ; space
>>>  HTAB                 = %x09        ; horizontal tab
>>>  DQUOTE               = %x22            ; " (Double Quote)
>>>
>>>
>>> So I believe this excludes the octets 0x80-0xFF.
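A quick Python check along these lines shows the problem. It builds the set of octets permitted inside quotedString from the Annex B.2 rules above (reading "\'" as the apostrophe, which is an assumption) and tests a UTF-8 string against it. Every non-ASCII octet is flagged. The names QUOTED_BODY and rejected_octets are illustrative only.

```python
ALPHA = set(range(0x41, 0x5B)) | set(range(0x61, 0x7B))  # A-Z / a-z
DIGIT = set(range(0x30, 0x3A))                           # 0-9
# "\'" is taken here to mean the apostrophe (0x27); "\" is the backslash.
SAFE_CHAR = ALPHA | DIGIT | set(b"+-&!_/'?@^`~*$\\()%|.")
REST_CHAR = set(b";[]{}:,#<>=")
WSP = {0x20, 0x09}
QUOTED_BODY = SAFE_CHAR | REST_CHAR | WSP


def rejected_octets(text: str) -> list[int]:
    """Return the UTF-8 octets of text that a quotedString body cannot carry."""
    return [b for b in text.encode("utf-8") if b not in QUOTED_BODY]


print([hex(b) for b in rejected_octets("café")])  # ['0xc3', '0xa9']
print(max(QUOTED_BODY) < 0x80)                     # True
```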
>>>
>>> So the question is: how can the text encoding defined in Annex B be
>>> used to encode UTF-8 strings?
>>>
>>> Thanks,
>>> Sasha
>>>
>>
>>
> 
> 