H.248 and UTF-8 strings.
Sasha Ruditsky
sasha at RADVISION.COM
Fri Jun 24 09:53:41 EDT 2005
Hi Tom
Then I am flummoxed by the line:
OCTET = %x00-FF
It appears a few lines before section 6.2, in the same RFC 2234.
In addition Megaco already has the following definition:
nonEscapeChar = ( "\}" / %x01-7C / %x7E-FF )
How, then, are these encoded?
Regards,
Sasha
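A minimal Python sketch of the nonEscapeChar point above: matching a byte string against a run of nonEscapeChar, as quoted (`"\}" / %x01-7C / %x7E-FF`), accepts UTF-8 multi-byte sequences unchanged, because every non-ASCII UTF-8 octet lies in 0x80-0xFF. The function name is hypothetical, and it checks only this one rule, not the full H.248 grammar.

```python
def matches_non_escape_chars(data: bytes) -> bool:
    """Return True if data is a sequence of nonEscapeChar, where
    nonEscapeChar = ( "\\}" / %x01-7C / %x7E-FF )."""
    i = 0
    while i < len(data):
        b = data[i]
        # Two-octet alternative "\}" (0x5C 0x7D): an escaped closing brace.
        if b == 0x5C and i + 1 < len(data) and data[i + 1] == 0x7D:
            i += 2
        # Single octets: anything except 0x00 and an unescaped "}" (0x7D).
        elif 0x01 <= b <= 0x7C or 0x7E <= b <= 0xFF:
            i += 1
        else:
            return False
    return True


print(matches_non_escape_chars("caf\u00e9 \u20ac".encode("utf-8")))  # True
print(matches_non_escape_chars(b"a}b"))    # False: bare "}"
print(matches_non_escape_chars(b"a\\}b"))  # True: escaped "}"
```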
-----Original Message-----
From: Tom Taylor [mailto:taylor at nortel.com]
Sent: Friday, June 24, 2005 9:30 AM
To: Christian Groves
Cc: Sasha Ruditsky; itu-sg16 at external.cisco.com;
Angelo.Contardi at ITALTEL.IT
Subject: Re: H.248 and UTF-8 strings.
I'd guess the restriction came from RFC 2234 (ABNF). See paragraph 6.2.
Christian Groves wrote:
> Hello Sasha,
>
> You're quite welcome to bring in a contribution to the July Meeting on
> this issue (addressed to Q.3; same procedure as for any Question),
> although I hope that there will be some agreement on the solution on
> the Megaco list. Those that can remember that far back know that I
> wasn't really a proponent of the text encoding so I can't remember why
> these "extra chars" were excluded in the first place. I've added Tom
> to see if he can remember. In terms of solution I would support adding
> this to VALUE only as indicated below.
>
> Regards, Christian
>
> Sasha Ruditsky wrote:
>
>> Hi Christian
>>
>> As you correctly stated, all the "extra chars" are in the region between
>> 0x80 and 0xff (0xf7 to be precise).
>> I am not aware of any special meaning of the characters in this
>> region, so I cannot understand why these "extra chars"
>> need to be escaped or quoted in any way.
>>
>> The naive question is: what is wrong with extending SafeChar to
>> include this region?
>> I.e.
>> SafeChar = DIGIT / ALPHA / "+" / "-" / "&" /
>>            "!" / "_" / "/" / "\'" / "?" / "@" /
>>            "^" / "`" / "~" / "*" / "$" / "\" /
>>            "(" / ")" / "%" / "|" / "." / %x80-F7
>>
>> Or, if for some reason UTF-8 strings may be used only in VALUE,
>> then:
>>
>> VALUE = quotedString / 1*(SafeChar / %x80-F7)
>> quotedString = DQUOTE *(SafeChar / %x80-F7 / RestChar / WSP)
>>                DQUOTE
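The VALUE-only variant proposed above can be sketched as a small Python validator. The octet sets are copied from the Annex B rules quoted later in this thread; `extended=True` adds the proposed %x80-F7 range. The helper names are hypothetical, and reading the ABNF `"\'"` as a plain apostrophe is an assumption.

```python
# Octet sets from the Annex B ABNF quoted in this thread.
# Assumption: the ABNF "\'" is read as a plain apostrophe (0x27).
SAFE_PUNCT = b"+-&!_/'?@^`~*$\\()%|."
REST_CHARS = b";[]{}:,#<>="
WSP = b" \t"
DQUOTE = 0x22


def is_safe_char(b: int, extended: bool) -> bool:
    if 0x30 <= b <= 0x39 or 0x41 <= b <= 0x5A or 0x61 <= b <= 0x7A:
        return True  # DIGIT / ALPHA
    if b in SAFE_PUNCT:
        return True
    # Proposed addition: %x80-F7 (only when extended=True).
    return extended and 0x80 <= b <= 0xF7


def is_value(data: bytes, extended: bool = True) -> bool:
    """VALUE = quotedString / 1*(SafeChar), optionally with %x80-F7 added."""
    if len(data) >= 2 and data[0] == DQUOTE and data[-1] == DQUOTE:
        body = data[1:-1]
        return all(is_safe_char(b, extended) or b in REST_CHARS or b in WSP
                   for b in body)
    return len(data) >= 1 and all(is_safe_char(b, extended) for b in data)


print(is_value('"Caf\u00e9"'.encode("utf-8")))                  # True
print(is_value('"Caf\u00e9"'.encode("utf-8"), extended=False))  # False
```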
>>
>>
>>
>> And...
>>
>> Assuming a fix is agreed, can it go into version 3?
>> If yes, then what is the procedure?
>> Do you want me to bring a contribution to the July meeting?
>>
>> Thanks,
>> Sasha
>>
>>
>>
>>
>>
>> -----Original Message-----
>> From: Christian Groves [mailto:christian.groves at ericsson.com]
>> Sent: Wednesday, June 22, 2005 8:16 PM
>> To: Sasha Ruditsky
>> Cc: itu-sg16 at external.cisco.com; Angelo.Contardi at italtel.it
>> Subject: Re: H.248 and UTF-8 strings.
>>
>> Hello Sasha,
>>
>> This problem was raised recently on the Megaco list. I've had some
>> off-line discussion with Angelo (the person who raised the problem),
>> and there are currently two proposed solutions (I hope he will send
>> them to the Megaco list):
>>
>> 1) The simple one: code the UTF-8 string in "Octet Mode". This is a
>> BAD solution from the transmission-efficiency point of view because
>> it halves the TX bandwidth: to TX one UTF-8 char (at most 4 octets) I
>> must TX up to 2 x 4 = 8 ASCII chars.
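The 2x overhead described in option 1 can be illustrated with a short Python sketch. Because the exact octet-mode syntax is not spelled out in this thread, the sketch assumes that each octet is written as two hexadecimal digits; `octet_mode` is a hypothetical helper name.

```python
def octet_mode(text: str) -> str:
    """Hypothetical octet mode (assumption): each UTF-8 octet becomes two hex digits."""
    return text.encode("utf-8").hex().upper()


for s in ["A", "\u00e9", "\u20ac", "\U0001F600"]:
    n_octets = len(s.encode("utf-8"))
    encoded = octet_mode(s)
    # Two transmitted characters per octet: exactly twice the raw UTF-8 size.
    print(f"U+{ord(s):04X}: {n_octets} octet(s) -> {len(encoded)} chars: {encoded}")
```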
>>
>> 2) The complicated one: allow the ABNF quoted form of VALUE to carry
>> ALL octets 0x01-0xFF, except 0x22, which should be ESCAPED with "\",
>> as is already done in the ABNF for the Local and Remote Descriptors
>> (see SDP). Note that '\0' (0x00) is NOT ALLOWED in this new quoted
>> string form, just as in the present one, but this is not a problem:
>> in UTF-8 the char '\0' (0x00) is the same as in ASCII (string
>> terminator) and is NOT used to code "non-ASCII" UTF-8 chars, i.e. all
>> chars above 0x7F, which require from 2 to 4 octets to be encoded. In
>> fact, the "extra" octets needed to code such a UTF-8 char are all
>> above 0x7F (they have the MSBit = 1).
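Option 2 can be sketched in Python as follows. Following the description, any octet except 0x00 may appear inside the quotes and 0x22 is escaped with a backslash. As this sketch's own addition (not part of the proposal), the backslash itself is also escaped, because otherwise a value ending in "\" would be ambiguous next to the closing quote. The function names are hypothetical.

```python
def quote_value(text: str) -> bytes:
    """Quoted form under option 2 (sketch): UTF-8 octets 0x01-0xFF inside DQUOTEs."""
    data = text.encode("utf-8")
    if 0x00 in data:
        raise ValueError("0x00 is not allowed in the quoted form")
    out = bytearray(b'"')
    for b in data:
        # 0x22 is escaped per the proposal; escaping 0x5C is an added assumption.
        if b in (0x22, 0x5C):
            out.append(0x5C)
        out.append(b)
    out.append(0x22)
    return bytes(out)


def unquote_value(quoted: bytes) -> str:
    """Inverse of quote_value; rejects unescaped quotes and dangling escapes."""
    if len(quoted) < 2 or quoted[0] != 0x22 or quoted[-1] != 0x22:
        raise ValueError("not a quoted string")
    body = quoted[1:-1]
    out = bytearray()
    i = 0
    while i < len(body):
        b = body[i]
        if b == 0x5C:
            if i + 1 == len(body):
                raise ValueError("dangling escape")
            out.append(body[i + 1])  # escaped octet taken literally
            i += 2
        elif b == 0x22:
            raise ValueError("unescaped double quote")
        else:
            out.append(b)
            i += 1
    return out.decode("utf-8")


print(quote_value("caf\u00e9"))  # b'"caf\xc3\xa9"'
```

The UTF-8 octets pass through untouched, so a non-ASCII character costs its 2 to 4 octets rather than double that.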
>>
>> Regards, Christian
>>
>> Sasha Ruditsky wrote:
>>
>>
>>> Hi
>>>
>>> I'm trying to understand how H.248 supports UTF-8 string properties.
>>> According to H.248 the string property is encoded as UTF-8 string.
>>>
>>> UTF-8 encoding is defined by the following table:
>>>
>>> Scalar Value                 1st Byte  2nd Byte  3rd Byte  4th Byte
>>> 00000000 0xxxxxxx            0xxxxxxx
>>> 00000yyy yyxxxxxx            110yyyyy  10xxxxxx
>>> zzzzyyyy yyxxxxxx            1110zzzz  10yyyyyy  10xxxxxx
>>> 000uuuuu zzzzyyyy yyxxxxxx   11110uuu  10uuzzzz  10yyyyyy  10xxxxxx
>>>
>>>
>>> I.e. all the character codes between x80 and xf7 need to be supported.
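The bit layout in the table can be written directly as a Python encoder and checked against Python's built-in UTF-8 codec. The function name is hypothetical. The range check follows RFC 3629, which limits UTF-8 to U+10FFFF. This is why the largest lead octet actually produced is 0xF4, inside the x80-xf7 range above; 0xF7 is simply the largest value that fits the 11110uuu pattern.

```python
def utf8_encode_scalar(cp: int) -> bytes:
    """Encode one Unicode scalar value using the bit layout in the table above."""
    if not (0 <= cp <= 0x10FFFF) or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0x7F:      # 0xxxxxxx
        return bytes([cp])
    if cp <= 0x7FF:     # 110yyyyy 10xxxxxx
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # 1110zzzz 10yyyyyy 10xxxxxx
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    # 11110uuu 10uuzzzz 10yyyyyy 10xxxxxx
    return bytes([0xF0 | (cp >> 18),
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for cp in [0x41, 0xE9, 0x20AC, 0x1F600, 0x10FFFF]:
    enc = utf8_encode_scalar(cp)
    assert enc == chr(cp).encode("utf-8")  # agrees with the built-in codec
    print(f"U+{cp:04X}: {enc.hex(' ').upper()}")
```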
>>>
>>> According to H.248 Annex B.2:
>>>
>>> The ABNF in this section uses the VALUE construct (or lists of VALUE
>>> constructs) to encode various package element values (properties,
>>> signal parameters, etc.).
>>>
>>> The VALUE is defined as follows:
>>>
>>> VALUE = quotedString / 1*(SafeChar)
>>> SafeChar = DIGIT / ALPHA / "+" / "-" / "&" /
>>> "!" / "_" / "/" / "\'" / "?" / "@" /
>>> "^" / "`" / "~" / "*" / "$" / "\" /
>>> "(" / ")" / "%" / "|" / "."
>>> ALPHA = %x41-5A / %x61-7A ; A-Z / a-z
>>> DIGIT = %x30-39 ; 0-9
>>> quotedString = DQUOTE *(SafeChar / RestChar/ WSP) DQUOTE
>>> RestChar = ";" / "[" / "]" / "{" / "}" / ":" / "," /
>>>            "#" / "<" / ">" / "="
>>> WSP = SP / HTAB ; white space
>>> SP = %x20 ; space
>>> HTAB = %x09 ; horizontal tab
>>> DQUOTE = %x22 ; " (Double Quote)
>>>
>>>
>>> So I believe this excludes the x80-xff characters.
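This reading can be checked mechanically. The Python sketch below (helper names hypothetical) builds the set of octets that the current rules allow inside a quotedString (SafeChar / RestChar / WSP, as listed above). It confirms that none of these octets is 0x80 or above, so every non-ASCII UTF-8 character is rejected.

```python
import string

# Octets allowed inside a quotedString by the current Annex B rules.
# Assumption: the ABNF "\'" is read as a plain apostrophe (0x27).
SAFE_CHAR = set((string.digits + string.ascii_letters
                 + "+-&!_/'?@^`~*$\\()%|.").encode("ascii"))
REST_CHAR = set(b";[]{}:,#<>=")
WSP = set(b" \t")
QUOTED_BODY_OCTETS = SAFE_CHAR | REST_CHAR | WSP


def rejected_octets(text: str) -> list:
    """Return the UTF-8 octets of text that the current quotedString cannot carry."""
    return [b for b in text.encode("utf-8") if b not in QUOTED_BODY_OCTETS]


print(any(b >= 0x80 for b in QUOTED_BODY_OCTETS))      # False
print([hex(b) for b in rejected_octets("caf\u00e9")])  # ['0xc3', '0xa9']
```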
>>>
>>> So the question is: how can the text encoding defined in Annex B be
>>> used to encode UTF-8 strings?
>>>
>>> Thanks,
>>> Sasha
>>>
>>
>>
>
>
More information about the sg16-avd
mailing list