-----Original Message-----
From: Contardi Angelo [mailto:Angelo.Contardi@italtel.it]
Sent: Friday, June 24, 2005 12:25 PM
To: Steve Cipolli
Subject: R: [Megaco] The ABNF coding of VALUE doesn't allow
to represent a generic UTF-8 string/char
-----Original Message-----
From: Steve Cipolli [mailto:SCipolli@radvision.com]
Sent: Friday, June 24, 2005 6:01 PM
To: Contardi Angelo; Megaco IETF - Mail List (E-mail)
Cc: Christian Groves (E-mail)
Subject: RE: [Megaco] The ABNF coding of VALUE doesn't allow
to represent a generic UTF-8 string/char
1. It is not necessary to quote the UTF-8 characters as you
define in your second solution. VALUE can be extended to
accept characters (quoted and unquoted) in the range 0x80-0xFF.
The problem is that VALUE also cannot "carry" ALL of ASCII
(0x00-0x7F), which is NEEDED to code UTF-8 characters < 0x80.
So extending VALUE with 0x80-0xFF alone is NOT enough to allow
the TX of ALL UTF-8 with ABNF. Quoting is NEEDED because, if
you "scan" an input sequence and allow a VALUE to be "any
sequence of one or more octets 0x00-0xFF", every "token" would
be identified as a VALUE. The extension of the quoted form of
VALUE that I propose does not increase the number of forms of
VALUE; it just EXTENDS the existing quoted form of VALUE.
2. Providing a mechanism to allow double quotes (") in a
(double) quoted string is essentially a separate issue, since
the addition of UTF-8 chars neither motivates the need for this
mechanism nor complicates its addition.
See point 1.
-----Original Message-----
From: megaco-bounces@ietf.org
[mailto:megaco-bounces@ietf.org] On Behalf Of Contardi Angelo
Sent: Friday, June 24, 2005 11:34 AM
To: Megaco IETF - Mail List (E-mail)
Cc: Christian Groves (E-mail)
Subject: [Megaco] The ABNF coding of VALUE doesn't allow to
represent a generic UTF-8 string/char
Hello,
from my previous private discussion with Christian Groves,
I have deduced the following:
The ABNF coding of VALUE doesn't allow representing a
generic UTF-8 string or char (RFC 3629), because:
1) The FIRST byte of a UTF-8 char may be ANY ASCII char in
   the range 0x00-0x7F and, for instance, the ASCII chars
   0x00-0x07 and 0x22 (") are not allowed in ANY ABNF form
   of VALUE.
2) Furthermore, UTF-8 chars "greater than 0x7F" need "ASCII
   chars" (or, better said, octets) in the range 0x80-0xFF
   (not ALL of them), which are not allowed in ANY ABNF form
   of VALUE.
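To make the two points above concrete, here is a minimal sketch in Python (my choice of language, not something the Megaco text prescribes) that prints the octets produced for a few sample characters. The helper name utf8_octets and the sample characters are only illustrative assumptions.

```python
# Illustrative only: inspect the octets that a UTF-8 encoder produces.
# The helper name and the sample characters are hypothetical choices.

def utf8_octets(ch):
    """Return the UTF-8 octets of a single character as a list of ints."""
    return list(ch.encode("utf-8"))

# One character each for the 1-, 2-, 3- and 4-octet forms of RFC 3629.
samples = ["A", "\u00e8", "\u20ac", "\U0001d11e"]

for ch in samples:
    octets = utf8_octets(ch)
    print("U+%04X" % ord(ch), ["0x%02X" % b for b in octets])
```

A character below 0x80 is a single octet equal to its ASCII code, while every octet of a multi-octet character falls in 0x80-0xFF, so both ranges have to be accepted for a VALUE to carry arbitrary UTF-8.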
So, to "correct" this problem, I can suggest two possible solutions:
1) The simple one: code the UTF-8 string/char in "Octet Mode",
   as described in ANNEX B.3. This is a BAD solution from the
   transmission-efficiency point of view because it "halves
   the TX band": to TX one UTF-8 char (max 4 octets) I must
   TX up to 2 x 4 = 8 octets (ASCII chars).
2) The difficult one: allow the ABNF quoted form of VALUE to
   TX ALL chars 0x01-0xFF (the range 0x80-0xFF is more
   properly named OCTET in RFC 2234), except 0x22 ("), which
   should be ESCAPED with "\", as already done for the ABNF
   Local and Remote Descriptors (see SDP). Note that '\0'
   (0x00) is NOT ALLOWED in this new quoted string form, just
   as in the present one, but that is not a problem: in UTF-8
   the char '\0' (0x00) is the same as in ASCII (string
   terminator) and is NOT used to code "non-ASCII" UTF-8
   chars, i.e. all those chars > 0x7F that require more than
   one octet to be encoded (from 2 to 4 octets). In fact, the
   "extra" octets needed to code a UTF-8 char are all above
   0x7F.
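The cost of solution 1) can be checked with a small sketch. It assumes, as the "2 x 4 = 8" arithmetic above implies, that each octet is sent as two hexadecimal ASCII digits; the function names hex_octet_encode and overhead are hypothetical and not taken from Annex B.3.

```python
# Sketch of the overhead of sending UTF-8 as hex digits.
# Assumption: each octet becomes exactly two ASCII hex digits.

def hex_octet_encode(text):
    """Encode text as UTF-8, then write every octet as two hex digits."""
    return text.encode("utf-8").hex().upper()

def overhead(text):
    """Return (octets of raw UTF-8, ASCII chars needed in hex form)."""
    raw_len = len(text.encode("utf-8"))
    return raw_len, len(hex_octet_encode(text))

for sample in ["A", "\u20ac", "\U0001d11e"]:
    print(repr(sample), hex_octet_encode(sample), overhead(sample))
```

Under that assumption the hex form is always exactly twice as long as the raw UTF-8, whatever the character.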
While the 1st mode doesn't require any modification of the
ABNF syntax (but is it applicable in every case?), the 2nd one
requires this ABNF modification:
quotedString = DQUOTE *(quotedChar) DQUOTE
; %x22 (") is allowed just if "escaped" with "\"
quotedChar = ( %x01-21 / "\" DQUOTE / %x23-FF )
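As a rough check of the rule above, the following Python sketch turns the proposed quotedString into a regular expression over bytes. The names QUOTED_STRING and is_quoted_string are hypothetical, and the sketch tests a complete candidate string rather than scanning a message stream.

```python
import re

# Direct transcription of the proposed rule (a sketch, not a full parser):
#   quotedString = DQUOTE *(quotedChar) DQUOTE
#   quotedChar   = ( %x01-21 / "\" DQUOTE / %x23-FF )
QUOTED_STRING = re.compile(rb'"(?:\\"|[\x01-\x21\x23-\xff])*"')

def is_quoted_string(data):
    """True if the whole byte string matches the proposed quotedString."""
    return QUOTED_STRING.fullmatch(data) is not None

print(is_quoted_string('"caff\u00e8"'.encode("utf-8")))  # UTF-8 content
print(is_quoted_string(b'"say \\"hi\\""'))               # escaped quotes
print(is_quoted_string(b'"a"b"'))                        # bare inner quote
```

One thing the sketch makes visible: "\" (0x5C) is itself inside %x23-FF, so a string whose last character is a backslash also matches. A scanner that has to find the closing DQUOTE in a stream would probably need an extra rule for that case, for example escaping "\" as well; that is my assumption, not part of the proposal.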
In a parser implementation, I think, this is not a
"terrible complication" and can be solved in the same way as
the "octetString" of the Local and Remote Descriptors. It also
requires implementing an ESCAPE "strip (RX) / padding (TX)"
mechanism, as already required for SDP.
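The strip/padding mechanism could look roughly like the sketch below. It assumes that only DQUOTE is escaped, as in the proposed rule; the function names escape_for_tx and strip_on_rx are hypothetical.

```python
# Sketch of the escape "padding (TX) / strip (RX)" step.
# Assumption: only the double quote (0x22) is escaped, with "\".

def escape_for_tx(payload):
    """Padding: put "\\" before every double quote, then add the quotes."""
    if b"\x00" in payload:
        raise ValueError("0x00 is not allowed in the proposed quoted form")
    return b'"' + payload.replace(b'"', b'\\"') + b'"'

def strip_on_rx(quoted):
    """Strip: drop the enclosing quotes and the "\\" before each quote."""
    if len(quoted) < 2 or quoted[:1] != b'"' or quoted[-1:] != b'"':
        raise ValueError("not a quoted string")
    return quoted[1:-1].replace(b'\\"', b'"')

original = 'prezzo "10 \u20ac"'.encode("utf-8")
on_the_wire = escape_for_tx(original)
print(on_the_wire)
print(strip_on_rx(on_the_wire) == original)
```

The UTF-8 octets pass through untouched, so the only per-character cost of this form is one extra octet for each double quote in the payload.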
P.S.: I suppose NO ONE needs to send the "string terminator"
      '\0' (0x00) in a UTF-8 string or char. If my assumption
      is false, then in "solution 2)" %x00 should also be
      "escaped". Conversely, "solution 1)" can already
      "transport" the %x00 octet.
Best regards
o o o o o o o . . . ___________________________________
o _____ || Angelo Contardi |
.][__n_n_|DD[ ====_____ | angelo.contardi@italtel.it |
>(________|__|_[_________]_|________________________________|
_/oo OOOOO oo` ooo ooo 'o!o!o o!o!o`
_______________________________________________
Megaco mailing list
Megaco@ietf.org
https://www1.ietf.org/mailman/listinfo/megaco