[odf-discuss] Re: ODF and UTF-8/16/32-HEX/DEC/DECHTML
Daniel Carrera
daniel.carrera at zmsl.com
Wed Feb 21 12:45:42 EST 2007
On Thu, 2007-22-02 at 00:27 +0700, Damon Anderson wrote:
> Where I take issue is with using &lt; and &gt;. HTML supports UTF-8
> Decimal fully, so should XML.
XML supports everything HTML does. Indeed, XHTML is an XML-compliant
implementation of HTML. XML can be written in any encoding, including
UTF-8. ODF files can be written in any encoding, including UTF-8. ODF
files are usually written in UTF-8.
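To make the encoding point concrete, here is a minimal sketch in Python (my choice of language, nothing in this thread depends on it). The sample text and the make_document helper are made up for illustration. The same document is serialized once as UTF-8 and once as UTF-16, and a stock XML parser recovers identical text from both:

```python
import xml.etree.ElementTree as ET

# Hypothetical sample text mixing Latin, Japanese and Thai characters.
SAMPLE_TEXT = "Grüße, 日本語, ไทย"


def make_document(encoding: str) -> bytes:
    """Serialize the same one-element document in the given encoding."""
    xml_text = f'<?xml version="1.0" encoding="{encoding}"?><p>{SAMPLE_TEXT}</p>'
    return xml_text.encode(encoding)


utf8_bytes = make_document("UTF-8")
utf16_bytes = make_document("UTF-16")  # Python prepends a byte order mark

# The byte sequences differ, but the parsed character data does not.
text_from_utf8 = ET.fromstring(utf8_bytes).text
text_from_utf16 = ET.fromstring(utf16_bytes).text
print(text_from_utf8 == text_from_utf16 == SAMPLE_TEXT)
```

The encoding only changes the bytes on disk. What the application sees after parsing is the same sequence of Unicode characters either way.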
I have no idea what you mean by "Decimal" or what this has to do with
XML entities.
> &lt; and &gt; should be XML extensions, not defaults.
How do you propose denoting the < and > characters without ambiguity?
What benefit is there to typing <a><</a> instead of <a>&lt;</a>? It just
creates more work for parsers and human beings (e.g. "was that a typo or
is that < really supposed to be there?") for no benefit. And this has
nothing to do with UTF-8. These characters are plain-old ASCII.
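A small sketch of the ambiguity, again in Python as an illustrative choice. The is_well_formed helper is hypothetical, written only for this example. A bare < inside character data makes the document ill-formed, while the escaped form parses and round-trips back to the literal characters:

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the document, False otherwise."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# A bare "<" in content looks like the start of a tag, so the parser rejects it.
print(is_well_formed("<a><</a>"))     # False
print(is_well_formed("<a>&lt;</a>"))  # True

# Escaping is mechanical, and parsing restores the original plain-ASCII text.
raw = "1 < 2 & 3 > 2"
escaped = escape(raw)  # "1 &lt; 2 &amp; 3 &gt; 2"
parsed = ET.fromstring(f"<a>{escaped}</a>").text
print(parsed == raw)   # True
```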
> If our standards don't interact with each other, what good are they?
What does it fail to interact with? Every XML parser understands those
entities. I'm sorry but I don't know what you are on about. I'm
surprised that anyone has an issue with XML using the &lt; entity.
> I understand that at the root ODF is an XML standard, but its long-term
> goal is far different than that of XML, e.g. it is to provide the ability
> to create a single document standard.
Sigh... XML is designed for making interoperable file formats. ODF is
intended to be an interoperable file format. ODF is specified in XML.
> Given that 90% of the world's languages are non-ASCII
What does ASCII vs non-ASCII have to do with the entities &lt; &gt; and
&amp; ?
> this standard seems to be UTF and UTF-8 probably isn't enough..
> it seems like UTF-16HEX should be the de facto standard.
* Give me one character that can be expressed in UTF-16 but not UTF-8.
Look, these are just different encodings of Unicode. Any Unicode
character can be expressed in both UTF-8 and UTF-16.
* ODF and XML can use any encoding. UTF-8 or UTF-16.
* This has nothing to do with entities.
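To check the UTF-8 vs UTF-16 claim for yourself, here is a rough sketch in Python (an illustrative choice). The sample characters are my own picks, including one outside the Basic Multilingual Plane. Each one round-trips through both encodings, and only the byte counts differ:

```python
# Sample characters: ASCII, Latin-1 range, CJK, and U+1F600 (outside the BMP).
samples = ["A", "é", "日", "😀"]

byte_lengths = {}
for ch in samples:
    utf8 = ch.encode("utf-8")
    utf16 = ch.encode("utf-16-be")  # big-endian, no byte order mark
    # Both encodings decode back to exactly the same character.
    assert utf8.decode("utf-8") == ch
    assert utf16.decode("utf-16-be") == ch
    byte_lengths[ch] = (len(utf8), len(utf16))

for ch, (n8, n16) in byte_lengths.items():
    print(f"U+{ord(ch):04X}: {n8} bytes in UTF-8, {n16} bytes in UTF-16")
```

The last sample takes 4 bytes in either encoding (a surrogate pair in UTF-16), so neither one can reach characters the other cannot.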
> I live, work and develop in Asia, and the lack of proper UTF support from
> databases to OOo has been staggeringly difficult to overcome.
Fine, but what does that have to do with &lt; ?
> I can tell you right now
> that handling little annoying things like XML using a non-standard
> denomination of <
Sorry, but &lt; has been standard for decades. It's not something that
was invented last year.
> I understand that UTF is complex (heck it's 9 standards not 1), but it is
> the only truly viable solution available to digitally encode the diversity
> of human language, and it is an ISO standard!
But ODF files are written in UTF!
> In other words, since the goal of ODF is Documents, and not XML (XML has
> other goals)
XML is designed for making file formats. ODF is a file format.
> shouldn't the overriding standard for character encoding be
> related to documents (Unicode) and not XML?
I'm bewildered. XML has no conflict with Unicode. XML files can be
written in any encoding, and all the ODF files I've seen are UTF.
> (who wrote this poorly internationalized XML standard anyway? -jk)
It's poorly internationalized because it has 3 entities for standard
ASCII characters?
Btw, you are free to use &#60; &#62; and &#38; if you like those better.
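For what it's worth, XML accepts numeric character references (decimal or hexadecimal) alongside the three named entities, and a parser treats them all identically. A quick sketch in Python, with documents I made up for the purpose:

```python
import xml.etree.ElementTree as ET

# Three spellings of the same content: named entities, decimal
# character references, and hexadecimal character references.
forms = {
    "named": "<a>&lt;&gt;&amp;</a>",
    "decimal": "<a>&#60;&#62;&#38;</a>",
    "hex": "<a>&#x3C;&#x3E;&#x26;</a>",
}

decoded = {name: ET.fromstring(doc).text for name, doc in forms.items()}
print(decoded)  # every value is the three characters <, > and &
```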
Daniel.
-- Catalan is essentially bad Spanish mixed with even worse French.