[odf-discuss] ODF and UTF-8/16/32-HEX/DEC/DECHTML
zander at kde.org
Wed Feb 21 13:17:44 EST 2007
On Wednesday 21 February 2007 18:27, Damon Anderson wrote:
> Where I take issue is with using &lt; and &gt;. HTML supports UTF-8
> Decimal fully, so should XML. &lt; and &gt; should be XML extensions, not
Hehe, lots of confusion here.
UTF-8 is a way of writing Unicode characters out to disk, just like UTF-16
and other encodings. I'm pretty sure that has nothing to do with your
I'm assuming that by UTF-8 you actually mean Unicode here.
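To make the difference between Unicode and its encodings concrete, here is a minimal Python sketch. The sample character is my own choice, not something from the thread. It shows that one Unicode code point stays the same while its on-disk bytes differ per encoding.

```python
# One Unicode character, three on-disk encodings.
ch = "\u00e9"  # é, LATIN SMALL LETTER E WITH ACUTE (sample chosen for illustration)

code_point = ord(ch)            # the Unicode number, independent of any encoding
utf8 = ch.encode("utf-8")       # variable length: 2 bytes for this character
utf16 = ch.encode("utf-16-be")  # big-endian, no byte-order mark: 2 bytes
utf32 = ch.encode("utf-32-be")  # big-endian, no byte-order mark: 4 bytes

print(f"U+{code_point:04X}", utf8.hex(), utf16.hex(), utf32.hex())
# prints: U+00E9 c3a9 00e9 000000e9

# Decoding any of the three gives back the same character.
assert utf8.decode("utf-8") == ch
assert utf16.decode("utf-16-be") == ch
assert utf32.decode("utf-32-be") == ch
```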
In XML (and HTML) the following rules apply:
<foo>bar</foo>
Requesting the child text of node 'foo' yields the string "bar".
The other way around:
String: "Some words" becomes:
<foo>Some words</foo>
String: "Fokke & sukke" becomes:
<foo>Fokke &amp; sukke</foo>
String: "Bold is <b> in html" becomes:
<foo>Bold is &lt;b&gt; in html</foo>
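The rules above can be checked with a short Python sketch using only the standard library. The helper names to_foo and child_text are made up for this illustration. Writing a string into an element escapes the special characters, and reading the child text back returns the original string.

```python
from xml.sax.saxutils import escape
import xml.etree.ElementTree as ET


def to_foo(text):
    """Wrap text in a <foo> element, escaping &, < and > (hypothetical helper)."""
    return f"<foo>{escape(text)}</foo>"


def child_text(xml):
    """Parse a single element and return its child text (hypothetical helper)."""
    return ET.fromstring(xml).text


for s in ["Some words", "Fokke & sukke", "Bold is <b> in html"]:
    xml = to_foo(s)
    print(xml)
    # The round trip gives back exactly the string we started with.
    assert child_text(xml) == s

# Decimal and hexadecimal character references and the named entity
# all denote the same character once parsed.
assert child_text("<foo>&#60;&#x3C;&lt;</foo>") == "<<<"
```

The escaped form only exists in the file; an application that asks the parser for the text never sees it.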
Hope that clears some things up.
Further, you said:
> Given that 90% of the world's
> languages are non-ASCII this standard seems to be UTF and UTF-8 probably
> isn't enough (given that chinese and Japanese are already pushing into
> requirements of UTF-32) it seems like UTF-16HEX should be the defacto
This is false; please read up on UTF-8. It's an encoding that can represent
all of Unicode 5 because it is variable-length: a character takes between
one and four bytes.
For example: http://en.wikipedia.org/wiki/Utf8
I know; I store Japanese, Korean, Hebrew and various other languages in my
KWord documents and they are all UTF-8.
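As a rough illustration of the variable-length point, this Python sketch encodes one character from each of a few scripts and prints how many bytes UTF-8 needs. The sample characters are my own picks, not taken from any actual document.

```python
# Sample characters chosen for illustration only.
samples = {
    "ASCII": "A",                     # U+0041
    "Hebrew": "\u05e9",               # ש U+05E9
    "Japanese": "\u65e5",             # 日 U+65E5
    "Korean": "\ud55c",               # 한 U+D55C
    "Outside the BMP": "\U00020000",  # a CJK Extension B ideograph
}

for name, ch in samples.items():
    encoded = ch.encode("utf-8")
    print(f"{name}: U+{ord(ch):04X} -> {len(encoded)} byte(s)")
    # Every character survives the round trip through UTF-8.
    assert encoded.decode("utf-8") == ch

# Even the highest Unicode code point fits in four bytes.
assert len("\U0010FFFF".encode("utf-8")) == 4
```

So a character beyond the 16-bit range does not force a switch to UTF-32; UTF-8 just spends more bytes on it.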
It's a difficult domain, encodings and Unicode and all that stuff. But I can
assure you that ODF is fully capable of handling all spoken languages and
various dead languages as well.
I can't advise anything other than that you dive into the documentation to
learn how to get the best out of them.