[odf-discuss] plug-in and ODF 1.2 (was: Miguel on OXML)
marbux
marbux at gmail.com
Fri Feb 2 15:16:24 EST 2007
On 2/2/07, Thomas Zander <zander at kde.org> wrote:
> On Thursday 01 February 2007 23:11, Peter Vandenabeele wrote:
> > On 2/1/07, marbux <marbux at gmail.com> wrote:
> > ...
> >
> > > I am told by the Foundation's developers that, once we have
> > > applications that are conformant with ODF v. 1.2, full
> > > interoperability is an achievable goal. It will be feasible to map
> > > every Office feature to ODF.
>
> I think the proper wording is "it will be feasible to store every office
> feature in an ODF file"
I actually meant what I said. :-) The key words being "achievable" and
"feasible." But I should have provided more detail. Let me have a
crack first at rewording your alternative statement and discussing
those changes.
"It will be mandatory to preserve all MS Office metadata and dark
objects in an ODF file."
That's the full fidelity part. And it isn't just MS Office metadata
and dark objects, ODF developers' apps will be required to preserve
ODF metadata not supported in their own apps. I'll work with MS Word
as the example, since that is the app currently supported by the da
Vinci plug-in.
The mandatory preservation of foreign metadata would in itself be an
indescribably huge advance for adoption of ODF and apps that support
it. One example: governments operate under legal restrictions that
require the preservation of information. Once the proposed ODF 1.2
changes are implemented and da Vinci is released, it will be legal for
governments to migrate their records from .DOC to .ODT. They can keep
a few copies of MS Word/da Vinci around to view or print legacy
records in situations where "perfect" replication of the original is
required. But they can go forward working with .ODT for all new
documents, legally.
And for those who want to exchange documents with government, they
will need ODT apps but will not be required to use MS Word/da Vinci.
Result: non-Microsoft apps that support ODT can compete with MS Word
for government business without requiring that governments violate
their information preservation legal requirements; competition is
restored in the market for government word processors and the market
for folks who wish to exchange word processing documents with
government. Microsoft no longer has a lock on the government word
processing business created by those billions of legacy documents
stored in MS Word binary formats.
If you check the Fellowship's Precedent page,
<http://www.opendocumentfellowship.org/node/91>, you'll probably agree
that ODF adoption by government is much larger for new government
programs and programs whose record retention requirements are modest,
e.g., software for students in education programs. Those billions of
MS Office binaries ("BoBs") storing legacy documents are a huge
barrier to ODF adoption by government. Indeed, they are virtually the
only justification Microsoft has offered for developing its own flavor
of XML + binary blobs rather than supporting ODF. This is a huge
problem for developers of ODF apps. The Foundation's plug-ins and the
proposed ODF v. 1.2 metadata changes should destroy that barrier.
I'm on the run because I have a project that needs to be finished
today, so rather than address the feasibility of interoperability
myself, I will append a private email from Gary Edwards to Peter and
me that addresses the issue more eloquently than I can. I obtained
Gary's permission to publish it and Peter has already asked Gary to do
so, so I think no privacy issues are involved. But I think I misled
you by not making it more clear that I didn't mean that full
interoperability would happen because of the proposed ODF 1.2 changes,
but that the mechanisms would be in place in the spec to achieve full
interoperability. As Gary makes clear, there would be more work to be
done at the application level.
[more]
> I don't know why 1.2 is needed at all, really.
> The roundtrip is what is important here. And it is the same as when a KOffice
> application saves out a document with information that OpenOffice does not
> parse.
> Saving the information out to the file, even if the app did not _use_ that
> information is essential to a good roundtripping experience.
>
> In fact, the current ODF spec already allows you to do this. It's all in the
> implementation. So OOo fails to be nice in that respect. I hope it's on the
> devs' to-do list.
It's desperately needed, in my opinion. If the issue were only the
fidelity in the round-tripping of files generated by KOffice, OOo, and
da Vinci-enabled Word, then I think you would be right. But we've
already got a flood of apps that read and write ODF and there's going
to be a huge mess batting documents back and forth among them if some
of them don't preserve metadata. E.g., if the less-featureful Writely
doesn't preserve metadata and the round-tripping is between KWord and
Writely, the conversation is going to be very lossy.
As it is, we've already got a mess to clean up because of such issues.
I'd like to see some discussion of the need for ODF 1.2 to also
require ODF version and declared subset metadata to be included in
conformant files, for several reasons: [i] the Foundation developers
have identified a need to declare an MS Office "interoperability
subset" of ODF because ODF is more featureful, primarily in the
richness of ODF support for more advanced page layout engines as
compared to Word's relatively primitive page layout engine that
prevents mapping of all ODF features to .DOC; [ii] there is no
absolute guarantee that the proposed interop extensions to ODF will be
able to handle everything that Microsoft might throw at ODF in the
future, given that non-interoperability remains one of Microsoft's
favorite marketing weapons, so later revisions to the interop
features of ODF could conceivably become necessary; and [iii] there
may be some big advantages to declaring
interoperability subsets for specialized types of ODF apps, e.g.,
outliners. Outliner developers might agree on an ODF subset for
interoperability among outliners. There would be no difference from
the standpoint of the more featureful apps like KWord, but it would
provide outliner developers with an agreed set of tags they need to
support to be able to round-trip files among their apps with full
interop. However, they would need to be able to programmatically warn
their outliner users if they open ODT files that may include metadata
they do not support.
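
A minimal sketch of the "warn the outliner user" idea, in Python. Everything
here is an assumption made for illustration: the tag names, the
OUTLINER_SUBSET set, and both function names are hypothetical and are not
taken from the ODF spec or from any proposal to the TC. It only shows that,
once an agreed subset exists, checking a file against it is cheap.

```python
import xml.etree.ElementTree as ET

# Hypothetical subset: the element names an outliner has agreed to support.
# These are placeholder names, not real ODF elements.
OUTLINER_SUBSET = {"doc", "heading", "para"}


def unsupported_tags(xml_text, supported):
    """Return the element names found in the document but not in `supported`."""
    root = ET.fromstring(xml_text)
    return {el.tag for el in root.iter()} - set(supported)


def open_with_warning(xml_text, supported=OUTLINER_SUBSET):
    """Return "ok", or a warning string naming content outside the subset."""
    missing = unsupported_tags(xml_text, supported)
    if missing:
        return "warning: unsupported content: " + ", ".join(sorted(missing))
    return "ok"


# A document that steps outside the subset with one layout element.
SAMPLE = "<doc><heading>Plan</heading><para>Text</para><frame>layout</frame></doc>"
print(open_with_warning(SAMPLE))
```

If the file also declared its ODF version and subset in its own metadata, the
application could skip the scan and read the declaration instead; the scan
above is the fallback when no declaration is present.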
I'll stress that I'm the only one I know of who is pushing for
discussion of the potential need for ODF interop subsets for anything
but interop with MS Office. But I will add that the proposed
Microsoft interop subset and the related subsetting I raise are both
easily worked into the framework of my conceptual thinkpiece, "Toward
a solution of the file type association problem,"
<http://www.opendocumentfellowship.org/node/221>. For more of my
discussion of the need for an outliner ODF interop subset (and
additional tags for outliners), see my comment on this page:
<http://florianreuter.blogspot.com/2006/11/suggested-enhancement-for-opendocument.html>.
Here comes Gary's post to Peter and me:
=====================
Hi Peter,
Thanks for your interest, I'll try to answer your questions in line.
On 2/1/07, Peter Vandenabeele < peter at vandenabeele.com> wrote:
On 2/1/07, marbux < marbux at gmail.com> wrote:
...
> I am told by the Foundation's developers that, once we have
> applications that are conformant with ODF v. 1.2, full
> interoperability is an achievable goal. It will be feasible to map
> every Office feature to ODF.
<ge> I recently sent this reply to marbux:
The Da Vinci plugin sent to Massachusetts on August 10th, 2006, and
demonstrated to (and approved by) both the Handicap Council and the
division CIOs, was our first ODF 1.2 implementation. It included the
five <interop eXtensions> that have been proposed to both the OASIS
ODF TC and the Metadata SC. Interoperability with OOo was determined
to be about 65% fidelity. But ODF interop between Da Vinci enabled
MSWord desktops was near perfect. The conversion of binary documents
to Da Vinci ODF blew them away. There was zero disruption to existing
MSOffice-bound business processes or assistive technology add-ons.
We argued with Massachusetts that Da Vinci should not be installed
anywhere beyond the testing workgroup control units until ODF 1.2 is
approved by at least the OASIS ODF TC (mainline), and OOo releases an
ODF 1.2 ready beta. Otherwise, there is a serious probability of Da
Vinci forking ODF, and running away with the standard.
...................
I would be interested in understanding how that will work, without
at least containing or referring to certain elements of the EOOXML
spec in "extensions" or "meta data" of ODF 1.2. How will this work
while still allowing all applications (open source and proprietary)
to faithfully interpret the ODF 1.2 that contains this additional
information without relying on the EOOXML spec (particularly the
non-specified parts), or implementation details in the plug-in?
<ge> Da Vinci doesn't use EOOXML. The Da Vinci conversion process is
however very similar to how MSOffice applications convert their
in-memory-binary representations to EOOXML. In fact, our conversion
breakthrough was greatly assisted by the MSWord documentation of how
they convert both in-memory-binary representation (imbr) and legacy
binary documents to MSXML (WordProcessingML and SpreadsheetML -
discontinued in the December 2006 releases of MSOffice 2007, VSTO 2005
and the Exchange/SharePoint/Groove Hub).
To understand Da Vinci, you've got to understand that there is an
internal MSOffice conversion process for MSXML <> imbr & legacy
binaries. The MSOffice conversion process for both MSXML
(discontinued but halfway documented) and EOOXML (undocumented) can
be expressed this way:
* Document loaded in (converted to imbr) or created as imbr (the
application's in-memory-binary representation) <>
* Save or Save As process triggers internal conversion to MS RTF
(imbr > msrtf) <>
* From the internal MS RTF staging, the final conversion is made
to MSXML or EOOXML
Note that loading in an EOOXML document will require a reverse
conversion of the above process. The only way the application can
"work" the document is having it converted to imbr.
Note also that MSWord uses MS RTF as the mid stream conversion
staging. It's like an intermediary structure similar to published
RTF, but loaded with secret relationships that only Microsoft (and now
Da Vinci :) understand. It turns out that Excel and PowerPoint also
use a very similar intermediary structure for ALL conversion processes.
It's not MS RTF, but it's so similar an adaptation to spreadsheet and
presentation needs that we still refer to it as "MS RTF". But it's
technically not RTF.
The key thing to note here is that an internal conversion process is
always present except with MS binary files. The MS Binary file is a
direct dump of imbr for any particular application version. If there
is version mismatch, there is also a conversion process involved.
Da Vinci installs natively, and intercepts the internal application
conversion process at the particular moment when the imbr <> MS RTF
intermediary is complete. With EOOXML, the intermediary structure is
mapped by the MS application (or Compatibility Pack conversion components)
to EOOXML. Or MSXML. Or anything else the output is directed to
(RTF, HTML, Word97, etc.)
So Da Vinci triggers an internal conversion process, intercepts the
internally generated MS RTF intermediary structure, and maps to ODF!
Note that Da Vinci could alternatively be programmed to map to UOF,
Romanian XML, or even EOOXML.
When loading an ODF file into MSOffice, Da Vinci reverses the
conversion process, mapping ODF to MSRTF, which the application then
converts to imbr.
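
The save and load paths described above can be sketched as a toy pipeline in
Python. This is only a model under loud assumptions: the real imbr and "MS
RTF" intermediary structures are not public, so plain dicts stand in for
them, the output is a made-up XML vocabulary rather than real ODF, and every
function name is hypothetical. The point it illustrates is that the final
mapping stage is pluggable while the imbr <> intermediary stage stays the
same.

```python
import xml.etree.ElementTree as ET


def imbr_to_msrtf(imbr):
    """Stage 1 on save: in-memory representation -> intermediary (stand-in)."""
    return {"kind": "msrtf", "runs": list(imbr["runs"])}


def msrtf_to_imbr(msrtf):
    """Last stage on load: intermediary -> in-memory representation."""
    return {"kind": "imbr", "runs": list(msrtf["runs"])}


def msrtf_to_odf(msrtf):
    """Pluggable final mapper: intermediary -> XML (placeholder tags, not ODF)."""
    root = ET.Element("text")
    for run in msrtf["runs"]:
        ET.SubElement(root, "p").text = run
    return ET.tostring(root, encoding="unicode")


def odf_to_msrtf(odf_xml):
    """Reverse mapper used on load: XML -> intermediary."""
    root = ET.fromstring(odf_xml)
    return {"kind": "msrtf", "runs": [p.text or "" for p in root.findall("p")]}


def save(imbr, final_mapper):
    """imbr -> intermediary -> whatever format the final mapper targets."""
    return final_mapper(imbr_to_msrtf(imbr))


def load(data, first_mapper):
    """File format -> intermediary -> imbr, the reverse of save()."""
    return msrtf_to_imbr(first_mapper(data))


doc = {"kind": "imbr", "runs": ["Hello", "A & B"]}
odf = save(doc, msrtf_to_odf)
print(odf)
print(load(odf, odf_to_msrtf) == doc)
```

Swapping msrtf_to_odf for a different mapper is the toy equivalent of
directing the same intermediary to UOF or EOOXML instead.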
The ACME 376 proof of concept is in reality Da Vinci mapping to an XML
encoding of RTF instead of ODF. And this is not just any RTF :) But
you would only know that if you had studied carefully the internal MS
RTF intermediary structure.
Please keep in mind Peter that Da Vinci and the InfoSet Engine - API
are works in progress. Our progress with this work is directly
related to the funding available.
At least this wiki entry on the use case of Roundtrip improvement
suggests including extra meta data to preserve information from
"alien" formats.
http://wiki.oasis-open.org/office/Roundtrip_improvement
<ge> The OpenDocument Foundation members are major contributors to
both the ODF Formula and ODF Metadata Sub C's, where almost all of the
work for ODF 1.2 is taking place.
> Until that time, ODF files generated in Microsoft Word using the
> da Vinci plug-in will not be fully compatible with, e.g., OpenOffice.org
> and KOffice.
Which conflicts with the essential _goal_ of the Open Standard: to be
interoperable between all applications and thus avoid vendor lock-in.
Of course, if by the summer (with ODF 1.2) we have a full solution,
that would be great. But I would prefer to better understand how
that solution will impact the ODF 1.2 standard to achieve this goal
and how that will keep ODF 1.2 interoperable with _all_ applications.
Looking forward to your clarification.
<ge> Yes, this is confusing to near everyone; how will ODF 1.2 improve
application interoperability and file format interoperability? The
key is two changes of great impact; applications MUST preserve named
value pairs, even those they don't understand or use, and, the new
metadata model provides us with unprecedented flexibility needed for
tagging and describing previously unspecified and not understood
binary objects.
The first ODF 1.2 requirement that applications must preserve elements
and attributes is critically important to any round trip environment.
Which I think is the essence of any file format interoperability.
This requires that application developers write
applications-as-routers-of-information instead of
information-end-points.
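
A small Python sketch of the router versus end-point distinction, assuming a
made-up XML vocabulary (the "para" and "vendor-data" names are hypothetical,
not ODF elements). Both functions make the same edit to the content they
understand; only the router keeps the element it does not understand.

```python
import xml.etree.ElementTree as ET


def edit_as_router(xml_text):
    """Edit known paragraphs in place; every other node passes through untouched."""
    root = ET.fromstring(xml_text)
    for para in root.iter("para"):
        para.text = (para.text or "").upper()
    return ET.tostring(root, encoding="unicode")


def edit_as_endpoint(xml_text):
    """Lossy variant: rebuild the document from the known elements only."""
    root = ET.fromstring(xml_text)
    out = ET.Element("doc")
    for para in root.iter("para"):
        ET.SubElement(out, "para").text = (para.text or "").upper()
    return ET.tostring(out, encoding="unicode")


# One paragraph this toy app understands, one element it does not.
SAMPLE = '<doc><para>hi</para><vendor-data key="k">opaque</vendor-data></doc>'

routed = edit_as_router(SAMPLE)
rebuilt = edit_as_endpoint(SAMPLE)
print("vendor-data" in routed, "vendor-data" in rebuilt)
```

The end-point version is the natural way to write a small application, which
is why a preservation requirement has to be stated in the spec rather than
left to each implementation.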
The second is that the ODF 1.2 metadata model introduces an amazing
level of flexibility, enabling us to use generic elements with
expanding attribute descriptions.
So how is it that the new metadata model helps the process of
converting MS specific but unspecified binary objects, processing
instructions and system dependencies to ODF useable XML encodings?
Well, it doesn't happen overnight. That's the first thing. But it
will happen over time because the mechanism of describing these dark
binary objects will be in place with ODF 1.2. As it is, years of reverse
engineering have the conversion of these binaries at over an 85%
conversion - mapping to ODF ratio. That's pretty good, but it took
years of study. All the "big" stuff has been done, and what really
remains is the strange stuff. And, there is still the problem of
application feature mismatch - a problem much more difficult than
describing and mapping the unspecified dark objects found throughout
MS Binaries.
When Da Vinci picks up these dark objects, and realizes there is not
an existing, previously specified element/attribute set in ODF to map
to, two things must be done. The first is wrapping the binary object
in XML so that the originating MSOffice application can continue to
read the document with perfect fidelity. The second is the ODF 1.2
metadata flexibility of Da Vinci being able to fully describe that
object, grabbing one of the generic <interop> element tags, and then
fully describing whatever Da Vinci knows about that object based on
the context divined from the MS RTF intermediary structure the object
was found in.
With ODF 1.0 Da Vinci has the full capability of wrapping the dark
object in ODF compliant XML. What we lack is the ability to describe
the object so that other ODF applications have a chance to give it
their best rendering shot. Think of ODF 1.0 named value pairs as an
all or nothing proposition. Either the perfect element/attribute set
exists in ODF 1.0, and Da Vinci has something to map to, or, the tag
set doesn't exist leaving us with the only option of a <foreign
element> XML wrapper (pretty much the same as how EOOXML handles
these same dark, unspecified binary objects :)
With ODF 1.2 we have our cake and can eat it too. We're going to wrap
the dark object for perfect fidelity on round trip. And we're going
to pop it into a parallel generic, semi descriptive <interop> element,
and then fully describe it using the new metadata model syntax. This
will give other ODF 1.2 ready applications a handle they can grab to
at least provide a partial rendering of the object's characteristics.
And if they are really good, the ODF 1.2 applications will pick up our
descriptive attributes, understand how to render and translate from
the metadata RDF/XML model, and perfect a very exacting rendering into
the user interface.
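
The wrap-plus-describe approach above can be illustrated with a short Python
sketch. It is not the proposed syntax: the "interop", "describe" and "raw"
element names, the base64 choice, and the function names are all assumptions
for illustration, and the description is written as plain child elements
rather than in the RDF/XML metadata model. It shows the two halves side by
side: an opaque payload that survives byte for byte, and readable hints that
another application could use for a partial rendering.

```python
import base64
import xml.etree.ElementTree as ET


def wrap_dark_object(payload, description):
    """Wrap opaque bytes and attach whatever is known about them."""
    el = ET.Element("interop")
    for name, value in description.items():
        ET.SubElement(el, "describe", {"name": name, "value": value})
    raw = ET.SubElement(el, "raw", {"encoding": "base64"})
    raw.text = base64.b64encode(payload).decode("ascii")
    return el


def unwrap_dark_object(el):
    """What the originating application does: recover the exact bytes."""
    return base64.b64decode(el.find("raw").text)


def best_effort_hints(el):
    """What another application does: read the description, ignore the bytes."""
    return {d.get("name"): d.get("value") for d in el.findall("describe")}


payload = b"\x00\x01binary\xff"
hints = {"role": "text-box", "width": "5cm"}

# Serialize and re-parse to mimic a save followed by a load.
xml_text = ET.tostring(wrap_dark_object(payload, hints), encoding="unicode")
reloaded = ET.fromstring(xml_text)
print(unwrap_dark_object(reloaded) == payload, best_effort_hints(reloaded))
```

As an object becomes fully understood, the "raw" part can be dropped and only
the described form kept.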
Over time, Da Vinci will drop all XML wrapping of these dark objects
as they move from being dark and unspecified to being fully understood
and specified. I say that because I think these objects have to be
studied and understood across huge volumes of documents before they
can be specified fully. But that's just me. Da Vinci developers
believe it will happen very quickly. Florian Reuter wrote the
extraordinary, near mystical Da Vinci algorithms for perfecting this
process, and many believe there is nothing beyond his reach. We shall
see :)
~ge~