[odf-discuss] A story of TIFF

marbux marbux at gmail.com
Fri Feb 9 21:27:29 EST 2007


Sorry for my short temper. I now see more of where our
miscommunication comes from.

On 2/4/07, Daniel Carrera <daniel.carrera at zmsl.com> wrote:
> On Sun, 2007-02-04 at 16:28 -0800, marbux wrote:
> > > How is that possible considering that ODF 1.2 doesn't exist yet, much
> > > less it's implemented in OOo. Are we using the same definition of
> > > "interoperability"?
> >
> > I was clear in what I said. You're arguing with an implicit
> > mischaracterization of what I said, and that's a straw horse argument.
>
> I am not trying to make a strawman. Where is the strawman? You said:
>
> "My understanding is that the current version using the proposed ODF 1.2
> changes already achieves near-perfect interop in both directions"
>
> I'm saying that I don't see how this is possible, since ODF 1.2 is not
> implemented elsewhere. Interoperability requires more than one
> application. You cannot argue interoperability until there is at least
> one other application with which you can interoperate.
>
I'm sorry. I thought it was understood that the Foundation is working
with their own build of OOo that implements the proposed changes in
ODF 1.2. That is the main reason why they are not releasing their
plugin before ODF 1.2 is firmed up and implemented in OOo. They do not
want to cause a fork in OOo or in ODF.

> At no point have I suggested that I don't understand that the ODF plugin
> exports binary blobs as dark matter and plans to put a flag on them. You
> seem to think that this is relevant, but this what I said:
>
> "So, in response to ODF 1.2 and interoperability: (1) I am not aware of
> any planned feature in ODF 1.2 that will remove the binary blobs that
> the Foundation plugin currently uses."
>
> That statement remains. I am aware that you plan to export unknown
> binary blobs and flagging them. This, however, is not a feature that
> makes the unknown binary blobs known. It merely marks them as unknown.
>
It isn't me that is planning to do that; it's the Foundation
developers. I am genuinely excited by their approach. But their
approach does more than flag the dark objects; it also describes them
so that apps can achieve the best interop feasible as more is learned
about the dark objects.

> > Nope. I was clear in saying that the plugin depends on features not
> > presently supported in ODF.
>
> Which conflicts with your claim "the current version using the proposed
> ODF 1.2 changes already achieves near-perfect interop in both
> directions".
>
I suspect you hadn't realized that the Foundation is working with
their own build of OOo that implements the proposed changes for ODF
1.2. Does that fact resolve many of our differences including this
one?

> > If the Clever Age approach was
> > actually equal to or better than the Foundation approach, I doubt that
> > any of the Foundation folk including Florian would still be working on
> > it.
>
> You are using one tool that Clever Age cannot use: extensions. If Clever
> Age started adding non-ODF tags to their files you and the rest of the
> world would be crying bloody murder. "conversion" would really be a lot
> easier if you allow yourself to use extensions. For example, you could
> just grab an EOOXML file, put <office:document> tags around it and
> persto! it is a valid ODF file as defined in section 1.5. But of course,
> we don't _want_ Clever Age to take that approach, even if it means "full
> fidelity".
>
I think you overlook that the "foreign element" tags are limited to
foreign metadata. As I understand ODF 1.0 section 1.5 it would be
improper to use the foreign element tags for enclosing blobs that are
not metadata; i.e., wrapping an entire document in foreign element
tags would be a non-conformant use. I could be wrong about that, but
it is a point I have on my checklist for the draft 1.2 spec. I am
concerned that Microsoft might eventually implement ODF with binary
blobs in the foreign element tags if the conformance section is not
rewritten to handle that issue.

I think it important to distinguish when we are referring to EOOXML
and to MOOXML. They are not the same. The Microsoft Office/business
line stack are the only programs that can round-trip between the two
with full fidelity and interop. EOOXML is a subset of MOOXML, although
it is too early to tell whether EOOXML adds features not included in
MOOXML. It is also too early to say whether MOOXML uses XML tags that
are not specified in EOOXML or whether EOOXML captures all of the
blobs written by the Office/business line stack. Given Microsoft's
track record, however, it would not be unreasonable to strongly
suspect that MOOXML includes blobs that are not captured by EOOXML.
For example, I searched the Ecma 376 specification for references to
Sharepoint but found only one reference to it, in the PowerPoint
section. Yet if you run through the documents that Groklaw published
yesterday from the Combs v. Microsoft litigation, you learn about a
very large number of undocumented APIs shared across the
Vista/Office/Sharepoint/IE7/Exchange stack that look to me like they
should be reflected by metadata in the Office binary formats.

Given that Microsoft clearly wants only 1-way fidelity and interop in
Ecma 376, inbound to the Microsoft business line stack, there is every
reason to suspect that EOOXML is the modern, mostly XML equivalent of
RTF, a file format that can be used to send data to MS Word but
cannot be used to express the full range of Word's functionality (except
by using the modified RTF native file support API in Word). The
careful reader of the Ecma 376 specification will notice that for all
the talk about enabling interoperability and high fidelity via Ecma
376, Microsoft/Ecma carefully avoided saying whose applications would
have the benefit of all that interop and fidelity. It's rather clear
that it is Microsoft and Microsoft only.

I cannot yet point you to specific blobs in MOOXML that are not
present in EOOXML. But there is every reason to suspect that is the
situation. And the Foundation developers are emphatic that quite a few
of the blobs in the Office 2007 DOC binary format are business line
stack metadata, not just Office metadata.

I know I have not articulated this issue clearly in the past and I
apologize for that. But the bottom line is that the chances are far
better that all of the relevant metadata will be captured by accessing
the IMBR directly than by accessing what is preserved in EOOXML.
EOOXML is not designed to be a full participant in the Microsoft
business line stack. It is an import format, not a round-tripping
format. Microsoft wants you to buy their software; not to buy the
software of other vendors that implement EOOXML. From Microsoft's
standpoint, EOOXML is far more a one-way communications protocol than
it is a set of file formats.

The Foundation folk tell me that Microsoft has implemented something
very much like the ODF section 1.5 metadata preservation stuff in IMBR
<> MOOXML. I've spent some time looking, but have not yet found
anything equivalent in EOOXML. That isn't to say that it isn't there;
I have no goal of reading the entire spec anytime real soon. :-)  But
if that feature isn't there in EOOXML, i.e., EOOXML does not require
preservation of all metadata whether binary or XML, to me that would
be a compelling case against using EOOXML as an intermediary between
ODF and Office IMBR.

Note: A while back someone told me that EOOXML does have such a
requirement and I believe I repeated that information somewhere. But
digging in, I haven't yet been able to confirm the information. So if
I made that statement on this list or in a private communication to
any of you, I apologize if I didn't properly caveat the information.


> > Second, the allowance for wrapping dark objects in
> > EOOXML tags testifies that MOOXML is not entirely XML but also
> > includes proprietary binary blobs.
>
> Could this same argument not be made for ODF then? "the allowance for
> wrapping dark objects in ODF tags testifies that ODF is not entirely XML
> but also includes proprietary binary blobs". Of course, I don't believe
> this. I'm just offering a word of caution, lest less scrupulous person
> use your argument against you on a public blog.
>
>
I think that's really been our topic of discussion, hasn't it? :-) I
am really doing some serious thinking about how to avoid that becoming
true. E.g., I need feedback on whether it is workable for ODF section
1.5 to be changed to instruct that the foreign element tags are only
for use in interoperability with non-conformant apps; that a
conformant app must not use those tags to stash its own dark objects.
I am seriously concerned that when Microsoft eventually decides to
actually support ODF, the way things stand it could be conformant even
if it stashed blobs in the foreign element tags. From a legal
standpoint, I want Microsoft to be met by a conformance section saying
that they have to express all of their metadata in valid ODF XML. But
I do not know if that would be too limiting for other developers, so
I have requested feedback on that issue.

> I think you missed my argument. I posit that an OXML-based approach has
> the same benefits as the Foundation plugin. Since I certainly do not
> support an OXML-based approach, and I suspect few people on this list
> will, I find it hard to support the Foundation plugin. I think that
> putting OXML inside ODF is *bad*. I think that putting binary blobs
> inside ODF is *very**bad*. I hope this clarifies the misunderstanding.
>

I would not support allowing blobs in ODF were it not for the reality
that: [i] Microsoft has somewhere between 72 and 90 per cent of the
office suite market; [ii] Microsoft is maneuvering with EOOXML and
MOOXML to extend its existing office suite monopoly into the emerging
line of business/business processes software market; [iii] Microsoft
ain't cooperating with us; and [iv] a full fidelity method of
migrating Microsoft Office binary files to ODF is a market
requirement, allowing the monopoly to be broken. Full interop would
be nice too, but is neither so urgent nor so easy to implement as is
full conversion fidelity in migrating those binary files to ODF and if
necessary round-tripping them.

That is why I want to make it clear in ODF section 1.5 that a
conformant app can't park its own dark objects in the foreign element
tags, unless that would mess up non-malicious developers. I want the
Microsoft blobs to be historical artefacts as soon as possible. In the
meantime, allowing them for migration purposes satisfies a big market
requirement for full fidelity migrations. My understanding is that
they will not affect fidelity or interop among ODF apps, and that they
will hasten the day when Microsoft has to fully support ODF too in
order to maintain any market share.

>
> > A blob is a blob is a blob, whether wrapped with MOOXML tags or ODF
> > tags. But targeting the in-memory binary representations and their
> > dumps to file is in my mind the only practical approach to the
> > problem. I still haven't heard any reason to believe otherwise.
>
> Let me see if I understand your position: You claim that EOOXML cannot
> legitimately (ie, using real tags) represent future documents. That
> Microsoft will fill it with binary blobs going forward. Therefore,
> EOOXML cannot be relied on for conversion. Correct?
>
Again, I distinguish between MOOXML and EOOXML. As is hopefully better
explained above, I don't see EOOXML as a trustworthy intermediate
between IMBR and ODF. The IMBR is more likely to reflect the actual
range of metadata used in the Microsoft business line/business
processes stack.

> I don't know the merit of your premise, so I will not dispute it. But
> those blobs then have to be stored in the EOOXML document, and if you
> just let through tags that you can't convert, those blobs end up in the
> ODF. You still get "100% fidelity". A blob is a blob is a blob and it'll
> still be impenetrable. But this would also be true if you got the blob
> directly from the in-memory binary representation. It's still a blob.
>
>
Yes, but will EOOXML capture all the blobs present in the IMBR? There
are reasons to suspect not, as discussed above.


> Using OXML as an extension of ODF is an example of a "solution" that has
> dark objects that are less dark than in the Foundation plugin. If you
> agree with me that using OXML as an extension of ODF is a bad idea, by
> my argument, the Foundation plugin is too. Both solutions have the same
> benefits: 100% fidelity, wrapping dark objects, etc. In the OXML one the
> dark objects are mostly documented XML, which is an improvement over
> just binary objects. If you don't support the OXML extension idea (I
> don't) I don't see how you can logically support a binary object idea.
>
I think your premise is doubtful. Until there is information
otherwise, I think the safe assumption is that EOOXML is roughly
equivalent to RTF in that it does not express the entire range of
metadata generated by the combination of MS apps that read and write
to MOOXML. On the other side, there's the indication that the IMBR may
contain metadata not preserved by EOOXML.

There is also the fact that we legitimize and promote the adoption of
EOOXML if we use it as an intermediary in ODF <> IMBR. I see no real
alternative to ODF apps importing MOOXML and EOOXML, but I think the
interests of ODF adoption are advanced by writing to the Microsoft
binary formats on the ODF > IMBR trip rather than using M/EOOXML as an
intermediary. I don't think we should play by Microsoft's game plan
any more than we must to serve users' needs.

> > What the unknown binary blobs accomplish,
> > > provided that applications don't drop them (as expected in ODF 1.2), is
> > > round-tripping: Say I have MS Office, and you have OOo. I send you a
> > > file. It looks like crap on your system, but you make an edit anyways
> > > and send it back. I would still be able to see all the items that you
> > > could not see. In this situation we do not have interoperability. What
> > > we have is MS Office round-tripping.
> >

Yup, full fidelity, not full interop.

> > Again, you've overlooked that the Foundation has proposed a solution
> > to the blob problem.
>
> Gary's "proposed solution" for interop is to flag binary objects and
> hope that applications will figure out how to deal with them. Have I
> missed a solution to the dark object problem?
>
I think so. The Foundation's solution doesn't just flag the dark
objects. It also describes them as completely as is possible at any
moment in time. If apps are implemented in such a way that they act on
what is known, then we're poised to, e.g., do online updates of what
is known about the dark objects without having to wait for another
round of the standardization cycle or another version of the
implementing app.

> You seem unable to articulate what this solution is, beyond "let's hope
> applications will figure it out". If that's the solution, then it is
> surely better to hope that applications will figure out what
> 'useSpacingLikeWord95' means instead of hoping that they'll figure out
> what '0TbUF0IZ+XQCXGk' means. Especially since, after they spend a few
> years trying to decode it, they'll just find out that 0TbUF0IZ+XQCXGk
> means 'useSpacingLikeWord95'.
>
The blobs that correspond to tags disclosed in the EOOXML are not the
problem. The Office 2003 XML References Schemas and the EOOXML spec
have been very helpful in identifying and specifying a great many dark
objects. The problematic dark objects are for the most part among
those that have no corresponding known XML tags. EOOXML really doesn't
help that much in understanding them.

> > They address the problem through the ODF foreign metadata
> > tags, a proposed MS Office ODF interop subset,
>
> Do you understand what a subset is? It looks like you don't. Please read
> this:
>
> http://en.wikipedia.org/wiki/Subset
>

Thanks, but my math does extend that far, albeit not a lot farther. :=)

> An interoperability subset for MS Office has nothing to do with dark
> objects. In brief:
>
> * Dark objects relate to MSO->ODF interop.
> * MSO subset relates to ODF->MSO interop.
>
I think you're mostly right here, except to the extent that the lack
of definition of the dark objects contributes to the selection of tags
for the ODF > MSO interop subset.

> >  and five proposed ODF
> > extensions that allow adequate description of the blobs" content as
> > more is learned about them.
>
> That's fantastic, and kudos for that. Those same extensions can be used
> to map portions of EOOXML-stored documents.
>
>
Yes, but I think that would require non-conformant EOOXML; i.e.,
EOOXML would probably have to be extended with tags it does not
presently support so that apps could do something intelligent with
the information. And I think there would be an additional problem
in that MS Office probably would not recognize whatever tags we might
add to EOOXML to reflect functionality for dark objects that have been
cracked. It's necessary to write to IMBR to overcome that problem.

> >  Meanwhile, their approach already has
> > achieved near-perfect fidelity.
>
> I thought it was perfect fidelity. Heck, you can get perfect fidelity by
> just dumping the entire in-memory structure inside a tag.
>
No, they say they are above 98 per cent fidelity in both directions
and the remainder is within grasp. But that's with complete mapping of
tags, not by dumping the entire IMBR structure inside a tag pair.

> > Word's primitive page layout engine
>
> I really hope you realize that you are talking about two different
> processes. You seem to be mixing MSO->ODF with ODF->MSO. Whatever
> limitations word's layout engine has are ODF->MSO. The discussion of
> extensions is MSO->ODF.
>
I don't know that there aren't issues going the other way as well.
Apparently if there are with Word, they are minimal. But I wouldn't be
surprised if problems running the other direction surface when they
move on to Excel and Powerpoint.

> > Phil Boutros of Stellent, who probably knows more about
> > file conversion issues than anyone else on the planet, pointed the way
> > with the ODF section 1.5 conformance section.
>
> You keep yapping about that. I keep asking you to read section 1.5. It's
> not rocket science, really, and it's less than a page. The relevant
> section is only two paragraphs. Here it is:
>
> <quote>
> Documents that conform to the OpenDocument specification MAY contain
> elements and attributes not specified within the OpenDocument schema.
> Such elements and attributes must not be part of a namespace that is
> defined within this specification and are called foreign elements and
> attributes.
>
> Conforming applications either MUST read documents that are valid
> against the OpenDocument schema if all foreign elements and attributes
> are removed before validation takes place, or MUST write documents that
> are valid against the OpenDocument schema if all foreign elements and
> attributes are removed before validation takes place.

I've been all over it and have cited it many times. I recall once
being corrected by Alex on an issue I had with relevant vocabulary.
I've explained the major changes to that section that look like they
are going to happen.
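
For what it is worth, the two quoted paragraphs reduce to a mechanical
test: strip everything foreign, then validate what is left. Below is a
minimal Python sketch of the stripping step only. It assumes that any
namespace beginning with urn:oasis:names:tc:opendocument:xmlns: counts
as defined by the specification (a simplification; the schema also
draws on namespaces such as Dublin Core and XLink, which a real check
would have to whitelist). The vendor namespace and the element names in
SAMPLE are hypothetical, and schema validation itself is not shown.

```python
import xml.etree.ElementTree as ET

# Assumption: this prefix stands in for "a namespace defined within the spec".
ODF_NS_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"


def namespace_of(name: str) -> str:
    """Return the namespace URI of an ElementTree name like '{uri}local'."""
    return name[1:].split("}", 1)[0] if name.startswith("{") else ""


def is_foreign(name: str) -> bool:
    """Treat anything outside the assumed ODF namespaces as foreign."""
    return not namespace_of(name).startswith(ODF_NS_PREFIX)


def strip_foreign(elem: ET.Element) -> ET.Element:
    """Remove foreign attributes and foreign child elements, in place."""
    for attr in [a for a in elem.attrib if is_foreign(a)]:
        del elem.attrib[attr]
    for child in list(elem):
        if is_foreign(child.tag):
            # Note: any tail text after the removed child is dropped too.
            elem.remove(child)
        else:
            strip_foreign(child)
    return elem


# Hypothetical document: one ODF paragraph carrying a foreign attribute,
# followed by a foreign element holding an opaque blob.
SAMPLE = (
    '<office:document'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' xmlns:x="http://example.com/hypothetical-vendor">'
    '<text:p x:hint="1">hello</text:p>'
    '<x:blob>0TbUF0IZ+XQCXGk</x:blob>'
    '</office:document>'
)

stripped = strip_foreign(ET.fromstring(SAMPLE))
```

Whatever survives in `stripped` is what would be handed to a validator.
The sketch also shows why the rule alone does not limit what may sit
inside the foreign elements: they are discarded before anything is
checked.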

But I disagree that the two paragraphs are the only relevant parts of
section 1.5. There are more.
<http://develop.opendocumentfellowship.org/spec/?page=1#1.5>.

"Conforming applications that read and write documents *may* preserve
foreign elements and attributes."

The Metadata SC is expected to change "may" to "must" or "shall."

"In addition to this, conforming applications *should* preserve meta
information and the content of styles."

"Should" is expected to change to "must" or "shall."

This means:

    * "The various <style:*-properties> elements (see section 15) may
have arbitrary attributes attached and may have arbitrary element
content. All attributes attached to these elements and elements
contained within these elements *should* be preserved (see section
15.1.3);"

"Should" is expected to change to "must" or "shall."

    * "elements contained within the <office:meta> element may have
arbitrary element content and should be preserved (see section
2.2.1)."

I don't recall what the proposal was with this section, but again,
"should" would probably have to change to "must" or "shall."

"Foreign elements may have an office:process-content attribute
attached that has the value true or false. If the attribute's value is
true, or if the attribute does not exist, the element's content should
be processed by conforming applications. Otherwise conforming
applications should not process the element's content, but may only
preserve its content. If the element's content should be processed,
the document itself shall be valid against the OpenDocument schema if
the unknown element is replaced with its content only."

This part is headed for a major rewrite to accommodate mandatory
preservation of metadata, the interop subset, and the five extensions
for describing dark objects.
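
As the quoted text currently reads, it works out to a two-way decision
for each foreign element: process its content, or leave it alone and at
most preserve it. A rough Python sketch of that decision follows. It
assumes that any namespace beginning with
urn:oasis:names:tc:opendocument:xmlns: is an ODF namespace (a
simplification), the helper names are hypothetical, and it says nothing
about what the rewritten section will require.

```python
import xml.etree.ElementTree as ET

OFFICE_NS = "urn:oasis:names:tc:opendocument:xmlns:office:1.0"
PROCESS_CONTENT = "{" + OFFICE_NS + "}process-content"

# Assumption: this prefix stands in for "a namespace defined within the spec".
ODF_NS_PREFIX = "urn:oasis:names:tc:opendocument:xmlns:"


def is_foreign(name: str) -> bool:
    """Treat anything outside the assumed ODF namespaces as foreign."""
    uri = name[1:].split("}", 1)[0] if name.startswith("{") else ""
    return not uri.startswith(ODF_NS_PREFIX)


def should_process(foreign_elem: ET.Element) -> bool:
    """Content is processed when the attribute is 'true' or absent."""
    return foreign_elem.get(PROCESS_CONTENT, "true") == "true"


def unwrap_processed(parent: ET.Element) -> None:
    """Replace each processable foreign child with its own child elements.

    Foreign children marked process-content="false" are kept untouched,
    i.e. preserved but not interpreted. Only one level is handled, and
    text placed directly inside a wrapper is ignored in this sketch.
    """
    kept = []
    for child in list(parent):
        if is_foreign(child.tag) and should_process(child):
            kept.extend(list(child))
        else:
            kept.append(child)
    for child in list(parent):
        parent.remove(child)
    parent.extend(kept)


# Hypothetical paragraph: one foreign wrapper whose content should be
# processed, and one dark object that may only be preserved.
SAMPLE = (
    '<text:p'
    ' xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"'
    ' xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"'
    ' xmlns:x="http://example.com/hypothetical-vendor">'
    '<x:wrap><text:span>visible</text:span></x:wrap>'
    '<x:dark office:process-content="false"><x:raw>AAAA</x:raw></x:dark>'
    '</text:p>'
)

paragraph = ET.fromstring(SAMPLE)
unwrap_processed(paragraph)
```

After the call, the paragraph holds the unwrapped text:span followed by
the untouched x:dark element. The first is what the schema would have to
accept; the second is exactly the kind of blob the preservation
language is about.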

"Conforming applications shall read documents containing processing
instructions and *should* preserve them."

I believe the "should" here will change to "must" or "shall."

"There are no rules regarding the elements and attributes that
actually have to be supported by conforming applications, except that
applications should not use foreign elements and attributes for
features defined in the OpenDocument schema. See also appendix D."

This is the hunk I'm most interested in rewriting along the lines of:

"There are no rules regarding the elements and attributes that
actually have to be supported by conforming applications, except that
*conforming* applications *must* not use foreign elements and
attributes for features defined in the OpenDocument schema."

Something along those lines would avoid Microsoft being able to park
blobs in foreign element tags and still claim to be conformant with
the ODF standard, which affects Microsoft's ability to peddle its
software to government bodies. By making it clear that a conformant
app can't park its own blobs in foreign element tags, Microsoft would
have to implement "pure" ODF rather than ODF + blobs where ODF has the
needed tags. In fact, we might consider going further and requiring that
a conformant app must also use XML rather than blobs for parking its
own metadata not specified by the standard.  And to make the intent
clear, bluntly state that the foreign element tags can only be used to
park blobs to the extent necessary to achieve fidelity or
interoperability with a non-conformant app.

I hope this post helps clear up some of the miscommunication and
confusion. I apologize for my part in it and for going off on you.

Best regards,

Marbux


