[odf-discuss] Mars: XMLisation of PDF - opportunity for ODF?

Lars D. Noodén lars at umich.edu
Fri Nov 10 03:18:37 EST 2006


On Thu, 9 Nov 2006, J David Eisenberg wrote:
> Some PDFs are bitmapped images; others contain text, which is in some
> compressed form.  The ps2ascii tool on Linux will extract such text from a
> PDF quite nicely.

Yes. ps2ascii has been helpful in extracting text (when there is any) from 
PDFs for use in search engines and web indexes.
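For illustration only, here is a minimal Python sketch of that workflow. It assumes Ghostscript's ps2ascii wrapper is installed and on the PATH, and that it prints the extracted text to standard output when no output file is given. The helper names `extract_text` and `index_terms` are hypothetical. A PDF that is purely a bitmapped image will yield little or no text.

```python
import re
import subprocess


def extract_text(pdf_path: str) -> str:
    """Run ps2ascii on a PDF and return whatever text it recovers.

    Assumption: the Ghostscript ps2ascii wrapper is on PATH and, with no
    output-file argument, writes the recovered text to stdout.
    """
    result = subprocess.run(
        ["ps2ascii", pdf_path],
        capture_output=True,
        text=True,
        errors="replace",  # tolerate stray non-UTF-8 bytes in the output
        check=True,        # raise CalledProcessError if ps2ascii fails
    )
    return result.stdout


def index_terms(text: str) -> set[str]:
    """Reduce extracted text to a set of lowercase words for a search index."""
    return {word.lower() for word in re.findall(r"[A-Za-z0-9]+", text)}


if __name__ == "__main__":
    import sys

    # Usage: python pdf_terms.py file1.pdf file2.pdf ...
    for path in sys.argv[1:]:
        terms = index_terms(extract_text(path))
        print(path, len(terms), "distinct terms")
```

Keeping extraction and term indexing separate means the indexing step can be reused with any other text extractor.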

I think I'm losing the original thread, or else I'm too slow to see how 
XML would be useful inside PDF.

The only clearly beneficial part I can see would be encoding the 
document's metadata as XML, but that may already be the case; I am 
unfamiliar with PDF's inner workings.  The metadata will be text, even 
though the PDF itself may wrap almost any kind of non-text content.

-Lars

Lars Noodén
 	Ensure access to your data in the future
 	http://opendocumentfellowship.org/about_us/contribute

