[odf-discuss] Mars: XMLisation of PDF - opportunity for ODF?
Lars D. Noodén
lars at umich.edu
Fri Nov 10 03:18:37 EST 2006
On Thu, 9 Nov 2006, J David Eisenberg wrote:
> Some PDFs are bitmapped images; others contain text, which is in some
> compressed form. The ps2ascii tool on Linux will extract such text from a
> PDF quite nicely.
Yes. ps2ascii has been helpful for extracting text (when it exists) from
PDFs for use in search engines and web indexes.
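
A minimal sketch of that indexing step, assuming Ghostscript's ps2ascii is
on the PATH and that, given a single file argument, it writes the extracted
text to standard output. The function names are hypothetical, and the
tokenizer is only a stand-in for whatever a real indexer would do.

```python
import subprocess
from pathlib import Path


def ps2ascii_command(pdf_path):
    """Build the argument list for ps2ascii (assumed to print to stdout)."""
    return ["ps2ascii", str(Path(pdf_path))]


def extract_text(pdf_path):
    """Run ps2ascii and return the text it finds.

    A PDF that holds only bitmapped images is expected to yield little
    or no text.
    """
    result = subprocess.run(
        ps2ascii_command(pdf_path),
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode the output to str
        check=True,           # raise CalledProcessError on failure
    )
    return result.stdout


def index_terms(text):
    """Rough tokenizer: lowercase alphanumeric words, de-duplicated, sorted."""
    words = (w.strip(".,;:!?()[]\"'").lower() for w in text.split())
    return sorted({w for w in words if w.isalnum()})
```

Something like index_terms(extract_text("some.pdf")) would then give a term
list to feed to an index; "some.pdf" is a placeholder file name.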
I think I'm losing the original thread, or else I'm too slow to see how
XML would be useful in PDF.
The only part that I could see as clearly beneficial would be for the
document's metadata to be encoded as XML, but that may already be the
case; I am unfamiliar with the inner workings. The metadata will be
text, even though the PDF itself may wrap just about any kind of
non-text content.
-Lars
Lars Noodén
Ensure access to your data in the future
http://opendocumentfellowship.org/about_us/contribute