Process inside compressed files

DFH
Posts: 636
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Process inside compressed files

Postby DFH » Tue Mar 29, 2011 8:40 pm

"Process inside compressed files" currently supports only these file types: ZIP, DOCX, XLSX and PPTX.

How about adding support for OpenDocument files, e.g. ODT?

See http://en.wikipedia.org/wiki/OpenDocument

It would then be feasible to process the file content.xml inside an OpenDocument word processing file.
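
For illustration only, here is a minimal Python sketch of that idea (it is not how TextPipe does it). It assumes the .odt file is an ordinary ZIP package and that content.xml is UTF-8 encoded. The helper name read_content_xml and the tiny in-memory stand-in package are made up for the example; a real .odt contains more entries.

```python
import io
import zipfile


def read_content_xml(odt_source):
    """Return content.xml from an OpenDocument package as text.

    odt_source may be a file path or a file-like object.
    Assumes the package is a ZIP archive and content.xml is UTF-8.
    """
    with zipfile.ZipFile(odt_source) as package:
        return package.read("content.xml").decode("utf-8")


# Build a tiny stand-in package in memory so the sketch runs without a real file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as package:
    package.writestr("mimetype", "application/vnd.oasis.opendocument.text")
    package.writestr("content.xml", "<doc><p>Hello</p></doc>")

print(read_content_xml(io.BytesIO(buffer.getvalue())))
```

With a real document, the call would simply be read_content_xml("report.odt"), where report.odt is a placeholder file name.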

David
Last edited by DFH on Wed Mar 30, 2011 9:22 pm, edited 1 time in total.

DataMystic Support
Site Admin
Posts: 2138
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Process inside compressed files

Postby DataMystic Support » Wed Mar 30, 2011 2:52 pm

Thanks David,

But "OpenDocument files can also take the format of a ZIP compressed archive containing a number of files and directories".

So does this mean that the formats
# .odt for word processing (text) documents
# .ods for spreadsheets
# .odp for presentations
could each be either plain XML or a .zip file? Or are they always in ZIP format these days?
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Re: Process inside compressed files

Postby DFH » Wed Mar 30, 2011 9:12 pm

Simon,

If I save a file from Word 2007 in OpenDocument format (file extension .odt),
I can readily examine the contents of the saved file using an archive manager such as 7-Zip.

The compressed file contains content.xml along with other files.
The attached image illustrates this.

David
Attachments
InsideODT.png
Inside an OpenDocument file.

Re: Process inside compressed files

Postby DFH » Wed Mar 30, 2011 9:20 pm

Further note:

content.xml is "linearized", in that all the XML (after the schema) is on a single line of text.

However, it can be made more legible using the "Pretty-print" feature of XML Copy Editor.

See attached image. See also http://xml-copy-editor.sourceforge.net/
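
For anyone without those tools, the two views can be sketched in a few lines of Python. This is only an illustration: the function names pretty_print and linearize are made up here, and the linearizer assumes that whitespace between tags is insignificant, which is not always true for mixed content.

```python
import re
from xml.dom import minidom


def pretty_print(xml_text, indent="  "):
    """Return an indented, multi-line rendering of the XML."""
    return minidom.parseString(xml_text).toprettyxml(indent=indent)


def linearize(xml_text):
    """Put the XML on a single line by dropping whitespace between tags."""
    return re.sub(r">\s+<", "><", xml_text.strip())


sample = "<a><b>x</b><c/></a>"
pretty = pretty_print(sample)
print(pretty)             # indented, one element per line
print(linearize(pretty))  # back on a single line
```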

David
Attachments
PrettyContentXML.png
Pretty view of extracted content.xml

Re: Process inside compressed files

Postby DFH » Wed Mar 30, 2011 9:37 pm

The Notepad++ plugin called XML Tools is another means to "Pretty-print" an XML file, and it also has a "Linearize" option.

Just a further suggestion for your developers....

It would be nice if TextPipe could be enhanced to include such methods by means of various XML sub-filters.

David

Re: Process inside compressed files

Postby DataMystic Support » Fri Apr 01, 2011 2:11 pm

Thanks David - that is very detailed and very helpful.

We have added .ODT support for the next release of TextPipe.

Also - I have attached a sample XML Linearize filter.
Attachments
xml linearize.zip
Linearize XML files
Regards,

Simon Carter

Re: Process inside compressed files

Postby DFH » Fri Apr 08, 2011 12:56 am

Thanks Simon.

I think you may have the "XML linearize" terminology flipped!
A linearized XML file is one with everything (except the schema) as a single (very long) line.
A "Pretty Print" XML file is one where the XML is "de-linearized" and intelligently indented.

Examining the rudimentary XML Linearize filter, I observe that (as yet) it does not also apply any indenting.
Something to think about for the future, perhaps. Not urgent - I can still use XML Copy Editor.

Also, it would be sensible to tick Enable UTF-8 support in the Perl sub-filters.

David

Re: Process inside compressed files

Postby DataMystic Support » Fri Apr 08, 2011 9:36 am

Thanks David - I will create a new 'XML pretty print' filter and a new 'XML Linearize' filter that simply replaces all CR/LFs with a space and optionally compresses runs of spaces.

I will also enable UTF-8 support in those filters.
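
As a rough sketch of that linearize behaviour (written in Python for illustration, not the actual TextPipe filter; the function name and the compress_spaces flag are hypothetical):

```python
import re


def linearize_text(text, compress_spaces=False):
    """Replace every CR/LF line break with a single space.

    If compress_spaces is true, runs of two or more spaces are
    then collapsed to one space.
    """
    flat = re.sub(r"\r\n|\r|\n", " ", text)
    if compress_spaces:
        flat = re.sub(r" {2,}", " ", flat)
    return flat


sample = "<a>\r\n  <b/>\n</a>"
print(linearize_text(sample))
print(linearize_text(sample, compress_spaces=True))
```

Note that this treats the file purely as text, so line breaks inside element content are replaced too.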
Regards,

Simon Carter

Re: Process inside compressed files

Postby DFH » Fri Apr 08, 2011 10:54 pm

Simon,

When linearizing XML, please take care over the XML schema.
Normally this should be on the first line of text.

Sometimes the definition lookup is spread over more than the first line of text.
Some XML validation tools fail when this is the case.
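
To make the concern concrete, here is a hedged Python sketch. It assumes that the "schema" line means the XML declaration (<?xml ... ?>) at the top of the file, which may not be exactly what is meant above. The function name is made up. The declaration is kept on its own first line and everything after it is flattened onto a second line.

```python
import re

# Matches an XML declaration at the start of the text, even if it spans lines.
DECLARATION = re.compile(r"^\s*(<\?xml.*?\?>)", re.DOTALL)


def linearize_keep_declaration(xml_text):
    """Flatten XML to one line, keeping the declaration on line one.

    All whitespace runs (including line breaks) collapse to one space,
    so this also alters whitespace inside text content.
    """
    match = DECLARATION.match(xml_text)
    if match is None:
        return " ".join(xml_text.split())
    declaration = " ".join(match.group(1).split())
    body = " ".join(xml_text[match.end():].split())
    return declaration + "\n" + body


sample = '<?xml version="1.0"\n encoding="UTF-8"?>\n<a>\n  <b>x</b>\n</a>\n'
print(linearize_keep_declaration(sample))
```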

David

Re: Process inside compressed files

Postby DataMystic Support » Sat Apr 09, 2011 10:25 am

I have attached updated filters. BTW, I know the pretty printer is far from pretty.
Would you like to check the schema linearizing? If it sits entirely between < and >, it should be put on one line anyway.
Attachments
xml linearize and pretty print2.zip
Regards,

Simon Carter
