Convert Word documents to text filter

Get help with installation and running here.

Moderators: DataMystic Support, Moderators

DFH
Posts: 867
Joined: Sun Dec 09, 2007 2:49 am
Location: UK

Convert Word documents to text filter

Post by DFH » Tue Feb 26, 2019 3:04 am

Is there any good reason why the filter to Convert Word documents to text leaves the BOM in the UTF-8 text output file?

The help page for this filter does not state that the BOM may need to be removed afterwards!

Best regards,

David

DataMystic Support
Site Admin
Posts: 2325
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Convert Word documents to text filter

Post by DataMystic Support » Mon May 06, 2019 1:51 pm

At the start of the file? Or at other locations?
Regards,

Simon Carter, http://DataMystic.com/forums/index.php
http://PredictBGL.com - Insulin dose calculator for Type 1 diabetes
http://DownloadPipe.com - 250,000 free software downloads
http://DetachPipe.com - send huge email attachments

Re: Convert Word documents to text filter

Post by DFH » Tue May 07, 2019 6:18 pm

At the usual place for a BOM - the start of the file.

Aside: I don't regard U+FEFF ZERO WIDTH NO-BREAK SPACE (BOM, ZWNBSP) at any other location as a BOM.
In other locations, it functions as a ZWNBSP, which was not part of my report.
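
For anyone who needs a workaround in the meantime, here is a minimal Python sketch of the distinction above. It is not TextPipe's own logic; the function name `strip_leading_bom` and the sample data are made up for illustration. It removes U+FEFF only when its UTF-8 encoding (EF BB BF) forms the first three bytes of the data, and leaves any later occurrence alone.

```python
import codecs


def strip_leading_bom(data: bytes) -> bytes:
    """Remove a UTF-8 BOM only if it is the very first thing in the data."""
    # codecs.BOM_UTF8 is the byte sequence b'\xef\xbb\xbf'
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


# Hypothetical sample: a BOM at the start, plus U+FEFF in the middle,
# where it acts as a ZWNBSP and must be preserved.
sample = codecs.BOM_UTF8 + "a\ufeffb".encode("utf-8")
cleaned = strip_leading_bom(sample)
```

Only the leading three bytes are dropped; the U+FEFF between "a" and "b" is still present in `cleaned`.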

Best regards,

David


Re: Convert Word documents to text filter

Post by DataMystic Support » Tue May 07, 2019 10:27 pm

I will adjust the help file to include this.
Regards,

Simon Carter, http://DataMystic.com/forums/index.php

Re: Convert Word documents to text filter

Post by DFH » Fri Jun 07, 2019 1:29 am

In the early days of Unicode, the naming convention was
- UTF-8 without BOM
- UTF-8

The latter had the BOM implicitly.

More recently, the naming convention has changed to
- UTF-8
- UTF-8 with BOM

The former now has no BOM implicitly.

The developers of the Notepad++ text editor recognized and implemented this convention change several years ago, in the respective options of its Encoding menu.
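
As a further illustration, Python's standard codecs follow the same current convention: `utf-8` means no BOM, and the BOM variant has its own name, `utf-8-sig`. The short sketch below uses a made-up sample string and says nothing about how TextPipe names its encodings internally.

```python
text = "hello"

# "utf-8" writes no BOM; "utf-8-sig" prepends EF BB BF.
plain = text.encode("utf-8")
with_bom = text.encode("utf-8-sig")

# Decoding BOM-prefixed bytes as plain "utf-8" keeps U+FEFF as a character.
kept = with_bom.decode("utf-8")

# "utf-8-sig" drops a leading BOM if present, and accepts input without one.
dropped = with_bom.decode("utf-8-sig")
no_bom_ok = plain.decode("utf-8-sig")
```

This is why a file written "with BOM" but read as plain UTF-8 shows a stray U+FEFF at the start of the first line.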

It behoves TextPipe to recognize the change as well, and for the UI and Help files to be consistent with the current convention.

David
