Text formatting

Posts: 12
Joined: Sun Oct 28, 2012 2:09 pm

Text formatting

Post by Aircut » Sun Oct 28, 2012 2:16 pm

I face the job of cleaning up malformed essays.

Some of the writers leave no space after the full stop, and others put an extra space before it. The same goes for commas, exclamation marks and question marks.

My question is how to create a filter that removes the unwanted space between a word and the full stop and adds one space after it, applying this to the entire block of text BUT skipping email addresses and URLs.

Thanks for any hints.

DataMystic Support
Site Admin
Posts: 2286
Joined: Mon Jun 30, 2003 12:32 pm
Location: Melbourne, Australia

Re: Text formatting

Post by DataMystic Support » Mon Oct 29, 2012 11:29 pm

The Perl pattern you want to use for commas, exclamation marks and question marks is:

 *?([,\!\?]) *?
Replace with

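As a rough illustration outside TextPipe, here is a minimal Python sketch of the same substitution. It assumes the replacement is the captured punctuation mark followed by a single space. It also uses greedy quantifiers, because a lazy ` *?` at the very end of a pattern matches zero spaces and would leave any extra spaces after the mark in place. The function name is made up for the example.

```python
import re

# Zero or more spaces, one of , ! ? (captured), zero or more spaces.
# Greedy " *" is used so that spaces on both sides are consumed.
PUNCT_SPACING = re.compile(r" *([,!?]) *")


def tidy_punctuation(text: str) -> str:
    """Remove spaces before , ! ? and leave exactly one space after."""
    # \1 is the captured punctuation mark; a literal space follows it.
    return PUNCT_SPACING.sub(r"\1 ", text)


print(tidy_punctuation("Hello ,world!How are you ?Fine"))
```

Note that a mark at the very end of a line also gets a space appended, and periods are deliberately left alone at this stage.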

For emails and URLs, you will need a different strategy for handling periods. One option is to temporarily replace the periods in email addresses and URLs with tabs (using a restriction), apply the Perl pattern below, and then change the tabs back to periods:

 *?([\.,\!\?]) *?
Replace with

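For comparison, the whole job can be sketched in Python. This is not the tab-substitution approach: it uses a single pass in which the regex first tries to match a URL or email address and returns it unchanged, and only otherwise normalises the spacing around `. , ! ?`. The URL and email patterns are deliberately loose assumptions, the names are invented for the example, and the replacement is assumed to be the punctuation mark plus one space.

```python
import re

# Loose, hypothetical patterns for text that must be left untouched:
# anything starting with http://, https:// or www., and anything that
# looks like name@host.tld. Both run up to the next whitespace.
PROTECTED = r"(?:https?://|www\.)\S+|\S+@\S+\.\S+"

# Try the protected alternative first; otherwise match punctuation
# together with any spaces around it.
PATTERN = re.compile(rf"(?P<keep>{PROTECTED})| *(?P<punct>[.,!?]) *")


def _fix(match: re.Match) -> str:
    if match.group("keep") is not None:
        return match.group("keep")      # URL or email: return as-is
    return match.group("punct") + " "   # punctuation: exactly one space after


def tidy_text(text: str) -> str:
    """Normalise spacing around . , ! ? while skipping URLs and emails."""
    result = PATTERN.sub(_fix, text)
    # Drop the space that was added after punctuation ending a line.
    return re.sub(r" +$", "", result, flags=re.MULTILINE)


sample = "Write to bob@example.com ,or see http://example.com/a.html?x=1 .Thanks !"
print(tidy_text(sample))
```

Known limits of this sketch: a period typed directly after a URL with no space is treated as part of the URL, and decimals such as 3.14 or an ellipsis would be split apart, so those would need extra rules.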


Simon Carter, http://DataMystic.com/forums/index.php
