
Derek Read

Reply to: search and replace for special characters

Here are a few solutions that might work. I'm listing them in what I think is the order from easiest to hardest to get working.

1) Turn on pretty printing for DITA documents. If it is off, your double carriage returns will be preserved (which is what seems to be occurring now). If it is on (which is the default), the pretty printing feature should correct the issue automatically when you save. To turn this on for DITA, run the macro called “DITA Configuration: Turn ON Pretty-Printing” in the Macros toolbar list.

2) If you must have pretty printing turned off, but also need to remove these duplicate carriage returns from within

elements, you can add the attached MCR file to your AuthorStartup folder and then restart the software. The file is attached as demo_doubleCarriageReturnRemover_v1.zip, so you will need to unzip it first.

Be sure to test this on non-production files (i.e., make sure you have backup copies) until you trust it.

To uninstall this macro, simply remove the MCR file.

Legal (for the attached MCR file):
* Licensed Materials - Property of JustSystems, Canada, Inc.
* (c) Copyright JustSystems Canada, Inc. 2010
* All rights reserved.
* The sample contained herein is provided to you "AS IS".
* It is furnished by JustSystems Corporation as a simple example and has not been
* thoroughly tested under all conditions. JustSystems Canada, Inc., therefore, cannot
* guarantee its reliability, serviceability or functionality.
* This sample may include the names of individuals, companies, brands and products
* in order to illustrate concepts as completely as possible. All of these names are
* fictitious and any similarity to the names and addresses used by actual persons or
* business enterprises is entirely coincidental.

This MCR file will add a new macro called “DITA Workaround: Double Carriage Return Remover” to the list of macros. Run this macro before you save a document. It will remove duplicate carriage returns from all

elements in the document. This could be extended to cover other elements, but given your initial sample I'm assuming at this point that the issue only affects


Note that the script actually replaces any and all sequences of two or more white-space characters (carriage returns, tabs, or regular spaces) with a single regular space (i.e., your standard space bar space, U+0020).
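To make that replacement rule concrete, here is a minimal sketch in Python. It is not the code from the attached MCR file (that is an XMetaL macro); the function name `collapse_whitespace` is made up for this example, and it assumes "white-space" means only carriage return, line feed, tab and regular space.

```python
import re

# Assumption: a "white-space character" is CR, LF, tab or U+0020 only.
# The pattern matches runs of two or more of them in a row.
_RUN = re.compile(r"[ \t\r\n]{2,}")


def collapse_whitespace(text):
    """Replace each run of 2+ white-space characters with one U+0020 space."""
    return _RUN.sub(" ", text)


sample = "first line\r\n\r\nsecond\t\tline"
print(repr(collapse_whitespace(sample)))
```

Under this rule a lone tab or a lone line feed is left alone, because it is not a sequence of two or more characters, while a single CR+LF pair counts as two characters and becomes a space.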

It also affects the content of all children of

elements. Given your example, and even with most other cases where

does contain child elements, such as for example, this should not be an issue.

However, if one of the children is and that element contains multiple carriage returns in a row that you wish to keep they will also be replaced. Handling all the other possible cases would probably take quite a bit of investigation and more coding.

As this is a quick and dirty fix, I have not attempted to take all possibilities into account. We would need to do that if we were to handle this at the root cause, which would likely mean cleaning up the Word content before it makes it into the document. At that point we're looking at a complete development cycle, including proper testing.
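For anyone who wants to experiment with the "children are affected too" behaviour outside XMetaL, the following is a rough Python sketch, not the macro itself. It collapses white-space runs inside a chosen element and all of its descendants, but skips any element listed in a `keep` set. The function name `clean` and the element names `p`, `b` and `pre` are placeholders chosen for this example only; substitute whatever elements apply to your documents.

```python
import re
import xml.etree.ElementTree as ET

# Same rule as the macro description: runs of 2+ CR, LF, tab or space.
_RUN = re.compile(r"[ \t\r\n]{2,}")


def _collapse(s):
    # Element text and tail may be None, so pass those through unchanged.
    return _RUN.sub(" ", s) if s else s


def clean(elem, target, keep, inside=False):
    """Collapse white-space runs inside every `target` element and its
    descendants, leaving the content of `keep` elements untouched."""
    if elem.tag in keep:
        return
    inside = inside or elem.tag == target
    if inside:
        elem.text = _collapse(elem.text)
    for child in elem:
        clean(child, target, keep, inside)
        if inside:
            # A child's tail is text that belongs to the parent's content,
            # so it is cleaned even when the child itself is skipped.
            child.tail = _collapse(child.tail)


# Hypothetical sample document; the element names are placeholders.
doc = ET.fromstring(
    "<body><p>one\n\n\ntwo <b>bold\t\ttext</b>\n\nthree"
    "<pre>keep\n\nthis</pre></p></body>"
)
clean(doc, target="p", keep={"pre"})
print(ET.tostring(doc, encoding="unicode"))
```

Passing an empty `keep` set reproduces the quick and dirty behaviour described above, where every child is cleaned. Deciding which elements belong in `keep` is the part that would need the investigation mentioned earlier.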

3) If you can identify the exact markup or styling in Word that triggers this issue, then modifying the Word document before copying and pasting is another option. Word has its own API (similar to XMetaL Author), so that could be used to automate this process if there are lots of legacy documents.

4) Try saving the document from Word as HTML then opening it in a browser and copying and pasting from there. Just a hunch, but the process of Word exporting to HTML might just cause it to fix things up for you. What you get will probably vary widely depending on your Word version so keep that in mind if there are multiple people doing this.

5) Add an additional processing step after saving (probably XSLT) that removes these carriage returns. Perhaps your translation company offers translation memory software? If they do then it might have features that will allow for “normalizing” this type of thing.