
Derek Read

Reply to: Converting legacy Word documents to DITA (or RTF to XML?) – XMetaL 12 AE

Yes, as you have noted, there is functionality included with the DITA authoring solution in XMetaL Author Enterprise that lets you copy from Word and paste into a DITA document. It attempts to convert HTML on the Windows clipboard into DITA, so it actually works whenever there is any HTML on the clipboard, which means you can copy from Word, a browser, or any application that puts HTML on the clipboard.

Results will be mixed depending on how the original source Word document was marked up, which also influences what Word puts on the clipboard. Different versions of Word encode documents in different ways and they also end up putting different things on the clipboard.

One notable example is where two separate lists have been joined together to form what appears to be a single list in Word (i.e., 1. 2. + 1. 2. looks like 1. 2. 3. 4. in Word). In this case Word may put two different lists on the clipboard (1. 2. followed by a different 1. 2.), which results in two lists being created in the DITA document, because there is no way to add information that Word does not provide. These are the kinds of things that all Word to DITA conversion solutions are going to run into, to different degrees and for different reasons, depending on their approach.
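
If you end up with many split lists like this, part of the cleanup could be scripted after the paste. The following is only a rough sketch and is not part of XMetaL: it assumes the pasted result was saved as XML, uses Python's standard `xml.etree.ElementTree`, and the function name `merge_adjacent_lists` is made up for this example. It merges any `<ol>` that immediately follows another `<ol>` under the same parent, which is also exactly what you do not want when the two lists really were separate in the source, so the result still needs a manual review.

```python
import xml.etree.ElementTree as ET


def merge_adjacent_lists(parent, tag="ol"):
    """Merge consecutive sibling <ol> elements under parent (hypothetical helper).

    Two lists are treated as adjacent only when nothing but whitespace
    separates them. Child elements are processed recursively first.
    """
    previous = None
    for child in list(parent):
        merge_adjacent_lists(child, tag)
        if (previous is not None
                and child.tag == tag
                and previous.tag == tag
                and not (previous.tail or "").strip()):
            # Move the items into the preceding list, then drop the emptied list.
            previous.extend(list(child))
            previous.tail = child.tail
            parent.remove(child)
        else:
            previous = child
    return parent


# Assumed sample: one visual list that arrived as two <ol> elements.
sample = (
    "<body>"
    "<ol><li>a</li><li>b</li></ol>"
    "<ol><li>c</li><li>d</li></ol>"
    "<p>x</p>"
    "<ol><li>e</li></ol>"
    "</body>"
)
root = merge_adjacent_lists(ET.fromstring(sample))
print(ET.tostring(root, encoding="unicode"))
```

The list after the `<p>` is left alone because it is not directly next to another list. Note that ElementTree does not preserve the DOCTYPE declaration when writing a file back out, so a real script for DITA topics would need to handle that separately.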

I don't think there is any perfect solution, mostly due to the wide variety of proprietary Word formatting that has to be dealt with and the amount of time people are willing to spend trying to deal with it all. So, at present, no matter which solution you choose, I think there is going to be some manual fixing up to do. The one benefit of using the XMetaL copy and paste feature is that the resulting document should be valid. The main drawback is that there is no batch capability.