Author Topic: Copy & Paste from MS Word  (Read 1985 times)
rnv
Member

Posts: 48


« on: February 26, 2015, 07:27:36 AM »

Hi,

When I copy and paste text from MS Word into a DITA Generic Topic, most of it comes through fine, but the list items are not formatted properly: each list item gets its own 'ol' element.

There should be multiple 'li' elements inside one single 'ol'. Is there a way to control this?

This is how it is currently pasted in XMetaL 9.0:

Code:
<p>
<ol id="ol_43F5FD2CFBD7446CA38183BBA87C56D9">
<li id="li_9477A51A4C8C4AEE9E967471C6930E84"> · Dfdl</li>
</ol>
</p>
<p>
<ol id="ol_4594F8BEF9E24828B28F650108D0AF2B">
<li id="li_1F0F7E759FEB4124AC88417D543A8B7A"> · Dflld</li>
</ol>
</p>
<p>
<ol id="ol_C4C13E0884D14D1CBF01D5B2D878849E">
<li id="li_88CBE6F3F7DA42639CE11689017F7FD3"> · Dlld</li>
</ol>
</p>
<p>
<ol id="ol_AAF042B7C5184BB88FC28131B5944450">
<li id="li_A784499A796B4636A28BD9523DFF13FC"> ·</li>
</ol>
</p>
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2579



« Reply #1 on: February 26, 2015, 03:34:05 PM »

What ends up being transformed into DITA depends on what Word puts on the clipboard. This specific case is a known issue, but not something we have much control over. In some cases Word puts multiple lists onto the clipboard even though they look like one list in Word. You can see this if you have a clipboard viewing tool (XP used to have clipbrd.exe) that lets you examine what Word puts on the clipboard when you do a copy.

When you paste multiple lists into a DITA document in XMetaL Author Enterprise, the software does not try to guess that it should combine them. You can compare Word's behaviour with that of other applications by finding a list in an HTML page and copying and pasting that. In that case you should always get a single list in DITA, since a single list is what is put onto the clipboard (at least with every browser I've seen).
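If you can get the HTML fragment out of the clipboard (for example, by saving it from a clipboard viewer), a small script can show whether it contains one list or several. This is only a rough sketch: it assumes the fragment marks lists with ordinary 'ol'/'ul' elements, which may not hold for everything Word produces, and the names ListCounter and count_top_level_lists are made up for this example.

```python
from html.parser import HTMLParser


class ListCounter(HTMLParser):
    """Records the number of items in each top-level list of an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # current nesting depth of ol/ul elements
        self.item_counts = []   # one entry per top-level list

    def handle_starttag(self, tag, attrs):
        if tag in ("ol", "ul"):
            if self.depth == 0:
                self.item_counts.append(0)
            self.depth += 1
        elif tag == "li" and self.depth == 1:
            # Only count items that belong directly to a top-level list.
            self.item_counts[-1] += 1

    def handle_endtag(self, tag):
        if tag in ("ol", "ul") and self.depth > 0:
            self.depth -= 1


def count_top_level_lists(html_fragment):
    """Return a list with the item count of each top-level list found."""
    counter = ListCounter()
    counter.feed(html_fragment)
    counter.close()
    return counter.item_counts
```

A result with one entry means a single list was placed on the clipboard. Several entries of 1 each would match the one-'ol'-per-item symptom described in the first post.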

What Word puts on the clipboard might vary between versions of Word, but I suspect it mostly depends on how the list was initially created, which Word features have been used to edit it, and Word's internal representation of the list (which seems to treat multiple lists as one if they are right next to each other, or at least sometimes tracks a single list as several pieces). We don't have any insight into Word's internal markup; the software just examines what has been placed on the clipboard and tries to deal with that.

I'm attaching a copy of a Word document created with Word 2010. It puts a "good" (single) list on the clipboard, at least in my testing. Copying the list using Word 2010 and pasting it into XMetaL Author Enterprise (tested with version 9) results in conversion to a single ordered list in a DITA document.

* listtest.zip (10.65 KB - downloaded 153 times.)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2579



« Reply #2 on: February 26, 2015, 03:51:33 PM »

Here's a workaround that works for me:

1. Use Word to open your Word document.
2. Use Word's "Save As" command and choose "Web Page, Filtered" as the file type to save the file. If you are told that some features in the document are not supported, you will need to answer "yes". This affects only the newly created HTML file, so it is safe to do (but of course always have a backup of your original just in case).
3. Close the file.
4. Now you can make a choice:
a. Open the "fixed" HTML file in Word again and then re-save it as a Word document. The lists should be "normal" now, but the file might not look exactly the same if some of the original Word features you used don't have a direct HTML equivalent.
b. Open the HTML file in a browser and copy and paste from there.

Saving to HTML seems to force Word to convert whatever confused markup it is using internally for lists into a single list, so Microsoft must have implemented some special code to deal with that (otherwise web pages saved from Word would include broken-up lists like the ones you are seeing after pasting into XMetaL Author Enterprise).
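If the content has already been pasted and re-copying is not practical, the broken-up lists could also be merged after the fact by a script run over the saved file. The following is a minimal sketch, not an XMetaL feature: it assumes the pasted markup looks like the sample in the first post (each 'ol' alone inside its own 'p'), and the function name merge_adjacent_lists is made up for this example. It keeps the first 'ol' (and its id) and moves the 'li' elements of the following lists into it.

```python
import xml.etree.ElementTree as ET


def merge_adjacent_lists(parent, list_tag="ol"):
    """Merge runs of adjacent <p><ol>...</ol></p> wrappers under `parent`, in place."""
    kept = None  # wrapper <p> whose list receives the items of later wrappers
    for child in list(parent):
        inner = list(child)
        is_wrapper = (
            child.tag == "p"
            and len(inner) == 1
            and inner[0].tag == list_tag
            and not (child.text or "").strip()      # no text before the list
            and not (inner[0].tail or "").strip()   # no text after the list
        )
        if is_wrapper and kept is not None:
            # Move the items into the first list of the run and drop the wrapper.
            kept[0].extend(list(inner[0]))
            kept.tail = child.tail
            parent.remove(child)
        elif is_wrapper:
            kept = child
        else:
            kept = None
        # Text sitting between two wrappers ends the current run.
        if kept is not None and (kept.tail or "").strip():
            kept = None
    return parent
```

For example, after `body = ET.parse("topic.dita").getroot().find("body")` (the file name is hypothetical), calling `merge_adjacent_lists(body)` would leave one 'p' containing one 'ol' with all the 'li' elements. The stray " · " bullet characters and the empty last item from the sample would still need to be cleaned up by hand.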