Topic: Script Example: Sort Lowercase CALS Tables (DITA, DocBook, etc.)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 1548



« on: September 21, 2009, 06:37:57 PM »

Products:
Tested with XMetaL Author Enterprise 5.1.1.017 and 5.5.0.219 on Windows XP with CALS tables inside DITA topics.
It should also work with DocBook and other schemas that use standard CALS tables, but I have not tested those.

Purpose:
Sort a CALS <table> that has lowercase element names (i.e. a DITA or DocBook <table>) based on the content of the cells in a given column.
The script assumes the author wants to sort the current table by the column that contains the insertion point (cursor). The sort is hard-coded to be alphabetic, from low to high values, so this version does not prompt the user for any options, although it could (see the comments in the MCR file).

In addition to sorting the table this script demonstrates how to build an XSLT on the fly and pass it, together with a chunk of XML (in this case a <tgroup>), to MSXML, which then returns a string (or perhaps throws an error). The existing table's <tgroup> is deleted and replaced by the newly sorted <tgroup>.

Note: If the sort changes nothing (i.e. the returned <tgroup> is identical because the rows were already in order), the <tgroup> is still replaced.
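For readers who have not opened the MCR file, here is a minimal sketch of what building a stylesheet as a string might look like. It is not the code from the attachment: the function name buildSortXslt, the /tgroup/tbody match pattern, and the choice of an identity template are all assumptions made for illustration.

```javascript
// Hypothetical helper (not taken from the MCR file): returns an XSLT 1.0
// stylesheet, as a string, that sorts the rows of a <tgroup>'s <tbody>.
// colIndex is the 1-based position of the <entry> to sort by.
function buildSortXslt(colIndex) {
    // Coerce to an integer so nothing unexpected ends up inside the XPath.
    var n = parseInt(colIndex, 10);
    if (isNaN(n) || n < 1) {
        throw new Error("Column index must be a positive integer: " + colIndex);
    }

    var xslt = '';
    xslt += '<?xml version="1.0"?>\n';
    xslt += '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">\n';
    xslt += '<xsl:output method="xml" omit-xml-declaration="yes"/>\n';

    // Identity template: copies elements, attributes, text, comments and PIs.
    xslt += '<xsl:template match="@*|node()">\n';
    xslt += '  <xsl:copy><xsl:apply-templates select="@*|node()"/></xsl:copy>\n';
    xslt += '</xsl:template>\n';

    // Override for the top-level <tbody> only: copy it, but emit rows in sorted order.
    xslt += '<xsl:template match="/tgroup/tbody">\n';
    xslt += '  <xsl:copy>\n';
    xslt += '    <xsl:apply-templates select="@*"/>\n';
    xslt += '    <xsl:apply-templates select="row">\n';
    xslt += '      <xsl:sort select="entry[' + n + ']" data-type="text" order="ascending"/>\n';
    xslt += '    </xsl:apply-templates>\n';
    xslt += '  </xsl:copy>\n';
    xslt += '</xsl:template>\n';

    xslt += '</xsl:stylesheet>\n';
    return xslt;
}
```

Under XMetaL's JScript, a string like this could be loaded into an MSXML DOMDocument with loadXML and applied to the <tgroup> with transformNode, which returns the transformed markup as a string. Two caveats with this sketch: entry[N] counts cells by position, so rows with spanning cells (namest/nameend or morerows) may sort on the wrong cell, and comments or PIs that are direct children of <tbody> are dropped because only the rows are applied.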

Demo Code:
Before using this script, please read the notes and comments in the MCR file (which also includes some legal stuff). Basically, this code is provided as a demo and should be treated as if it were completely untested. I have tested it as best I can, but it has not gone through our regular rigorous test process. You may also wish to adapt the code by altering the XSLT and/or the script logic itself, as you may find that the functionality does not meet the exact needs of your end users.

Please also do not use this script without the permission of the people who maintain your XMetaL installation (if that isn't you). Although the possibility is low given the way I have coded this, it could conflict with special customizations or scripts, third-party tools or plug-ins, a specific workflow they have set up and wish you to follow, or any number of other things I cannot even guess at. I recommend telling them that you would like something like this and letting them integrate and test it for you.

Installation:
1. Unzip attached ZIP file.
2. Place MCR file in the Startup folder of your XMetaL Author Enterprise installation.
3. Restart XMetaL Author Enterprise if already running.

Note: You must be logged in to download attachments on this forum.

Requirements:
The script relies on version 4 of Microsoft's MSXML COM (ActiveX) control being available (most systems probably have versions 3 through 5 or later installed). That should be the case on the vast majority of Windows XP and Vista machines. However, if the script throws an error that mentions MSXML, the installed MSXML version is the first thing to check.

Usage:
Warning: Do not use this script with real content until you are satisfied it passes your testing.
1. Open or create a new DITA test document that contains a <table>.
2. Place cursor inside the <table> in the column you wish to sort by.
3. Run the script called "Demo: Lowercase CALS Table Sort".
4. If the result is not what you expected, you should be able to undo the entire script with a single Undo operation.

Modifications, Extending Code, and Known Limitations:
See comments in the MCR file.

* derekreadExtension-demoTableSort.zip (3.6 KB - downloaded 151 times.)
« Last Edit: September 21, 2009, 06:57:21 PM by Derek Read »
dcramer
Member

Posts: 120


« Reply #1 on: September 22, 2009, 08:42:14 AM »

I just tested this on a CALS table in a DocBook document in XMetaL 4.6. It works great if there are no entities in the table (e.g. &foo; where you've declared <!ENTITY foo "bar"> in the DOCTYPE or DTD). If there are, it fails by making the table disappear (i.e. it pastes over the old table with an empty string). That's a pretty serious limitation, and it would affect any attempt to use an XSLT to massage the markup like this, which is a potentially powerful tool. Obviously the parser can't resolve the entities without a DOCTYPE, but if it does resolve them then it's going to replace the table with a copy in which the entities are resolved. Another problem for the DITA folk is that this will never work with specialization without access to the DTD.

Option 1: Obfuscate the entities before processing by replacing & in tableStr with @@@ before running the xslt and then replacing @@@ with & when done.
Advantage: Entities remain unresolved in the result of the sort.
Disadvantage: If any entities were in the sort column then the sort is inaccurate.

Option 2: Get the DOCTYPE (is that possible?) and prepend it to tableStr.
Advantage: MSXML can resolve the entities and process tableStr.
Disadvantages: Entities are resolved in the result, defeating the purpose of using them in the first place. Also, if you use catalog files to find the DTD and other resources, would MSXML know about them?

Option 2.a: Munge tableStr replacing &foo; with <temp name="foo">&foo;</temp>. Get the DOCTYPE (assuming that's possible) and prepend it to tableStr. Process with the xslt letting msxml resolve the entities (hopefully it can find the dtd). Once done, replace <temp name="foo">bar</temp> with &foo;. Note that this assumes you're only using entities for fairly simple inline cases and you don't do something like <!ENTITY myentry "<entry>blah</entry>">, which would probably trip up your xslt.
Advantage: Can sort based on entity values and return a result with the unresolved entities in place.
Disadvantage: Sounds hard. Can it be made to work with catalog files? Is it possible to get the text of the DOCTYPE?
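The string munging in Option 2.a might be sketched roughly as follows. This covers only the wrap and unwrap steps, not fetching the DOCTYPE or running the transform. The function names wrapEntities and unwrapEntities are hypothetical, and the entity-name pattern is a simplified ASCII-only assumption rather than the full XML Name production.

```javascript
// Entities predefined by XML itself need no DOCTYPE, so they are left alone.
var PREDEFINED = { amp: true, lt: true, gt: true, quot: true, apos: true };

// Before the transform: &foo; becomes <temp name="foo">&foo;</temp>.
// Numeric character references such as &#160; are not matched by the pattern.
function wrapEntities(xmlStr) {
    return xmlStr.replace(/&([A-Za-z_][\w.\-]*);/g, function (match, name) {
        if (PREDEFINED[name] === true) {
            return match;
        }
        return '<temp name="' + name + '">' + match + '</temp>';
    });
}

// After the transform: <temp name="foo">bar</temp> becomes &foo; again.
// Also accepts <temp name="foo"/> in case an entity resolved to an empty string.
function unwrapEntities(xmlStr) {
    return xmlStr.replace(/<temp name="([^"]+)"(?:\/>|>[^<]*<\/temp>)/g, function (match, name) {
        return '&' + name + ';';
    });
}
```

As the option itself notes, this only holds up for simple inline entities: one that expands to markup would not match the unwrap pattern. An entity reference inside an attribute value would also be a problem, since wrapping it would put an element inside the attribute and make the string ill-formed.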


David Cramer
Technical Writer
Motive, an Alcatel-Lucent Company
Derek Read
« Reply #2 on: September 22, 2009, 02:44:21 PM »

Interesting find. I didn't even think about entities, and because MSXML is involved here that adds to the complexity. Your workarounds seem doable in script, but I'm not sure I will try them yet.

At this point I think the script is useful for the majority of people (the 80/20 rule probably applies, or maybe even 90/10 as the majority of our DITA clients are not specializing yet, though there are quite a few). A large portion of the population also doesn't use entities so I guess they'd be fine too.

However, I suppose I should at least try to stop the table from being deleted. I assume Undo works to restore the table?

There are other completely different approaches I've tried, one of which was a pure JScript string manipulation. It was about 1/2 done maybe when I had what I thought at the time was a flash of genius to use XSLT instead. I might go back to that other code and see if it might work better or be easier to implement (and ultimately it would be nice to have this type of functionality right inside the product in the form of some new APIs, though I don't see that happening anytime soon).

One nice thing about the XSLT version is that it would likely be fairly easy to extend it to sort on more than one column at a time (because the MSXML support for XSLT has such a thing built-in).
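To illustrate the multi-column idea: XSLT 1.0 allows several xsl:sort elements inside one xsl:apply-templates, with the first acting as the primary key and each later one breaking ties. A minimal sketch of generating those keys as a string follows. The function name buildSortKeys is hypothetical, and how the user would choose the columns is left open.

```javascript
// Hypothetical helper: turns a list of 1-based column positions into a run of
// <xsl:sort> elements. The first column is the primary key, the rest break ties.
function buildSortKeys(columns) {
    var keys = [];
    for (var i = 0; i < columns.length; i++) {
        // Coerce to an integer so only a number is written into the XPath.
        var n = parseInt(columns[i], 10);
        if (isNaN(n) || n < 1) {
            throw new Error("Invalid column index: " + columns[i]);
        }
        keys.push('<xsl:sort select="entry[' + n + ']" data-type="text" order="ascending"/>');
    }
    return keys.join('\n');
}
```

The returned string would be placed inside the xsl:apply-templates element that selects the rows, in place of a single xsl:sort.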
« Last Edit: September 22, 2009, 02:46:51 PM by Derek Read »
dcramer
« Reply #3 on: September 22, 2009, 03:33:00 PM »

I like the XSLT approach very much because it would be easy to create variants on this macro to transform other structures in ways that would be difficult or impossible using JScript and the DOM, but the entity thing has to be addressed. I think Option 1 would be easy and would take care of most of the situations I would need it for. I'll take a stab at that and let you know how it goes. For Option 2.a I would need help, since I wouldn't know how to capture the DOCTYPE (not just the public and system identifiers but also the locally declared entities) as a string.

Even better would be to use Saxon 9 to do the xslt because then you could use xslt 2.0 and much more interesting things. But I have no idea how you'd go about doing that.

Thanks,
David
Derek Read
« Reply #4 on: September 22, 2009, 03:38:23 PM »

Here's a quick fix to stop the <tgroup> from being removed when the table contains entities.

It doesn't actually fix the issue (I'll let dcramer see if he can work that out), but at least this will tell you the table couldn't be sorted and stops it from disappearing in the case of entities and possibly other cases:

Old
Code:
rngWork.TypeText(sortedTable);

New
Code:
// Only replace the <tgroup> if the transform actually returned something.
if (sortedTable > "") {
    rngWork.TypeText(sortedTable);
}
else {
    Application.Alert("This script is not smart enough to sort your table.");
}
dcramer
« Reply #5 on: September 22, 2009, 08:50:22 PM »

Here's a fix to hide entity references:

Code:
244a245,259
>
> // HACK ALERT!!
> // Here we replace any ampersands in the source
> // XML with a placeholder value, "@@AMPERSAND@@"
> // to keep MSXML from trying to resolve the entity.
> // We'll change it back later. However we first
> // make sure that the string @@AMPERSAND@@ does
> // not already exist in the table.
> if(tableStr.match(/@@AMPERSAND@@/)){
> Application.Alert("This table already contains the string @@AMPERSAND@@.\nThis macro reserves the string\n@@AMPERSAND@@ to hide entity references.");
> return;
> }else{
> tableStr = tableStr.replace(/&/g,'@@AMPERSAND@@');
> }
>
269c284
< rngWork.TypeText(sortedTable);
---
> rngWork.TypeText(sortedTable.replace(/@@AMPERSAND@@/g,'&'));
« Last Edit: September 23, 2009, 06:50:07 AM by dcramer »

dcramer
« Reply #6 on: April 05, 2010, 01:37:21 PM »

Another suggested change:

Code:
< xsltStr += '<xsl:template match = "*" >\n';
---
> xsltStr += '<xsl:template match = "*|processing-instruction()|comment()" >\n';

Otherwise you lose comments and PIs in the table.