Reply to: Script Example: Sort Lowercase CALS Tables (DITA, DocBook, etc)

I just tested this on a CALS table in a DocBook document in XMetaL 4.6. It works great IF there are no entities in the table (e.g. &foo;, where you've declared foo in the DOCTYPE or DTD). If there are, it fails by having the table just disappear (i.e. it pastes over the old table with an empty string). So that's a pretty serious limitation, and it would affect any attempt to use an xslt to massage the markup like this, which is a potentially powerful tool. Obviously the parser can't resolve the entities without a doctype, but if it does resolve them then it's going to replace the table with a copy that has the entities resolved. Another problem for the DITA folk is that this will never work with specialization without access to the DTD.

Option 1: Obfuscate the entities before processing by replacing & in tableStr with @@@ before running the xslt and then replacing @@@ with & when done.
Advantage: Entities remain unresolved in the result of the sort.
Disadvantage: If any entities were in the sort column then the sort is inaccurate.
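
To make Option 1 concrete, here is a rough sketch of the swap in Python rather than in the XMetaL script itself. The function names are made up for illustration, and the check for an existing @@@ in the table is my own addition, not part of the original script. The XSLT step would run between the two calls.

```python
PLACEHOLDER = "@@@"  # the marker suggested in Option 1


def hide_entities(table_str):
    """Replace every & so the parser sees no entity references at all."""
    # Assumption: if the table already contains the placeholder, restoring
    # would corrupt it, so refuse to continue.
    if PLACEHOLDER in table_str:
        raise ValueError("table already contains the placeholder text")
    return table_str.replace("&", PLACEHOLDER)


def restore_entities(result_str):
    """Turn the placeholders back into & after the transform has run."""
    return result_str.replace(PLACEHOLDER, "&")
```

Note that this also hides built-in references such as &amp; for the duration of the transform. They come back unchanged on restore, but a sort on that column sees the literal text @@@amp; rather than the character.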

Option 2: Get the DOCTYPE (is that possible?) and prepend it to tableStr.
Advantage: MSXML can resolve the entities and process tableStr
Disadvantages: Entities are resolved in the result, defeating the purpose of using them in the first place. Also, if you use catalog files to find the dtd and other resources, would MSXML know about them?

Option 2.a: Munge tableStr, wrapping each &foo; in a marker element that records the entity name. Get the DOCTYPE (assuming that's possible) and prepend it to tableStr. Process with the xslt, letting msxml resolve the entities (hopefully it can find the dtd). Once done, replace each marker element, which now holds the resolved text (e.g. bar), with &foo;. Note that this assumes you're only using entities for fairly simple inline cases and you don't do something like put an entity inside an attribute value, which would probably trip up your xslt.
Advantage: Can sort based on entity values and return a result with the unresolved entities in place.
Disadvantage: Sounds hard. Can it be made to work with catalog files? Is it possible to get the text of the DOCTYPE?
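
For what it's worth, the wrap/unwrap half of Option 2.a doesn't look too bad. Below is a sketch in Python, with the standard library parser standing in for MSXML. The marker element name, the function names and the sample DOCTYPE are all invented for the example. It declares foo in an internal subset, so it says nothing about whether an external dtd or catalog files can be found.

```python
import re
import xml.etree.ElementTree as ET

MARKER = "entity-marker"  # hypothetical element name, not from any DTD
ENTITY_REF = re.compile(r"&([A-Za-z_][\w.-]*);")
PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}  # built in, leave alone

# Assumption: a stand-in for the real DOCTYPE, declaring one entity inline.
DOCTYPE = '<!DOCTYPE table [<!ENTITY foo "bar">]>'


def wrap_entities(table_str):
    """Wrap each declared entity reference in a marker that keeps its name."""
    def wrap(match):
        name = match.group(1)
        if name in PREDEFINED:
            return match.group(0)
        return '<%s name="%s">%s</%s>' % (MARKER, name, match.group(0), MARKER)
    return ENTITY_REF.sub(wrap, table_str)


def unwrap_entities(result_str):
    """Swap each marker (now holding resolved text) back to &name;."""
    pattern = re.compile(
        r'<%s name="([^"]+)">.*?</%s>' % (MARKER, MARKER), re.S)
    return pattern.sub(lambda m: "&%s;" % m.group(1), result_str)


def resolve(table_str):
    """Prepend the DOCTYPE and parse, so the parser expands the entities."""
    return ET.fromstring(DOCTYPE + wrap_entities(table_str))
```

After resolve(), the text inside each marker is the entity's value, so a sort can compare on it. Serializing the result and passing it through unwrap_entities() puts &foo; back. An entity inside an attribute value would break this, since the wrapper would produce markup inside the attribute.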