Author Topic: Validating CALS tables  (Read 833 times)
Marvin
Member

Posts: 22


« on: May 03, 2018, 07:38:58 AM »

I would like to programmatically validate CALS tables in XMetaL Author 13.

I have found the following schematron:
https://github.com/nigelwhitaker/cals-table-schematron
https://github.com/nigelwhitaker/cals-table-schematron/blob/master/source/cals.sch

But when I try to run the validation using Tools->Validate using Schematron, I receive the following error message:

Quote
SEVERE: Exception com.google.gwt.core.client.JavaScriptException in invokeTransform: (TypeError) description: Object doesn't support property or method 'hasAttribute' number: -2146827850: Object doesn't support property or method 'hasAttribute'

Unexpected exception occured = Unable to get property 'xml' of undefined or null reference

Is this a problem with my XMetaL Author installation, or with that specific script?
Also, how can I call that validation programmatically?

Alternatively, is there a better way to validate CALS tables?

(Finally a quick bug report:
When you open Tools->Validate using Schematron, choose "Select new Schematron file..." and then click "Cancel" in the file open dialog, you can no longer open the file dialog unless you either restart XMetaL or choose an existing Schematron file from the dropdown menu (i.e. if there is none, you have to restart XMetaL Author). This is because the file dialog seems to be bound to a "selected item changed" event, but you cannot select the "Choose Schematron for validation" option, which is the only other option when there are no recent Schematron files in the list.)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #1 on: May 03, 2018, 01:23:11 PM »

I'll see if we can have someone look at that Schematron. Most likely it is simply incompatible with the Schematron engine we are using or the XSLT2 parser it runs on. Adjusting the file may be possible.

We should clarify what you mean by "validation" before answering your next question. Schematron is a language that allows you to make assertions about the presence or absence of specific patterns (whether nodes are present or not) in XML. This is different from the standard XML validation rules.

XMetaL Author is a "validating XML parser" and implements the standard validation rules according to the W3C's XML Recommendation. This feature is "live" at all times.

The best way to validate any XML document in XMetaL Author is to simply open the XML file. As long as it references a DTD or W3C Schema (aka: schema) the XML file will be validated immediately after the schema has been loaded. The same is true when you save -- validation is performed before the file is written out. You can also force validation to run at any time by selecting Validate from the Tools menu.

While editing an XML file another (proprietary) feature related to validation is engaged that we call "Rules Checking". This feature attempts to keep the document valid by not letting you put it into an invalid state when using the Element List, Attribute Inspector, via paste, etc. What this means is that if you start editing a valid document XMetaL will keep that document valid no matter what changes you make to it until you save.*

You can read more about the Validation and Rules Checking features in the help topic "Validation and rules checking".


* Note that this is not entirely true (though it is 99% of the time). XMetaL Author allows you to put a document into an invalid state in order to let you get to another valid state. One example is when an element must contain one (and only one) of multiple child elements and it already contains one of them. If you wish to insert a different child element XMetaL Author allows you to remove the existing child (putting the document into an invalid state) so you can insert one of the other allowed children. The majority of the cases will be similar, with XMetaL Author allowing you to delete things from a document putting it into a state where a required node might be missing. This is where validation comes in and that is why it is performed at file open and at file save.
Marvin
Member

Posts: 22


« Reply #2 on: May 03, 2018, 02:51:55 PM »

Thank you.

We'd like to "validate" more complex logic (beyond pure validation against a DTD), such as all rows having the same number of cells etc.
We already have a custom DTD set up (so rules checking etc. is very helpful) and validation works, but sometimes tables cause problems for our PDF generator because the numbers of cells do not match etc.
We'd like to be able to detect that in XMetaL Author, long before the problems reach the PDF generator, so end users can fix the issues themselves.
Once it reaches the PDF generator, it creates a support case for us.
Thank you.
Marvin
Member

Posts: 22


« Reply #3 on: May 04, 2018, 02:17:27 AM »

It would be great if someone could have a look at the Schematron.

In the meantime, is there a way for me to debug this? It sounds like the problem is in JavaScript code somewhere, can I somehow break on that exception and debug in Visual Studio?
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #4 on: May 04, 2018, 02:33:03 PM »

You could try to debug the Schematron code, but I believe much of what we implemented for Schematron is obfuscated (not by us) so it is likely that would be difficult. Stepping through it should be possible to set up with the right breakpoints in place but making sense of what it is doing would probably be hard.

What is the specific reason for using a Schematron to check the validity of CALS tables?
Given that you can check the validity of any XML using XMetaL Author's XML validation engine (automatically) I assume there must be some specific reason for this redundancy?
« Last Edit: May 04, 2018, 02:38:49 PM by Derek Read »
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #5 on: May 04, 2018, 02:57:08 PM »

I think I've found the original blog posting that led you to that Schematron:
https://www.deltaxml.com/blog/dita/cals-table-validity/

From what I'm reading here, that Schematron published by DeltaXML (the one you are trying to use) may work with their own software (I'm not sure) but it sounds experimental given some of the comments and instructions included on the GitHub pages.

The blog post suggests the Schematron is essentially checking CALS table validity (not for some additional "business rules" or non-CALS-standards requirements) so it really sounds redundant to run a CALS table that has been validated in XMetaL Author through an additional validation engine. The XML validating parser in XMetaL Author will catch all validation issues with a CALS table (or any other XML) when comparing it to the proper DTD (or XSD).

If you can let me know what the ultimate end goal is that will help us understand your needs better.
Marvin
Member

Posts: 22


« Reply #6 on: May 07, 2018, 05:12:09 AM »

Here's an example of what we're trying to catch:

Code:
<table id="table_04B99EE8BBA24339894A7ACD555448B1">
  <tgroup cols="2">
    <colspec colnum="1" colname="col1" colwidth="*"/>
    <colspec colnum="2" colname="col2" colwidth="*"/>
    <tbody>
      <row>
        <entry colname="col1" morerows="1"></entry>
        <entry colname="col2"></entry>
      </row>
      <row>
        <entry colname="col1"></entry>
        <entry colname="col2"></entry>
      </row>
    </tbody>
  </tgroup>
</table>

Because of the "morerows" attribute, there is a collision between two cells.
The Schematron detects this in Oxygen, and we'd like to have the same functionality in XMetaL Author Enterprise 13.
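
To make the collision concrete, here is a rough sketch of the check in plain Python (standard library only). It is not an XMetaL macro and not the DeltaXML Schematron; the function name `find_collisions` is made up for illustration. It assumes every entry carries a `colname` that matches a `colspec`, and it ignores horizontal spans (`namest`/`nameend`).

```python
import xml.etree.ElementTree as ET

# The table from this post, condensed (id shortened for the example).
SAMPLE = """
<table id="t1">
  <tgroup cols="2">
    <colspec colnum="1" colname="col1" colwidth="*"/>
    <colspec colnum="2" colname="col2" colwidth="*"/>
    <tbody>
      <row>
        <entry colname="col1" morerows="1"></entry>
        <entry colname="col2"></entry>
      </row>
      <row>
        <entry colname="col1"></entry>
        <entry colname="col2"></entry>
      </row>
    </tbody>
  </tgroup>
</table>
"""


def find_collisions(tgroup):
    """Return (row_number, colname) pairs where an entry sits on a cell
    that is still covered by a morerows span started in an earlier row."""
    # Map column names to 1-based positions using the colspec elements.
    colpos = {}
    for i, colspec in enumerate(tgroup.findall("colspec"), start=1):
        colpos[colspec.get("colname")] = int(colspec.get("colnum", i))

    collisions = []
    for section in tgroup:
        if section.tag not in ("thead", "tbody", "tfoot"):
            continue
        carried = {}  # column position -> rows still covered from above
        for rownum, row in enumerate(section.findall("row"), start=1):
            started = {}
            for entry in row.findall("entry"):
                col = colpos.get(entry.get("colname"))
                if col is None:
                    continue  # assumption: skip entries without a known colname
                if carried.get(col, 0) > 0:
                    collisions.append((rownum, entry.get("colname")))
                started[col] = int(entry.get("morerows", "0"))
            # Spans from above cover one row fewer; new spans start here.
            carried = {c: n - 1 for c, n in carried.items() if n - 1 > 0}
            carried.update({c: n for c, n in started.items() if n > 0})
    return collisions


tgroup = ET.fromstring(SAMPLE).find("tgroup")
collisions = find_collisions(tgroup)
print(collisions)  # [(2, 'col1')]
```

The second row's col1 entry is reported because the first row's col1 entry already claims that cell through morerows="1".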

(By the way, there is a bug in the forum: When I log in to reply, I can't reply straight away. The forum software tells me that I have to wait for 180 seconds between posts, so it apparently treats the login as posting a reply. When I hit reply again after waiting for a few minutes, it also claims I have already posted that ("You already submitted this post!"), so I have to copy my reply, reload the thread and then paste my reply again.)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #7 on: May 07, 2018, 06:16:47 PM »

I'm not sure that is a good example as the CALS table you have listed here is following the CALS specification. If the DTD you are using to validate your documents follows the CALS spec then XMetaL Author will not catch any validation errors (because this table is valid) and it will also render it with the first cell in both rows merged (as the CALS spec states). See attached image.

So, I guess this does mean this Schematron is performing checks that are specific to some additional rules that are otherwise OK according to the CALS spec?

Are you saying that oXygen flags usage of the @morerows attribute as an issue and that you want to create a Schematron for use with XMetaL Author that tells people not to use the @morerows attribute?


* CALS_with_merged_row.jpg (88.54 KB, 910x683 - viewed 63 times.)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #8 on: May 07, 2018, 07:16:02 PM »

The JavaScript error displayed when attempting to use this Schematron (which appears to have been specifically written to solve some issue that DeltaXML identified with oXygen) is probably because Saxon-CE 1.1* (which is the engine XMetaL Author 13 uses) does not support namespaces. It may also be because this implementation does not support includes.

If you could define the things you wish to check for I could try to provide a sample Schematron you could extend for specific use with XMetaL Author.


*Saxon-CE 1.1 docs: http://www.saxonica.com/ce/user-doc/1.1/index.html
Marvin
Member

Posts: 22


« Reply #9 on: May 08, 2018, 03:40:22 AM »

We have some issues with the DITA OT PDF generator when cells overlap or when attributes like "morerows" would span outside the bounds of the table (e.g. the table only has 2 rows, but morerows=7, etc.).

This is correctly detected by said Schematron in Oxygen:


* 7DDEBACC.PNG (76.52 KB, 1724x1193 - viewed 68 times.)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #10 on: May 08, 2018, 04:22:08 PM »

I'm a bit confused by that oXygen error message. The screenshot shows that in the tgroup @cols="2" so there are two columns. There are also 2 <colspec> elements (so that's consistent). There are two rows and each row has two entry elements (so the number of entry elements matches the colspec). I do not see anything that would indicate there are more than 2 cells. Oddly, the error also says something about "row (3)" but there are only two rows shown in this table.

I could imagine a scenario where some authoring tool allowed you to create a table that has too many entry elements in a row, and perhaps that would not be caught by standard XML validation (with DTD). Is that the issue you are trying to show here? Under normal circumstances that would be extremely difficult to do while authoring in XMetaL Author. To do that I think it would be necessary to switch to Plain Text view and then either manually type in the angle brackets, the element name <entry> and other things. Is that something your users need to do? We typically don't see people working with tables that way. Editing in Tags On or Normal view is preferred because the table is rendered as a table and all the various editing features for adding/removing/moving rows and columns make editing easy.
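
For illustration only, a check for that scenario (more entry elements in a row than tgroup/@cols allows) could be sketched in plain Python as below. This is a standalone sketch, not something XMetaL Author runs; the function name `rows_with_too_many_entries` and the sample table are invented for the example, and spans are deliberately ignored, so it only flags rows that are over the limit regardless of merging.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the first row has three entries in a two-column tgroup.
SAMPLE = """
<tgroup cols="2">
  <colspec colnum="1" colname="col1"/>
  <colspec colnum="2" colname="col2"/>
  <tbody>
    <row>
      <entry>a</entry>
      <entry>b</entry>
      <entry>c</entry>
    </row>
    <row>
      <entry>d</entry>
      <entry>e</entry>
    </row>
  </tbody>
</tgroup>
"""


def rows_with_too_many_entries(tgroup):
    """Return (section, row_number, entry_count) for every row that holds
    more entry elements than the tgroup's cols attribute allows."""
    cols = int(tgroup.get("cols"))
    offenders = []
    for section in tgroup:
        if section.tag not in ("thead", "tbody"):
            continue
        for number, row in enumerate(section.findall("row"), start=1):
            count = len(row.findall("entry"))
            if count > cols:
                offenders.append((section.tag, number, count))
    return offenders


offenders = rows_with_too_many_entries(ET.fromstring(SAMPLE))
print(offenders)  # [('tbody', 1, 3)]
```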

Putting that aside, which transtype are you using that has issues with @morerows?

It sounds to me that if an output transtype has issues when valid input (XML / DITA / CALS) is passed to it (as with these @morerows values), then the issue is with the transformation process itself (i.e. there is a bug in the DITA OT transtype). Rather than trying to detect and avoid input that triggers a bug in a transtype, doesn't it make more sense to log these as bugs with the DITA OT project so they can be corrected?
Marvin
Member

Posts: 22


« Reply #11 on: May 09, 2018, 07:53:38 AM »

I'm still quite new to this project, so I'm not exactly sure how this happens and what the consequences are in DITA OT.
But:

Quote
morerows: number of additional rows in a vertical span. There shall be at least that many more rows in the appropriate thead or tbody. Any entries with morerows that would attempt to extend further downward is an error.
Source: http://www.datypic.com/sc/cals/a-nons_morerows.html

That is one of the cases we're trying to catch.
The other thing is namest/nameend:

Quote
It is an error if the namest value is not defined in a colspec for the current tgroup.
http://www.datypic.com/sc/cals/a-nons_namest.html
(the same applies for nameend)
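
As a rough sketch of these two rules in plain Python (standard library only, outside XMetaL): the function name `check_spans` and the sample table are made up for illustration. It assumes the colspec elements sit directly under tgroup and that morerows values are numeric.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: morerows runs past the end of the tbody,
# and nameend refers to a column name that no colspec defines.
SAMPLE = """
<tgroup cols="2">
  <colspec colname="col1"/>
  <colspec colname="col2"/>
  <tbody>
    <row>
      <entry colname="col1" morerows="7"/>
      <entry colname="col2"/>
    </row>
    <row>
      <entry namest="col1" nameend="col3"/>
    </row>
  </tbody>
</tgroup>
"""


def check_spans(tgroup):
    """Return a list of human-readable problems found in one tgroup."""
    problems = []
    # Column names declared by colspec children of this tgroup.
    known = {colspec.get("colname") for colspec in tgroup.findall("colspec")}
    for section in tgroup:
        if section.tag not in ("thead", "tbody"):
            continue
        rows = section.findall("row")
        for index, row in enumerate(rows):
            rows_below = len(rows) - index - 1
            for entry in row.findall("entry"):
                # Rule 1: morerows must not extend past the last row.
                morerows = int(entry.get("morerows", "0"))
                if morerows > rows_below:
                    problems.append(
                        f"{section.tag} row {index + 1}: morerows={morerows} "
                        f"but only {rows_below} row(s) follow"
                    )
                # Rule 2: namest/nameend must name a declared column.
                for attr in ("namest", "nameend"):
                    name = entry.get(attr)
                    if name is not None and name not in known:
                        problems.append(
                            f"{section.tag} row {index + 1}: {attr}='{name}' "
                            f"has no matching colspec"
                        )
    return problems


problems = check_spans(ET.fromstring(SAMPLE))
for problem in problems:
    print(problem)
```

For the sample this reports one morerows overflow in row 1 and one undefined nameend in row 2.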
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #12 on: May 09, 2018, 03:28:33 PM »

If an author uses the XMetaL Author Enterprise UI provided for working with tables then these issues should not generally be a problem. Perhaps that is what we should recommend here.

In other words, if the user has a document open in Tags On or Normal view, where CALS tables are rendered visually as a table (or HTML or some custom table type), they can use various table editing features to split and merge cells and avoid all the complications of possibly inserting extra elements or incorrect attribute values.

On the Table menu we provide a "merge cells" dialog that lets you merge left/right/up/down and that automatically handles the setting of the various attributes for CALS tables (morerows, namest, nameend). The Table toolbar provides similar functions: "Merge Cell Right", "Merge Cell Left", "Merge Cell Up", "Merge Cell Down", "Split Cell into Rows" and "Split Cell into Columns" (together with many others that allow you to add rows/columns, and move rows and columns up/down/left/right).

Of course, it is always possible to create messy table structures that are invalid or that violate some rules if you are working with raw XML or directly with tags and attributes. That is why XMetaL Author provides a large number of UI features for table editing (see attached images). Perhaps if you have issues with tables being created that have problems it would be best to deal with those issues directly in the tool that is helping create those issues? Perhaps there are similar features or a method of dealing with them.


* table_context_menu.JPG (29.53 KB, 246x510 - viewed 66 times.)

* table_properties_dialog.JPG (103.52 KB, 692x676 - viewed 68 times.)

* table_menu.JPG (26.06 KB, 203x498 - viewed 67 times.)

* table_toolbar.JPG (8.78 KB, 481x53 - viewed 61 times.)
Marvin
Member

Posts: 22


« Reply #13 on: May 10, 2018, 03:01:19 PM »

I completely agree.
I'm not sure how these invalid documents were created - normally it shouldn't be possible to create them in the first place.

But we probably have some invalid documents already at this point, so it would be great to be able to detect this.
Would it be difficult to implement such validation in XMetaL, in addition to trying to keep new documents from becoming corrupted?
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #14 on: May 22, 2018, 02:51:36 PM »

I suspect it should be possible given the right requirements, but I can't guess how much effort would be required to get it working. I think it would be difficult to justify that much work for what I suspect is minimal benefit. Most likely it would be easiest to rewrite it from scratch based on the requirements for what to check (what is considered "invalid").

If the Schematron that DeltaXML created were straightforward, consisting of just asserts and rules, then it would likely just work, or possibly need minimal changes. However, their Schematron uses <include> to bring in a couple of complicated XSL files, and if those are required I don't think that is going to be easy. There are also a number of namespaces declared, and for the most part the Saxon-CE implementation of XSLT2 does not handle namespaces (though JustSystems has implemented support for a number of the ones used by Schematron with some workarounds so that Saxon-CE never sees those).
Marvin
Member

Posts: 22


« Reply #15 on: May 23, 2018, 05:13:51 AM »

I guess most likely it would make more sense to implement this as a validation macro then?
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2575



« Reply #16 on: May 25, 2018, 05:28:50 PM »

That might be easier.