Author Topic: Workaround: Automatic replacement of unescaped < > & characters in attributes  (Read 3437 times)
Derek Read
Program Manager (XMetaL)
Administrator
Member
« on: June 03, 2011, 06:44:06 PM »

Products:
XMetaL Author Enterprise and XMetaL Author Essential 5.5 and 6.0
This script as written may not fire in older versions and would need to be modified to use a different event or events.

Issue Description:
For various reasons an attribute value may end up containing an unescaped character that should be escaped (encoded as an entity).
Examples:
   1. User opens a document created in a 3rd party application.
   2. User types the character into an attribute value in Plain Text view.
   3. User types the character into the Attribute Inspector.
   4. Copy and paste.
   5. Script.

Depending on context, any of the 5 characters covered by XML's predefined entities may need escaping inside an attribute value. XMetaL Author automatically handles the " and ' characters (quotation mark and apostrophe) but it does not automatically escape < > or & (less-than, greater-than, and ampersand) when they need to be escaped.

XMetaL should correctly issue a validation error when it encounters attribute values containing these characters, but this may be confusing for less-experienced authors because current functionality requires them to correct such errors manually. If a document contains many such errors, correcting them manually would also be painful.

Purpose of the Script:
Find characters that should be escaped in attribute values and convert them to entities while leaving any existing entities as they are.
The characters dealt with specifically by this script are < > and & (only). This script is designed to escape these characters just before a document is validated, so that if the author validates or saves the document they never see validation errors related to these characters.

Basic Script Logic:
During document validation (which normally also fires just before a save action):
   For every element in the document...
      For every attribute in the element...
         Place the attribute value in a temp string to work on.
         Working on the temp string:
            1. Extract existing entities (e.g. &amp; &#1234; etc.) from the string into an array.
               Example: {0}=&amp; {1}=&#1234; {2}=etc...
            2. Replace the existing proper entity values with a placeholder unlikely to ever appear in an attribute value.
               I'm using "_____entity_____placeholder_____".
               In the unlikely case where the attributes in your environment might contain that specific string you may
               alter it to something else.
            3. Replace all remaining & chars with &amp;
            4. Replace all < chars with &lt;
            5. Replace all > chars with &gt;
            6. Restore the previously existing entities by looping through the
               entities array and replacing each "_____entity_____placeholder_____".
         Exit the loop if the new attribute value is empty.
         If the new attribute value is different from the old one we fixed something,
         so replace the attribute value in the doc with the new value.
      Next: attribute
   Next: element
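
The string-handling steps (1 to 6) above can be sketched outside XMetaL as follows. This is only an illustration in Python, not the code from the MCR file. The function name and the regular expression are my own assumptions; in particular I am assuming an "existing entity" means a named reference (&amp;), a decimal character reference (&#1234;) or a hexadecimal one (&#x4D2;).

```python
import re

# Placeholder string taken from the outline above; change it if your
# attribute values could ever contain it.
PLACEHOLDER = "_____entity_____placeholder_____"

# Assumption: named, decimal and hexadecimal references count as
# "existing entities" that must be left alone.
ENTITY_RE = re.compile(r"&(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);")


def escape_attribute_value(value: str) -> str:
    # Step 1: collect the existing entities in order of appearance.
    entities = ENTITY_RE.findall(value)
    # Step 2: swap each one for the placeholder.
    work = ENTITY_RE.sub(PLACEHOLDER, value)
    # Steps 3-5: escape what is left. & goes first, otherwise the & in the
    # freshly inserted &lt; and &gt; would be escaped a second time.
    work = work.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
    # Step 6: restore the entities, one placeholder at a time, left to right.
    for entity in entities:
        work = work.replace(PLACEHOLDER, entity, 1)
    return work


print(escape_attribute_value("a < b & c &amp; d &#1234; > e"))
# prints: a &lt; b &amp; c &amp; d &#1234; &gt; e
```

Running the function on its own output changes nothing, which is why it should be safe to run on every validation.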


A similar correction is done just after the user has entered an attribute value in the Attribute Inspector but before it has been inserted into the document. This other event runs just on that one attribute value and not the entire document. A function could probably be written for the two events to share, but to keep this demo simple to understand I have separated the two so that each macro contains all of its own code.
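
For anyone who does want a single shared function, here is a rough sketch of the shape it could take, again in Python rather than in XMetaL script. Everything here is hypothetical: the names escape_value and fix_document are mine, and a list of plain dictionaries stands in for the document's elements and their attributes. Note that it uses a different technique from the demo: a negative lookahead finds only the & characters that do not start an entity, so no placeholder is needed. Empty values are skipped, which is one reading of the "empty" check in the outline above.

```python
import re

# Matches an & that does NOT begin a named, decimal or hexadecimal reference.
BARE_AMP_RE = re.compile(
    r"&(?!(?:[A-Za-z_][A-Za-z0-9._-]*|#[0-9]+|#x[0-9A-Fa-f]+);)"
)


def escape_value(value: str) -> str:
    """Shared helper: escape bare &, then < and >, in one attribute value."""
    value = BARE_AMP_RE.sub("&amp;", value)
    return value.replace("<", "&lt;").replace(">", "&gt;")


def fix_document(elements: list) -> int:
    """Whole-document pass. Each element is a dict of attribute name -> value.

    Returns the number of attribute values that were rewritten.
    """
    changed = 0
    for attrs in elements:
        for name, old in list(attrs.items()):
            if not old:
                continue  # nothing to fix in an empty value
            new = escape_value(old)
            if new != old:
                attrs[name] = new  # write back only when something changed
                changed += 1
    return changed


# The single-value event would call escape_value() directly;
# the validation event would call fix_document().
doc = [{"title": "R&D <draft>", "id": "x1"}, {"alt": "a &amp; b", "note": ""}]
print(fix_document(doc), doc[0]["title"])
# prints: 1 R&amp;D &lt;draft&gt;
```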

Demo Code:
This demo code is meant for use by a developer in charge of customizations for XMetaL installations (or at least someone maintaining installations). Before using this script please read the notes and comments in the MCR file (which also includes some legal stuff). This code is provided as a demo and should be treated as if it were completely untested. I have tested it as best I can, but it has not gone through our regular rigorous test process. You may also wish to adapt this code by altering the script logic itself as you may find that the functionality does not meet the exact needs of your end users, etc.

Please do not use this script without the permission of the people that maintain your XMetaL installation (if that isn't you). Although the possibility is low given the way I have coded this, it could conflict with special customizations or scripts, 3rd party tools or plug-ins, a specific work-flow they have set up and wish you to follow, or any number of other things I cannot guess at. I would recommend telling them that you would like something like this and letting them integrate and test it for you.

Installation, Uninstallation:
See comments inside MCR file.

* demo_autoescapingCharsInAttributes.zip (4.29 KB - downloaded 282 times.)
Derek Read
« Reply #1 on: June 08, 2011, 05:22:37 AM »

I took a detailed look at the XML Recommendation yesterday and remembered that having the character > inside an attribute value is actually not illegal (it does mention something fairly vague about "you may escape..." or similar). When it appears in an attribute value XMetaL will not complain either (which is correct).

There is no harm (from an XML Recommendation standpoint and from an "authoring in XMetaL" standpoint) in escaping this character. It seems to suit my emotional need to 'balance things' (so it mirrors < when they both happen to be present) so I will leave it in this demo.

If you don't want this script to escape > it should be quite easy to remove those few lines that do that.
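
If you want to check the point about > for yourself, any conforming XML parser will do. Below is a small sketch using the parser in Python's standard library as a stand-in. It only tests well-formedness and says nothing about XMetaL's own parser or validation messages. The helper name is_well_formed is made up for this example.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed('<a b="x > y"/>'))   # literal > in an attribute value
print(is_well_formed('<a b="x < y"/>'))   # literal <
print(is_well_formed('<a b="x & y"/>'))   # bare &
```

The first call should report True and the other two False, which matches the reading of the Recommendation described here.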

For the record, what the XML Recommendation says distills down to the following. It sounds simple when stated here in point form, but it really is a lot of reading: the information is not all located together and it is not a straightforward read (unless you are used to reading W3C specs and their interesting notations).

1. You must never have a literal < in an attribute value; it must always be escaped as &lt;
2. You may choose to escape > as &gt; but you don't need to.
3. The & character must be escaped as &amp; unless it is the start of an entity or character reference (such as &lt; or &#1234;).
4. You may escape " or ' in an attribute value in order to use either " or ' as the character that denotes the attribute boundary (the character surrounding the attribute value).
5. You may use " inside an attribute as long as ' is used to denote the attribute boundary (the character surrounding the attribute value), or vice-versa.
6. If both " and ' are present in an attribute value then at least one of them will need to be escaped.
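
Points 4 to 6 amount to a small decision about which quote character to use as the attribute boundary. Here is a sketch of that decision in Python. The function names are hypothetical, the value passed in is assumed to already have & and < escaped, and the choice to prefer " as the boundary is just my own convention. The round trip through Python's standard library parser is there only to show that the result reads back as the original value.

```python
import xml.etree.ElementTree as ET


def quote_attribute(value: str) -> str:
    """Wrap an attribute value in a boundary character that is safe for it.

    Assumes & and < in the value are already escaped.
    """
    if '"' not in value:
        return '"' + value + '"'   # point 5: ' may appear inside "..."
    if "'" not in value:
        return "'" + value + "'"   # point 5: " may appear inside '...'
    # Point 6: both are present, so one of them has to be escaped.
    return '"' + value.replace('"', "&quot;") + '"'


def round_trip(value: str) -> str:
    """Parse an element carrying the quoted value and read the value back."""
    return ET.fromstring("<a b=" + quote_attribute(value) + "/>").get("b")


print(quote_attribute('say "hi"'))
# prints: 'say "hi"'
print(quote_attribute("it's \"ok\""))
# prints: "it's &quot;ok&quot;"
```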