Reply to: XML Document Encodings
September 23, 2011 at 12:47 am
rogiez: I assume the encoding specified in your XML file is ASCII. In case you are using some other name for it, the product supports the following names for that encoding: US-ASCII, ANSI_X3.4-1968, ANSI_X3.4-1986, ASCII, CP367, CSASCII, IBM367, ISO_646.IRV:1991, ISO646-US, ISO-IR-6, US.
That is the only encoding supported by XMetaL Author that might cause the character (Unicode Name: “NO-BREAK SPACE”) to be saved as a numeric character entity in the XML source. The other three supported encodings (LATIN1, UTF-8 and UTF-16) all define that character and so when you save to those encodings the character is simply written out as itself, not as a numeric character entity reference.
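To illustrate the point outside of XMetaL, here is a small Python sketch that checks which of the four encodings define NO-BREAK SPACE (U+00A0). The helper name `can_encode` is made up for this example, and Python's codecs are only standing in for the product's own encoders, which may behave differently in detail.

```python
NBSP = "\u00a0"  # NO-BREAK SPACE, decimal 160

def can_encode(ch: str, encoding: str) -> bool:
    """Return True if the encoding defines a byte sequence for ch."""
    try:
        ch.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

# ASCII is the only one of the four that cannot represent U+00A0.
for enc in ("ascii", "latin-1", "utf-8", "utf-16"):
    print(enc, can_encode(NBSP, enc))

# Python's built-in fallback writes a decimal reference (&#160;).
# The exact form a given product writes (decimal or hex) may differ.
print("a\u00a0b".encode("ascii", errors="xmlcharrefreplace"))
```

Only the ASCII line prints `False`; the last line shows the character being replaced by a numeric character reference because ASCII has no other way to carry it.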
I'm not sure how to explain this without a long explanation, so unfortunately…
If you use XMetaL Author to save a document using ASCII encoding and it contains characters not defined in the ASCII encoding specification, you will see numeric character entity references for them (such as &#xA0;) in the file. This applies to any character with a Unicode code point above 127: the first 128 characters in Unicode are the same as ASCII, but ASCII only defines 128 characters. You will see the references if you open the document in an editor that does not render these entities as single characters (such as Notepad), but not in other software (such as most web browsers).
- If you open any document directly into Plain Text you will see something similar to opening the file in Notepad or other simple text editors.
- If you open such a document directly into Tags On or Normal view an “encoding import” is performed.
- When you switch from Plain Text view into one of these other two views the same “import” is performed. Essentially this means that switching from Plain Text view into Tags On or Normal view is identical to opening the file from disk directly into these two other views.
So, what do I mean by “encoding import”? In order to provide all the functionality necessary for editing (which includes scripting through the 1200+ APIs the product supports in Tags On and Normal views), XMetaL Author converts the XML source into a common encoding, and that is UTF-16. At this point, when documents are viewed in Tags On or Normal view, each character in the document is rendered with its corresponding glyph from the font specified for a particular element, provided that font has one. The font is specified in the CSS file you have created for your customization or, in the case of DITA, in the CSS files we ship.
When you save a document to disk a similar “encoding export” is done that converts the internal representation of the XML source from UTF-16 into the encoding you have specified, and if you have specified ASCII then any character above Unicode code point 127 will appear as a character entity reference on disk. An “encoding export” is not done when you switch into Plain Text view from the other two views. That's probably the root of your issue. There is no way to alter this behaviour with a setting.
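The import and export steps described above can be sketched roughly as follows. This is a simplified model, not XMetaL's actual implementation: the function names `encoding_import` and `encoding_export` are invented for the example, a Python `str` stands in for the internal UTF-16 representation, and the hex form of the reference is an assumption. It also ignores markup-significant characters such as `&` and `<`, which a real XML processor would have to keep escaped.

```python
import re

# Matches numeric character references: &#xHH; (hex) or &#DDD; (decimal).
_CHAR_REF = re.compile(r"&#(?:[xX]([0-9A-Fa-f]+)|([0-9]+));")

def encoding_import(data: bytes, encoding: str) -> str:
    """Decode the bytes on disk and resolve numeric references to characters."""
    text = data.decode(encoding)

    def resolve(match):
        hex_digits, dec_digits = match.groups()
        code = int(hex_digits, 16) if hex_digits else int(dec_digits)
        return chr(code)

    return _CHAR_REF.sub(resolve, text)

def encoding_export(text: str, encoding: str) -> bytes:
    """Encode for disk, falling back to a reference for undefined characters."""
    parts = []
    for ch in text:
        try:
            ch.encode(encoding)              # probe: does the encoding define ch?
            parts.append(ch)
        except UnicodeEncodeError:
            parts.append(f"&#x{ord(ch):X};")  # assumed hex form
    return "".join(parts).encode(encoding)

source = b"<p>one&#xA0;two</p>"
internal = encoding_import(source, "ascii")   # holds a real U+00A0 character

print(encoding_export(internal, "ascii"))     # reference comes back
print(encoding_export(internal, "latin-1"))   # single byte 0xA0
print(encoding_export(internal, "utf-8"))     # two bytes 0xC2 0xA0
```

In this model, looking at `internal` directly corresponds to switching into Plain Text view without saving: the reference has already been resolved by the import, and nothing has turned it back into a reference yet.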
I suspect your workflow is like this:
1) Switch to Plain Text view for an already opened document or open a document directly into Plain Text view. The document's encoding is set to “ASCII” (or one of the variants for ASCII listed above).
2) The document contains numeric character entities (&#xA0;) or you enter the XML code for them manually.
3) You switch into either Tags On or Normal view. In this view the characters appear as “normal” spaces because the font being used renders them as such; almost all fonts simply redirect references to &#xA0; (i.e. decimal character 160) to the regular space glyph (which is &#x20;, or decimal character 32). It is at this point that the “encoding import” has been performed.
4) You switch back to Plain Text view and see that these characters are represented using a single character that appears to be a normal space (under the covers it is actually Unicode code point 00A0 aka decimal 160). This is also due to the font (though in this case the font for Plain Text view is specified in Tools > Options and not via CSS).
5) You save the document. The “encoding export” into ASCII is done. You open the file with a text editor and see &#xA0; wherever a NO-BREAK SPACE was entered.
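Because a NO-BREAK SPACE and a regular space look the same in steps 3 and 4, it can help to check the actual code points. The following Python sketch (the function name `describe_spaces` is hypothetical, and this is an external check, not an XMetaL feature) lists every whitespace character in a string along with its Unicode name:

```python
import unicodedata

def describe_spaces(text: str) -> list[str]:
    """List each whitespace character with its position, code point and name."""
    report = []
    for index, ch in enumerate(text):
        if ch.isspace():
            name = unicodedata.name(ch, "UNNAMED")  # control chars have no name
            report.append(f"{index}: U+{ord(ch):04X} {name}")
    return report

# The first gap is a NO-BREAK SPACE, the second a regular space.
for line in describe_spaces("one\u00a0two three"):
    print(line)
```

Pasting text copied from Plain Text view into such a check shows which of the apparent spaces are really U+00A0 and will therefore be written as references on an ASCII save.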
*XMetaL here may be: XMetaL Author Essential, XMetaL Author Enterprise or XMAX.