Author Topic: Using xml:lang Values to Control Spell Checking  (Read 9732 times)
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2135



WWW
« on: December 11, 2009, 02:33:20 PM »

Product(s)
XMetaL Author Enterprise 6.0
(should also be supported in XMetaL Author Essential 6.0 when that product is released)

Oops
Instead of providing default settings in the xmetal60.ini file we decided to leave it up to clients to decide on these values (that's actually a good thing). However, we apparently also neglected to document this.

Beware: this is going to be a long post ...

Request: I've spent a few hours writing this up and I think it covers most things at this point, but feedback is very welcome. 2009/12/15: I've made some changes directly to the original post after getting feedback from Richard Ishida (it should be easier to read all in one place rather than jumping back and forth between comments).

Background / Legacy Code
The values that XMetaL Author recognizes for spell checking default to legacy values that the product uses internally. These values were invented before xml:lang existed (actually before XML existed, because the spell checker originally came from another product). In most cases they do not match any of the RFC values most people would wish to use with xml:lang. Some of the more common ones happen to match (like EN) but this is just by chance and quite a few others do not.

Standard xml:lang Language Codes
The W3C XML Recommendation defines basic rules for xml:lang (how it must be declared in your DTD or Schema). Also related to this are the standards ISO-639-1, ISO-639-2, RFC4646, RFC4647 and RFC5646 (the last one actually makes 4646 obsolete). Also related is BCP47, which is the reference preferred by the W3C. BCP47 is a concatenation of several RFCs and, though long, basically puts everything in one place.

Basically, ISO-639-1 consists of two letter language codes (that many people may recognize) and ISO-639-2 uses three letter codes. The RFCs describe how the full code should be constructed, and codes may include language, region, script, 'variants' and other things, including rules on letter casing and separator characters like "-". If you need to read one document please read BCP47.

We've tried to design our spell checking support for xml:lang to be as flexible as possible. This means you may opt to specify any "standard" value or you may use other values (perhaps from an industry or other standard you may wish to follow), and you may specify multiple values in the INI file for a particular spell checking language (keeping in mind that the value for the xml:lang attribute in the XML source itself can only have one value and will therefore either match one INI setting or none).

This means you should decide which values you will use based on all of your requirements, from external tools, XSLT transforms, specifications, etc, first. Then configure XMetaL Author's spell checker to understand the values you are working with. This is the approach I would recommend: let your requirements drive the values you use, but whenever possible stick to the most current W3C and associated standards.

Table of INI Variables Supported by the Spell Checker for xml:lang
Following is the complete list of currently supported spell checking languages. It includes the INI variable name (prefixed with "WT") that controls the values you wish to have recognized for the xml:lang attribute, the English name for the language, and the corresponding ISO-639-1 and ISO-639-2 value(s) that I think would most commonly be used for that language by most people working with xml:lang.

The values listed here for ISO-639-1 and ISO-639-2 are suggestions only, though they were taken directly from those specs. Be sure you consult with other people in your organization before deciding on exact values as other tools and processes may have specific requirements.

INI Variable Name          English Name for the Language   ISO-639-1   ISO-639-2
WT_AFRIKAANS               Afrikaans                       af          afr
WT_CATALAN                 Catalan                         ca          cat
WT_CZECH                   Czech                           cs          ces, cze
  Note: Both codes are considered synonyms.
WT_DANISH                  Danish                          da          dan
WT_DUTCH                   Dutch                           nl          dut, nld
  Note: Both codes are considered synonyms.
WT_ENGLISH                 English                         en          eng
WT_FRENCH                  French                          fr          fra, fre
  Note: Both codes are considered synonyms.
WT_GALICIAN                Galician                        gl          glg
WT_GERMAN                  German                          de          ger, deu
  Note: Both codes are considered synonyms.
WT_GREEK                   Greek                           el          gre, ell
  Note(1): Both codes are considered synonyms.
  Note(2): Ancient Greek (before the year 1454) is "grc" and is not supported by the spell checker.
WT_ISLANDIC                Islandic (Icelandic)            is          ice, isl
  Note: Both codes are considered synonyms.
WT_ITALIAN                 Italian                         it          ita
WT_NORWEGIAN               Norwegian                       no          nor
WT_PORTUGUESE              Portuguese                      pt          por
WT_RUSSIAN                 Russian                         ru          rus
WT_SLOVAK                  Slovak                          sk          slo, slk
  Note: Both codes are considered synonyms.
WT_SESOTHO                 Sesotho (Sotho, South Sotho)    st          sot
WT_SPANISH                 Spanish                         es          spa
WT_SWEDISH                 Swedish                         sv          swe
WT_SETSWANA                Setswana (Tswana)               tn          tsn
WT_TURKISH                 Turkish                         tr          tur
WT_XHOSA                   Xhosa                           xh          xho
WT_ZULU                    Zulu                            zu          zul
WT_ENGLISH_AUSTRALIAN      Australian English              en-au       eng-AU
WT_ENGLISH_CANADIAN        Canadian English                en-ca       eng-CA
WT_ENGLISH_BRITISH         British English                 en-gb       eng-GB
WT_ENGLISH_US              United States English           en-us       eng-US
WT_FRENCH_CANADIAN         Canadian French                 fr-ca       fra-CA, fre-CA
WT_GERMAN_SWISS            Swiss German                    de-ch       deu-CH, ger-CH
WT_PORTUGUESE_BRASIL       Brazilian Portuguese            pt-br       por-BR
WT_SPANISH_AMERICAN        American Spanish                es-us       spa-US
WT_NO_LINGUISTIC_CONTENT   Do Not Spell Check              (none)      zxx
  Note: Content is treated as a non-spellcheckable language.

INI Settings Examples
The values listed below for ISO-639-1 (two letter codes) and ISO-639-2 (three letter codes) are suggestions only, though they were taken directly from those specs. Be sure you consult with other people in your organization before deciding on exact values as other tools and processes may have specific requirements.

If the xml:lang code (the value portion of the INI variable) does not include the particular value you need just replace the existing one, or append your additional value to the end after adding a semicolon.

In the dialects section, two letter country codes are appended to the language code to make up "dialects" which are specific regional variances in languages, however (again) these values are here as examples only and it is up to you to decide what is correct for your organization's purposes.

#SPELL CHECKER LANGUAGES FOR xml:lang ATTRIBUTE VALUES
WT_AFRIKAANS=af;afr
WT_CATALAN=ca;cat
WT_CZECH=cs;ces;cze
WT_DANISH=da;dan
WT_DUTCH=nl;dut;nld
WT_ENGLISH=en;eng
WT_FRENCH=fr;fra;fre
WT_GALICIAN=gl;glg
WT_GERMAN=de;deu;ger
WT_GREEK=el;ell;gre
WT_ISLANDIC=is;ice;isl
WT_ITALIAN=it;ita
WT_NORWEGIAN=no;nor
WT_PORTUGUESE=pt;por
WT_RUSSIAN=ru;rus
WT_SLOVAK=sk;slk;slo
WT_SESOTHO=st;sot
WT_SPANISH=es;spa
WT_SWEDISH=sv;swe
WT_SETSWANA=tn;tsn
WT_TURKISH=tr;tur
WT_XHOSA=xh;xho
WT_ZULU=zu;zul
WT_NO_LINGUISTIC_CONTENT=zxx

#SPELL CHECKER DIALECTS FOR xml:lang ATTRIBUTE VALUES
WT_ENGLISH_AUSTRALIAN=en-AU;eng-AU
WT_ENGLISH_CANADIAN=en-CA;eng-CA
WT_ENGLISH_BRITISH=en-GB;eng-GB
WT_ENGLISH_US=en-US;eng-US
WT_FRENCH_CANADIAN=fr-CA;fra-CA;fre-CA
WT_GERMAN_SWISS=de-CH;deu-CH;ger-CH
WT_PORTUGUESE_BR=pt-BR;por-BR
WT_SPANISH_AMERICAN=es-US;spa-US


Note(1): If your xml:lang value's language code is not listed in the INI file then the fallback functionality of the spell checker is to use the default language as selected in the spell checker's Options dialog (set from within the main spell checker dialog, launched via F7).

Note(2): Letter casing (uppercase vs lowercase) is ignored with regard to xml:lang (ie: "EN-US", "en-us" and "en-US" are considered equivalent).

Note(3): zxx has been recommended to represent text that should not be interpreted as a standard human language. When it is used as set above XMetaL will skip over any element with xml:lang set to this value and not spell check it at all. This is useful for sections of programming code or perhaps other uses. As with all the other values here you may configure WT_NO_LINGUISTIC_CONTENT to whatever you like if "zxx" does not meet your needs (provided the value meets the xml:lang attribute value rules in the W3C XML Recommendation).

Note(4): Regardless of any settings in the INI file, when an xml:lang attribute value is set to an empty string, such as xml:lang="", that element will be skipped and not spell checked. This behavior is essentially equivalent to Note(3) above from the point of view of the spell checker (though it does have a distinct difference in meaning, which is actually "no language" as opposed to "non human language"). However, XMetaL Author purposely makes it difficult for users to set an attribute value to an empty string using the Attribute Inspector, so to do this you must either have implemented special code in your XMetaL Author customization to allow users to accomplish this, or you must set the value using PlainText view.

Note(5): The values you use in the INI file should be unique to each setting. Meaning that if you specify the same value in more than one INI variable unexpected behavior will occur. Please don't ask what the behavior might be, just avoid doing this.

Note(6): Do not specify the same INI variable multiple times. This should not be an issue as far as XMetaL is concerned, but you may not see the results you expect in this case. Again, please don't ask what the behavior might be, just avoid doing this.
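
Notes (2), (5) and (6) can be checked mechanically before an INI file is deployed. The following is a minimal sketch only and is not part of XMetaL: findIniConflicts is a hypothetical helper name, and it assumes the settings are available as plain text in the NAME=value;value form shown above. It compares values case-insensitively and reports any value used by more than one variable and any variable defined more than once.

```javascript
// Hypothetical helper (not an XMetaL API): scans WT_* settings for conflicts.
function findIniConflicts(iniText) {
  var has = Object.prototype.hasOwnProperty;
  var seenValues = {};     // lowercased xml:lang value -> first variable using it
  var seenVariables = {};  // variable name -> true
  var problems = [];
  var lines = iniText.split(/\r?\n/);
  for (var i = 0; i < lines.length; i++) {
    var line = lines[i].replace(/^\s+|\s+$/g, "");
    // skip blank lines, comments, and anything that is not a WT_ setting
    if (line === "" || line.charAt(0) === "#" || line.indexOf("WT_") !== 0) continue;
    var eq = line.indexOf("=");
    if (eq < 0) continue;
    var name = line.substring(0, eq).replace(/\s+$/, "");
    var values = line.substring(eq + 1).split(";");
    if (has.call(seenVariables, name)) {
      problems.push("variable defined more than once: " + name);
    }
    seenVariables[name] = true;
    for (var j = 0; j < values.length; j++) {
      // letter casing is ignored, so compare lowercased values
      var key = values[j].replace(/^\s+|\s+$/g, "").toLowerCase();
      if (key === "") continue;
      if (has.call(seenValues, key) && seenValues[key] !== name) {
        problems.push("value \"" + key + "\" used by " + seenValues[key] + " and " + name);
      } else {
        seenValues[key] = name;
      }
    }
  }
  return problems;
}
```

An empty result means no conflicts of these two kinds were found; it says nothing about whether the values themselves are sensible language tags.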

The Shipped xmetal60.ini File
The following setting is included with the xmetal60.ini file:
WT_ENGLISH_BRITISH=EN-UK;EN-GB
This can be safely removed if desired. It should be removed if you will be specifying your own WT_ENGLISH_BRITISH settings elsewhere in the INI file to be sure there are no conflicts. Note however, that the default internal (legacy) code of "EN-UK" will be recognized if this variable is not present and set to another value.

How the Auto-Switching Works
Whether you use the spell checking dialog (F7) or the 6.0 release's new "check spelling while typing" option (see Tools > Options), aka "red squiggles", XMetaL Author switches to the language specified in the xml:lang attribute when entering an element containing PCDATA (text).

If that element in turn has a child element with a different xml:lang value the spell checker changes to that child element's value. When no xml:lang value is set for an element it inherits the value of the parent element or nearest ancestor (standard xml:lang rules).

If such an element has no ancestors with an xml:lang value set then the default value for spell checking (as set in the spell checker's Options dialog) is used.

So, assuming you have all the settings above in your INI file and your default language is set to "English-US" in the spell checker's Options dialog, when entering a given element with one of the following xml:lang values the spell checker should do the following:
  • xml:lang is not set --> XMetaL begins walking up the document tree checking for parent elements with an xml:lang value set (and uses the nearest). If it fails to find any then the value as set in the spell checker Options dialog (in this case "English-US") is used.
  • xml:lang="en" --> All English spellings (US, UK, CA and AU) are considered correct (both "colour" and "color" are considered correct).
  • xml:lang="en-US" --> English-US is used (ie: "color" is correct, "colour" is incorrect)
  • xml:lang="en-CA" --> English-CA is used (ie: "colour" is correct, "color" is incorrect)
  • xml:lang="" --> no spell checking is performed (element is skipped)
  • xml:lang="zxx" --> no spell checking is performed (element is skipped)
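
The lookup described above can be sketched in a few lines. This is only an illustration of the documented behaviour, not XMetaL's actual implementation: resolveSpellLanguage is a hypothetical name, elements are modelled as plain objects with hypothetical xmlLang and parent properties, and the settings map is a hand-built subset of the INI values listed earlier.

```javascript
// Subset of the INI settings above, keyed by lowercased xml:lang value.
var settings = {
  "en": "WT_ENGLISH", "eng": "WT_ENGLISH",
  "en-us": "WT_ENGLISH_US", "en-ca": "WT_ENGLISH_CANADIAN",
  "zxx": "WT_NO_LINGUISTIC_CONTENT"
};

// Returns the WT_ variable to use, the default language, or null to skip.
function resolveSpellLanguage(element, defaultLanguage) {
  // walk up to the nearest element that has xml:lang set (standard inheritance)
  var node = element;
  while (node && (node.xmlLang === undefined || node.xmlLang === null)) {
    node = node.parent;
  }
  if (!node) return defaultLanguage;       // no xml:lang anywhere: Options default
  if (node.xmlLang === "") return null;    // xml:lang="": element is skipped
  var key = node.xmlLang.toLowerCase();    // letter casing is ignored
  if (!Object.prototype.hasOwnProperty.call(settings, key)) {
    return defaultLanguage;                // value not listed in the INI file
  }
  var variable = settings[key];
  return variable === "WT_NO_LINGUISTIC_CONTENT" ? null : variable;
}
```

Modelling "skip" as null keeps the two skip cases (empty string and zxx) distinct from the fallback to the default language, which is the distinction the notes above draw.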

External References
« Last Edit: March 06, 2014, 05:24:23 PM by Derek Read » Logged
dcramer
Member

Posts: 120


« Reply #1 on: December 11, 2009, 02:49:12 PM »

This sounds very cool. Is there a way (ideally without adding a phony xml:lang on the element) to specify a list of elements that should, by default, be skipped in spell checking? That way you could configure it not to spell check code listings and code-like things. This isn't as urgent since it now has the red squiggly style spell checking which is less obtrusive, but still would be nice.

Thanks,
David
« Last Edit: December 11, 2009, 03:06:17 PM by dcramer » Logged

David Cramer
Technical Writer
Motive, an Alcatel-Lucent Company
Derek Read
« Reply #2 on: December 11, 2009, 03:08:54 PM »

dcramer:

There are new APIs in 6.0 that should allow you to do this. When I have some time to properly describe that I will create a new forum post that covers this, unless we release XMetaL Developer 6.0 and associated docs before that.
rishida
Member

Posts: 1


« Reply #3 on: December 15, 2009, 09:39:12 AM »

First, this is a great step forward, and kudos to JustSystems for implementing it.  I have just a few points I'd like to raise:

1. RFC 5646 obsoletes RFC 4646 whether you choose to follow it or not ;-).  On the other hand, it *doesn't* obsolete RFC 4647 - that's still current.  Actually, at the W3C we prefer to refer to these specs using the label BCP 47 (http://www.rfc-editor.org/rfc/bcp/bcp47.txt).  That covers both the language tag syntax spec (RFC 5646) and the matching spec (RFC 4647), and always refers to the most up-to-date version of each.

2. It doesn't seem quite strongly enough stated for my taste that, if you aren't dealing with legacy situations, you should use language tags as defined in BCP47 as it says in the XML spec.  By implication, this means that you should use the IANA Language Subtag Registry to look up subtags, not use ISO code lists.  This is important, because the IANA registry provides only one subtag per language, whereas the ISO codes sometimes offer two or three possibilities.  The IANA registry also goes *way* beyond the list of codes offered by ISO 639-1/2, due to the inclusion of around 7,000 ISO 639-3 codes. Using the codes as defined in BCP47 and the IANA registry increases the interoperability of the data.

3.  I would suggest that you label the rightmost two columns in the table of languages above as BCP47 and Legacy, respectively.  (Note that the region codes are not actually part of ISO 639.)

4. I assume that WT_GERMAN accepts spellings for either Swiss German or (National) German (eg. it fails to recognise incorrect omission of eszett characters).  It may be worth adding a note to that effect to the WT_GERMAN line, since otherwise people may assume that de is sufficient for spell-checking normal German, when actually it isn't.

5. It may also be better to clarify the intended usage of zxx, which is *not* actually the same as xml:lang="", although the effect for spell checking is the same (ie. skip the text).  See http://www.w3.org/International/questions/qa-no-language  (for further clarification, see http://rishida.net/utils/subtags/index.php?lookup=zxx+und&submit=Look+up).

Hope that helps,
RI
Derek Read
« Reply #4 on: December 15, 2009, 02:44:02 PM »

@rishida

Thanks for the great feedback.

Because our software is very often only one piece of a larger installation (which often includes CMS systems, work flow management systems, translation memory and management systems, post processing systems, and systems that perform transformations to various file formats, and perhaps other things) the goal here was to show how to enable the XMetaL spell checker to take advantage of xml:lang values to support spell checking.

I'll leave education in usage of xml:lang up to the experts, and there is no shortage of information on this topic, including your posts here, the W3C XML Recommendation and the specs it links to, and many books on XML including my favourite "Charles Goldfarb's XML Handbook" (Charles Goldfarb and Paul Prescod 4th Edition: ISBN 0-13-065198-2; 5th Edition: 0-13-049765-7).

Regarding specific points...

1. The main point here (which you understood) was that, for any given client using XMetaL, various factors come into play that may make it difficult to stick with the latest specs (and ultimately, the XML source is often for internal use only with files to be consumed externally being transformations based on this internal format). I agree though -- whenever possible it makes sense to follow current specs.

2/3. I'll have a look at these suggestions and make some changes.

4. 'German National' is a special case. The first time German National is used you are prompted to select from one of three options (all XMetaL versions 4.x up to and including 6.0):
  • New spelling (Fluss)
  • Old spelling (Fluß)
  • Allow both

Swiss German allows the new spelling method only.

5. This is a good point as well. Approaching it strictly from an xml:lang usage point of view there really should be a difference (when used correctly), but (as you say) from the point of view of our spell checker there isn't any difference. The net result for the spell checker will be to skip over these elements.
« Last Edit: December 15, 2009, 07:57:47 PM by Derek Read » Logged
dcramer
« Reply #5 on: May 21, 2010, 12:19:49 PM »

Hi Derek,
Could you point me to the new APIs in 6.0 that would allow me to skip spell checking based on element attributes?

Thanks,
David
Derek Read
« Reply #6 on: May 25, 2010, 07:15:57 PM »

The APIs are not documented in the Programmer's Guide (6.0) yet, but here's an example you can try with the Journalist demo.
It is still possible these APIs may change a little bit in the future (part of the reason we haven't documented them yet).

Code:
<MACRO name="On_Document_Open_Complete" lang="JScript" hide="true"><![CDATA[

//*********************************************************
// Disable Spell Checking for Certain Nodes
//*********************************************************

//constructor for the object that is registered as the spell checker service
function spellService() {
}

//return 1 to spell check the node, 0 to skip it
spellService.prototype.shouldSpellCheck = function(node) {
  //spell check every node...
  var spellCheck = 1;

  //...unless it triggers one of the following tests

  //node is an element named <ProgramListing>
  if (node.nodeType == 1 && node.nodeName == "ProgramListing") {
    //do not spell check
    spellCheck = 0;
  }

  //node's parent is an element whose attribute "Style" has the value "Bullet"
  //(the extra checks avoid calling getAttribute on a missing or non-element parent)
  var parent = node.parentNode;
  if (parent && parent.nodeType == 1 && parent.getAttribute("Style") == "Bullet") {
    //do not spell check
    spellCheck = 0;
  }

  return spellCheck;
}

var spServ = new spellService();

ActiveDocument.SetSpellCheckerService(spServ);
]]></MACRO>

Note that because the journalist.mcr already has an "On_Document_Open_Complete" event macro you will want to incorporate this into that same section of the MCR file.

With that in place try the following XML file that uses the journalist.dtd:
Code:
<?xml version="1.0"?>
<!DOCTYPE Article PUBLIC "-//SoftQuad Software//DTD Journalist v2.0 20000501//EN" "../../../Program%20Files/XMetaL%206.0/Author/Rules/journalist.dtd">
<Article>
<Title>spellchecked</Title>
<Sect1>
<Title>spellchecked</Title>
<Para>spellchecked</Para>
<ItemizedList Style="Bullet">
<ListItem>
<Para>spellchecked</Para>notspellchecked <Para>spellchecked</Para>
</ListItem>
</ItemizedList>
<ItemizedList Style="Simple">
<ListItem>spellchecked</ListItem>
</ItemizedList>
<ProgramListing>notspellchecked</ProgramListing>
</Sect1>
</Article>

This example is somewhat contrived because the Journalist demo only has an Id attribute for most elements that allow PCDATA. So, in this example a node that is directly inside an element with the attribute "Style" set to "Bullet" is skipped (ie: <ListItem>), while child elements of that node are not skipped (they are spell checked). You will need to design your logic based on your own elements, attributes and their relationships of course. Hopefully you want to just skip entire elements, or have implemented something similar to xml:lang, as that should make the logic fairly straightforward. Ideally the amount of code in here should be kept to a minimum to make things run as fast as possible.
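
For the simpler case of skipping entire elements by name, the test might be reduced to a lookup plus a walk up the ancestors, so that inline elements nested inside a skipped element are skipped too. This is a sketch under assumptions: SKIPPED_ELEMENTS, isInsideSkippedElement and shouldSpellCheckByName are hypothetical names, "Code" is a made-up element name, and only the nodeType, nodeName and parentNode properties already used in the macro above are relied on.

```javascript
// Hypothetical list of element names whose content should never be spell checked.
var SKIPPED_ELEMENTS = { "ProgramListing": true, "Code": true };

// True when the node itself, or any ancestor element, is in the skip list.
function isInsideSkippedElement(node) {
  for (var current = node; current; current = current.parentNode) {
    if (current.nodeType == 1 &&
        Object.prototype.hasOwnProperty.call(SKIPPED_ELEMENTS, current.nodeName)) {
      return true;
    }
  }
  return false;
}

// Same return convention as the macro above: 1 = spell check, 0 = skip.
function shouldSpellCheckByName(node) {
  return isInsideSkippedElement(node) ? 0 : 1;
}
```

The ancestor walk costs a little on deeply nested content, so in keeping with the advice to keep this code minimal it may be worth dropping if your skipped elements never contain child elements.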
dcramer
« Reply #7 on: May 25, 2010, 08:47:32 PM »

Excellent. This is exactly what I've wanted. Yes, I'll want to skip entire elements based on element names and in some cases, attribute values, (e.g. elements explicitly flagged as localize="no" or turning spell checking on for elements explicitly flagged localize="yes" that would otherwise be skipped, such as <programlisting localize="yes">).

I'll give it a shot and let you know how it goes.

Thanks,
David
dcramer
« Reply #8 on: May 26, 2010, 06:35:47 AM »

It works beautifully. Moreover, the performance gained by freeing XMetaL from the need to draw lines under so many words more than offsets any performance lost by doing the test. When I first started using XMetaL 6.0 with some real documents, the performance was noticeably worse than before unless I turned off the interactive spell checking. Now the performance is back to normal even with interactive spell checking on.

This will be great too in that it rewards the writer for doing semantic markup and l10n prep. We use a localize attribute to indicate whether the contents of an element should be translated. We programmatically add the localize attribute to certain elements, but the writer can override the default behavior by manually adding localize="yes" or localize="no" to an element.
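
A rough sketch of how such a localize override might be expressed is shown here. It is illustrative only and is not the code actually used: SKIP_BY_DEFAULT and shouldSpellCheckLocalized are hypothetical names, and it assumes getAttribute returns something other than "yes" or "no" when the attribute is absent.

```javascript
// Hypothetical default: element names that are skipped unless localize="yes".
var SKIP_BY_DEFAULT = { "programlisting": true };

// Returns 1 to spell check the node, 0 to skip it.
function shouldSpellCheckLocalized(node) {
  if (node.nodeType != 1) return 1;        // only elements carry attributes
  var localize = node.getAttribute("localize");
  if (localize == "yes") return 1;         // explicit opt-in wins over the default
  if (localize == "no") return 0;          // explicit opt-out
  return Object.prototype.hasOwnProperty.call(SKIP_BY_DEFAULT, node.nodeName) ? 0 : 1;
}
```

Checking the attribute before the element name is what lets a writer turn spell checking back on for something like <programlisting localize="yes">.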

Thanks!
David
MrPaul
Member

Posts: 29


« Reply #9 on: February 03, 2014, 10:45:15 AM »

I have a few questions.

1. Is all of the information still accurate for XMAX v7?

2. We've noticed the following behavior when playing with the xml:lang attribute:

xml:lang="en" accepts "color" and "colour" as valid words.
xml:lang="en-US" accepts "color" and stops on "colour" (expected)
xml:lang="en-CA" does not seem to work. It doesn't even stop on "sfsdfsdfsf".
When the xml:lang attribute is omitted entirely, it accepts "colour" but stops on "color" (is this en-CA?)

xml:lang="fr-CA" does not work either. All words are ignored.
xml:lang="fr" seems to work for the french dictionary, but is it Canadian French?

We need to be able to specify en-CA and fr-CA languages for the spell checker. How can we accomplish this?
(Also, is there an .ini file when using XMAX? I could not find one.)

3. Is there an API function that we can call instead of setting the xml:lang attribute (since our current DTD does not allow it)?

Thanks.
Derek Read
« Reply #10 on: February 04, 2014, 04:50:49 PM »

The information I originally posted in this message is not accurate for XMAX.

I'm checking with our dev team to see what can be offered for your situation, but the behaviour you are seeing is expected primarily because this configuration is done with INI settings (for XMetaL Author) but XMAX does not have an INI file.

Some APIs have been added to recent releases of XMAX to allow you to configure it to support some features that are set using INI values in XMetaL Author, but these do not cover the spell checking INI settings.

Essentially, the reason you are seeing the behaviour in XMAX is due to the fact that internally Writing Tools was created to support made-up language codes that mostly do not match standard xml:lang codes (Writing Tools predates xml:lang by a decade or so). When XMetaL Author communicates with Writing Tools it uses these odd codes, but the solution for exposing the correct codes to the outside (via xml:lang) requires correct values to be set in the INI file.

The codes that Writing Tools supports internally do not include "en-ca". The made-up codes include "CE" (presumably for Canadian English) and "en-oz" (Australian English). These are adjusted in the default XMetaL Author INI file to "en-ca" and . "fr-ca" is also not there and instead Writing Tools recognizes "CF" (not "fr-cf" just "CF"). Again, this is adjusted in the INI file for XMetaL Author.

What they probably should have done was to hard code more standard xml:lang values into XMetaL Author itself as defaults (which would then be inherited by the XMAX code). The INI settings would still allow people to set their own values if desired but then at least the defaults would be normal xml:lang values.

At this point I'm not sure what we're going to do, but I suspect we should do some cleanup in addition to adding specific support for this to XMAX (if still required after this cleanup).

Workaround (?)

I can think of a fairly elaborate workaround that might work now, but I'm not sure if you want to go to these lengths (this is a pretty hacky workaround). It would be possible to add xml:lang to your document's DTD without modifying the actual DTD using the event On_DTD_Open_Complete. The method is addAttribute() and depending on how you want to do it you might even add the xml:lang as a fixed attribute with a set value, probably to the root element (assuming your docs contain one language). If the value was set to "CE" or "CF" I think that might work. Setting it to be a "fixed" attribute should mean that you do not need to change the markup. If it were set to "implied" then you'd need to change the actual XML markup (then probably undo that before saving so that the document remains valid for the rest of your XML software chain).

« Last Edit: February 04, 2014, 05:09:56 PM by Derek Read » Logged
Derek Read
« Reply #11 on: February 04, 2014, 04:58:36 PM »

For anyone that wants to use proper xml:lang values with XMAX today (in the case where the current value for a particular language is non-standard) you can launch the spell checker then add a language that uses the xml:lang value you want via Options > Language. Clicking the Add button lets you add a new language code, which you can then select and associate with a language file.

These codes are limited to two letter codes only.

Pretty sure this won't help mrpaul as the additional requirement there is that you don't want to modify the DTD, and the current DTD doesn't have xml:lang, so the scripting solution is probably easiest in that case.
Derek Read
« Reply #12 on: February 04, 2014, 06:24:55 PM »

I'm not yet getting what I expected in my results when adding xml:lang programmatically via On_DTD_Open_Complete so that workaround might not be possible. Once I figure out for sure I'll post something here.
Derek Read
« Reply #13 on: February 04, 2014, 06:43:14 PM »

OK, so here's the deal. The "internal" values for Canadian English and Canadian French aren't what I thought they were (my long post a couple of posts ago is still basically correct except for these two values). They are "en-ce" and "fr-cf".

So, if you want to use the scripting workaround mentioned previously and...

...the document you are loading into XMAX is entirely Canadian French then run this script in On_DTD_Open_Complete:

Code:
// XMetaL Script Language JSCRIPT:
var docType = ActiveDocument.doctype;
docType.addAttribute("yourDocumentRootElement","xml:lang","xml:lang",0,3,"fr-cf");

...the document you are loading into XMAX is entirely Canadian English then run this script in On_DTD_Open_Complete:
Code:
// XMetaL Script Language JSCRIPT:
var docType = ActiveDocument.doctype;
docType.addAttribute("yourDocumentRootElement","xml:lang","xml:lang",0,3,"en-ce");
MrPaul
« Reply #14 on: March 05, 2014, 08:08:26 AM »

Derek,

Some of our documents have mixed languages. Mainly, I need to be able to switch between "en-ce" and "fr-cf" depending on the current position of the cursor (we have a custom spell checking module that loops through the document using a range and calls XMAX spell checking functionality). In our XML, the different language sections have a lang=1 or lang=2 attribute that I can use to determine which spell checker dictionary to use.

My question is: in which XMAX call does the value of xml:lang get read to determine if the word is spelled correctly or not? Is it inside the call to ActiveDocument.IsSpellingCorrect()? Is it a bad idea to continuously change the value of the document's doctype on the fly?

I tried testing this but using the example you provided, I am getting an "attribute exists" exception from XMAX the 2nd time it tries to set the value (since the code that checks if the word is spelled correctly loops as the range changes its selection).

Also, the first time it's added, when I view the ActiveDocument.Document.xml value, I cannot see the set xml:lang attribute. Is that normal?

Can I call docType.addAttribute with an element that isn't the root (in the 1st param)? How should I go about this?

Thank you.

EDIT: Just to add to this, if I check docType.hasAttribute("Root", "xml:lang") before adding it, it will properly return false the 1st time, then true the 2nd time so as not to re-add it. However, when I get to a section in the XML that is in a different language, I'd like to update this value. What's weird is that docType.attributes is empty and there doesn't seem to be an "updateAttribute" or other similar function. And again, if I re-call docType.addAttribute, it crashes with the error: "attribute exists.". I tried playing with the intDeclType param in addAttribute (I noticed you set this to sqDTFIXED or 3 in your example) but to no avail.
The good news is that when the xml:lang attribute is set to the value "en-ce" or "fr-cf", the proper dictionaries are used! I just need to be able to update this dynamically.
« Last Edit: March 05, 2014, 10:03:28 AM by MrPaul » Logged
Derek Read
« Reply #15 on: March 06, 2014, 05:23:37 PM »

The DTD should be extended using addAttribute() in the event On_DTD_Open_Complete. It is best to not try to continuously modify the DTD. I can't see why that would be necessary, but perhaps I misunderstand.

You can specify any element in the addAttribute() API and that element will then allow the new attribute. I merely suggested the root element because (given what I know about your usage of XMAX) my assumption was that your documents would only use one language. You will need to call it for every element you wish to allow the attribute on.

If you need to switch the value for these elements or allow different values (for example: you want <p> to support xml:lang but the value for that attribute needs to be changeable) then you should not define the attribute as #FIXED (as in my previous example). Most likely you would want to create it as a CDATA #IMPLIED attribute. Here's an example:

//XMetaL Script Language JScript:
var docType = ActiveDocument.doctype;
var elements = ["myElem1","myElem2","myElem3"];
for (var i = 0; i < elements.length; i++) {
  //add xml:lang as an optional (CDATA #IMPLIED) attribute to each element in turn
  docType.addAttribute(elements[i], "xml:lang", "xml:lang", 0, 0);
}

You will then need to have additional code that walks through the document to figure out which element should be CDN English and which should be CDN French and then set xml:lang appropriately. I'm not sure how you are currently keeping track of the language but it sounds like you have other attributes for that. This could be done in On_Document_Open_Complete or anytime after the document is open. It could be in an MCR file (if you want to use XMAX events) or you might include it in your hosting application's own events.
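
As a rough idea of what that walk might look like, here is an untested sketch. The names LANG_TO_XML_LANG and applyXmlLang are hypothetical, the mapping of lang="1" to English and lang="2" to French is a guess (swap them if your documents use the opposite convention), and it assumes the nodes expose the usual DOM members nodeType, getAttribute, setAttribute, firstChild and nextSibling.

```javascript
// Assumed mapping from the document's own lang attribute to the internal
// spell checker codes; which number means which language is a guess.
var LANG_TO_XML_LANG = { "1": "en-ce", "2": "fr-cf" };

// Visits an element and its descendants, setting xml:lang wherever the
// existing lang attribute has a recognised value. Returns the number set.
function applyXmlLang(element) {
  if (element.nodeType != 1) return 0;     // ignore text and other node types
  var count = 0;
  var lang = element.getAttribute("lang");
  if (lang !== null && Object.prototype.hasOwnProperty.call(LANG_TO_XML_LANG, lang)) {
    element.setAttribute("xml:lang", LANG_TO_XML_LANG[lang]);
    count++;
  }
  for (var child = element.firstChild; child; child = child.nextSibling) {
    count += applyXmlLang(child);
  }
  return count;
}
```

The function would be called once on the root element after the document has opened, and only for elements that were given the xml:lang declaration via addAttribute().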

Note: You should not repeatedly call addAttribute on the same element for a given DocType. You should only run this API to add the attribute definition to any element for a given DocType (DTD) once. I think this is normally pretty easy to manage (as in my example above).
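
If the calling code cannot easily guarantee that it runs only once, a guard along the following lines might help. It is a sketch only: ensureXmlLangDeclared is a hypothetical name, and it relies on hasAttribute(elementName, attributeName) behaving as reported earlier in this thread and on the same addAttribute argument order as the example above.

```javascript
// Adds an optional xml:lang declaration to each listed element that does not
// already have one, and returns the names of the elements that were changed.
function ensureXmlLangDeclared(docType, elementNames) {
  var added = [];
  for (var i = 0; i < elementNames.length; i++) {
    // skip elements that already declare xml:lang to avoid "attribute exists"
    if (!docType.hasAttribute(elementNames[i], "xml:lang")) {
      docType.addAttribute(elementNames[i], "xml:lang", "xml:lang", 0, 0);
      added.push(elementNames[i]);
    }
  }
  return added;
}
```

This only protects the declaration in the DTD; changing the language of a given element afterwards is still done by setting the xml:lang attribute value on that element in the document.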

Please keep in mind that although we're using standard and well-tested APIs here we are using them to work around limitations in the spell checking engine, so their usage for this particular context is not tested.