Reply to: Using xml:lang Values to Control Spell Checking

First, this is a great step forward, and kudos to JustSystems for implementing it. I have just a few points I'd like to raise:

1. RFC 5646 obsoletes RFC 4646, whether you choose to follow it or not ;-). On the other hand, it *doesn't* obsolete RFC 4647, which is still current. Actually, at the W3C we prefer to refer to these specs using the label BCP 47 (http://www.rfc-editor.org/rfc/bcp/bcp47.txt). That label covers both the language tag syntax spec (RFC 5646) and the matching spec (RFC 4647), and always refers to the most up-to-date version of each.

2. For my taste, it isn't stated strongly enough that, unless you are dealing with legacy situations, you should use language tags as defined in BCP 47, as the XML spec says. By implication, this means you should look up subtags in the IANA Language Subtag Registry, not in the ISO code lists. This is important because the IANA registry provides only one subtag per language, whereas the ISO codes sometimes offer two or three possibilities (German, for example, is de in ISO 639-1 but ger or deu in ISO 639-2). The IANA registry also goes *way* beyond the codes offered by ISO 639-1/2, because it includes around 7,000 ISO 639-3 codes. Using the codes as defined in BCP 47 and the IANA registry increases the interoperability of the data.

3. I would suggest that you label the two rightmost columns in the table of languages above as BCP 47 and Legacy, respectively. (Note that the region codes are not actually part of ISO 639.)

4. I assume that WT_German accepts spellings for either Swiss German or (national) German (e.g. it fails to recognise the incorrect omission of eszett (ß) characters). It may be worth adding a note to that effect to the WT_German line, since otherwise people may assume that de is sufficient for spell-checking German as written in Germany, when actually it isn't.

5. It may also be worth clarifying the intended usage of zxx, which is *not* actually the same as xml:lang="", although the effect for spell checking is the same (i.e. skip the text). See http://www.w3.org/International/questions/qa-no-language (for further clarification, see http://rishida.net/utils/subtags/index.php?lookup=zxx+und&submit=Look+up).
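On point 1, here's a rough sketch of what RFC 4647 matching could look like when choosing a dictionary for an xml:lang value. It is a simplified version of the Lookup scheme (RFC 4647, section 3.4): the tag is truncated one subtag at a time, also dropping any singleton left dangling at the end, until it matches an available tag. The function name and the list of dictionary tags are my own illustration, not anything XMetaL exposes, and it ignores wildcards and priority lists.

```python
from typing import Dict, List, Optional


def lookup(language_range: str, available: List[str],
           default: Optional[str] = None) -> Optional[str]:
    """Simplified RFC 4647 Lookup for a single language range (no wildcards)."""
    # Matching is case-insensitive, but return the tag as it was listed.
    by_lower: Dict[str, str] = {tag.lower(): tag for tag in available}
    subtags = language_range.lower().split("-")
    while subtags:
        candidate = "-".join(subtags)
        if candidate in by_lower:
            return by_lower[candidate]
        # Drop the last subtag, plus any singleton (e.g. "x") it leaves behind.
        subtags.pop()
        if subtags and len(subtags[-1]) == 1:
            subtags.pop()
    return default


if __name__ == "__main__":
    dictionaries = ["de", "en-US", "en-GB", "fr"]  # hypothetical dictionary tags
    print(lookup("de-CH-1996", dictionaries))   # de
    print(lookup("en-US-x-foo", dictionaries))  # en-US
    print(lookup("en-AU", dictionaries))        # None
```

Note that en-AU does not fall back to en-US: Lookup only ever truncates, so you'd need an explicit default (or a plain en entry) for that. It also shows the issue in point 4, since de-CH and de-DE text would both end up with a single de dictionary.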
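On point 2, if you do have to accept legacy values, one option is to normalise them to BCP 47 on the way in. This is only a sketch: the mapping table below is a tiny hand-picked excerpt I made up for illustration, and real code should take its data from the IANA Language Subtag Registry instead. The function name is hypothetical, and the casing follows the conventions recommended in RFC 5646 (lowercase language, titlecase script, uppercase region).

```python
from typing import Dict, List

# Illustrative excerpt only. Real code should read the IANA Language Subtag
# Registry rather than rely on a hand-maintained table like this one.
ISO639_2_TO_SUBTAG: Dict[str, str] = {
    "eng": "en",
    "ger": "de", "deu": "de",  # ISO 639-2/B and 639-2/T codes for German
    "fre": "fr", "fra": "fr",  # ... for French
    "dut": "nl", "nld": "nl",  # ... for Dutch
}


def normalize_legacy_tag(tag: str) -> str:
    """Rewrite a legacy code such as 'ger_ch' as a BCP 47 tag such as 'de-CH'."""
    parts = tag.replace("_", "-").split("-")
    primary = parts[0].lower()
    out: List[str] = [ISO639_2_TO_SUBTAG.get(primary, primary)]
    after_singleton = False
    for sub in parts[1:]:
        if after_singleton or len(sub) == 1:
            # Extension and private-use subtags (e.g. "x-foo") stay lowercase.
            after_singleton = True
            out.append(sub.lower())
        elif len(sub) == 2 and sub.isalpha():
            out.append(sub.upper())   # region, e.g. CH
        elif len(sub) == 4 and sub.isalpha():
            out.append(sub.title())   # script, e.g. Hant
        else:
            out.append(sub.lower())   # variants such as 1996, numeric regions
    return "-".join(out)


if __name__ == "__main__":
    print(normalize_legacy_tag("ger_ch"))      # de-CH
    print(normalize_legacy_tag("deu"))         # de
    print(normalize_legacy_tag("zh-hant-tw"))  # zh-Hant-TW
```

Case doesn't change the meaning of a tag, so the casing step is purely cosmetic; the important part is collapsing ger/deu into the single registry subtag de.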
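On point 5, a spell checker can skip both cases while still keeping them apart, which matters if the same data is later used for something other than spell checking. Here's a minimal sketch; the enum and function names are hypothetical, and I'm assuming the reading that an empty xml:lang means no language information is available, whereas zxx means the content isn't in any language at all.

```python
from enum import Enum


class SpellCheck(Enum):
    """Hypothetical outcomes; these names are illustrative, not XMetaL's."""
    SKIP_NO_LANGUAGE_INFO = "xml:lang is empty: no language information"
    SKIP_NON_LINGUISTIC = "zxx: the content is not in any language"
    CHECK = "look up a dictionary for this tag"


def spell_check_action(xml_lang: str) -> SpellCheck:
    if xml_lang == "":
        return SpellCheck.SKIP_NO_LANGUAGE_INFO
    # Compare only the primary subtag, case-insensitively.
    if xml_lang.split("-")[0].lower() == "zxx":
        return SpellCheck.SKIP_NON_LINGUISTIC
    return SpellCheck.CHECK


if __name__ == "__main__":
    for value in ["", "zxx", "de-CH"]:
        print(repr(value), spell_check_action(value).name)
```

Both skip outcomes have the same effect on spell checking, but keeping them distinct means the reason is not lost.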

Hope that helps,