In this case a “word” is defined as one or more non-white-space characters, preceded or followed by white space, inside an element that allows text.

The basic logic is to load the entire document as a string, then:
1. Remove all tags, comments and processing instructions from the document (anything starting with < and ending with >).
2. Replace every contiguous run of non-white-space characters, whatever its length, with a single character (the code uses the letter “a”, but the choice of character does not matter).
3. Remove all white-space characters from the document.
4. Count the remaining characters (each one an “a”). That count equals the number of words in the original document.
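The four steps can be sketched roughly as follows. This is a minimal illustration, not the code from the demo. It assumes the whole document is already in a string, it is written in ES3-style JavaScript (var, plain functions) so it stays close to what JScript accepts, and the function name countWords and the regular expressions are my own rendering of the steps.

```javascript
// Hypothetical sketch of the four-step word count; not the original code.
function countWords(xml) {
  var s = xml;

  // Step 1: delete tags, comments and processing instructions, i.e. anything
  // from "<" to the next ">". [^>]* also matches line breaks, so tags that
  // span several lines are removed too. A comment that itself contains ">"
  // would be cut short, as the simple "< to >" rule implies.
  s = s.replace(/<[^>]*>/g, "");

  // Step 2: collapse every run of non-white-space characters to one "a".
  s = s.replace(/\S+/g, "a");

  // Step 3: remove all white space.
  s = s.replace(/\s+/g, "");

  // Step 4: each remaining "a" stands for one word.
  return s.length;
}

// Usage: the XML declaration, the comment and the tags are all stripped,
// leaving "The quick brown fox." (4 words).
var sample = '<?xml version="1.0"?>\n<!-- draft -->\n<para>The quick <b>brown</b> fox.</para>';
var n = countWords(sample); // 4
```

Steps 2 to 4 together amount to counting the matches of /\S+/g after step 1; the sketch keeps them separate only to mirror the description. One consequence of step 1 as described is that tags are deleted rather than replaced with white space, so adjacent elements with nothing between them, such as <p>one</p><p>two</p>, are counted as a single word.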

This is done with regular expressions in JScript, because JScript string manipulation is far faster than loading the XML into an XML processor and searching for text nodes. Note that this logic makes no special attempt to handle numbers, parentheses or other punctuation; a stand-alone “(1)”, for example, counts as one word. Once you go beyond this basic logic, I think you need to start building in language-specific smarts, perhaps business logic as well, plus personal preferences, and there are probably as many ways to do that as there are writers.

This is essentially the same logic as in the following demo, except that it has now been integrated into the new 7.0 “Cross-Files” feature: http://forums.xmetal.com/index.php/topic,28.0.html