General XMetaL Discussion

  • Bradley Shoebottom

    XMetal Author Enterprise 7.0 (Word Count Algorithm)

    Can you tell me which words the Word Count feature counts? First, which elements are included, and then what does Word Count consider to be a word and not a word? I have been using a number of word-counting tools, and I want to be able to explain what was or was not counted.


    Derek Read

    In this case a “word” is any sequence of one or more non-white-space characters, delimited by white space, that is contained inside an element allowing text.

    The basic logic is to load the entire document as a string, then:
    1. Remove all tags, comments and processing instructions from the document (anything starting with < and ending with >).
    2. Replace every contiguous sequence of non-white-space characters (no matter its length) with a single character (the code uses the letter “a”, but the choice is of no real importance).
    3. Remove all white-space characters from the document.
    4. Count the remaining characters (each one an “a”); that total is the number of words in the original document.
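    A minimal Python sketch of the four steps above, assuming the regular expression `<[^>]*>` for “anything starting with < and ending with >”. The post does not give the actual JScript patterns, so the names and regexes here are illustrative, not XMetaL's code:

    ```python
    import re

    # Step 1 pattern (assumed): "anything starting with < and ending with >".
    # Assumption: no ">" occurs inside a comment, PI or attribute value;
    # a comment such as <!-- a > b --> would leave " b -->" behind.
    MARKUP = re.compile(r"<[^>]*>")
    # Step 2 pattern: a contiguous run of non-white-space characters.
    NON_SPACE_RUN = re.compile(r"\S+")
    # Step 3 pattern: any white-space characters.
    WHITE_SPACE = re.compile(r"\s+")


    def count_words(document: str) -> int:
        """Count words by following the four string steps literally."""
        text = MARKUP.sub("", document)        # 1. remove tags, comments and PIs
        text = NON_SPACE_RUN.sub("a", text)    # 2. each run becomes a single "a"
        text = WHITE_SPACE.sub("", text)       # 3. remove all white space
        return len(text)                       # 4. one "a" per original word


    doc = (
        '<?xml version="1.0"?>\n'
        "<!-- draft -->\n"
        "<topic><title>Word Count</title>\n"
        "<p>It counts (roughly) 3.5 things.</p></topic>"
    )
    print(count_words(doc))  # 7
    ```

    Under this literal reading, tags are deleted rather than replaced with a space, so `<p>one</p><p>two</p>` collapses to a single word while the same markup split across two lines counts as two; the post does not say whether XMetaL's own code behaves that way.
    
    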

    This is done using regular expressions in JScript, because JScript string manipulation is far faster than loading the XML into an XML processor and parsing it for text nodes. As the steps show, the logic does not attempt to deal specifically with numbers, parentheses or other punctuation. I think that once you get beyond this basic logic, you need to start building in language-specific smarts, perhaps business logic as well, plus personal preferences, and there are probably as many different ways to do that as there are writers.
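    To see where this basic rule and other word counters may diverge, a minimal Python sketch can list the tokens the rule counts and, for contrast, apply one hypothetical stricter rule that skips punctuation-only tokens. The `tokens` and `count_words_strict` names, the `<[^>]*>` tag pattern and the letter-or-digit test are illustrative assumptions, not part of XMetaL:

    ```python
    import re

    # Assumed tag pattern: everything from "<" to the next ">".
    MARKUP = re.compile(r"<[^>]*>")


    def tokens(document: str) -> list[str]:
        """Tokens as the basic rule sees them: white-space-delimited runs
        left after everything between < and > is removed."""
        return re.findall(r"\S+", MARKUP.sub("", document))


    def count_words_strict(document: str) -> int:
        """Hypothetical stricter rule (not XMetaL's): count a token only if it
        contains at least one letter or digit, so "-" or "..." alone is skipped."""
        return sum(1 for t in tokens(document) if any(ch.isalnum() for ch in t))


    doc = "<p>Step 1 - open the file ... then save (Ctrl+S).</p>"
    print(tokens(doc))
    print(len(tokens(doc)), count_words_strict(doc))  # 10 8
    ```

    Here the basic rule counts 10 tokens, including the lone “-” and “...”, while the stricter rule counts 8. Numbers such as “1” and tokens with attached punctuation such as “(Ctrl+S).” still count as one word each under both rules.
    
    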

    This is basically the same logic as used in the following demo (except that it has now been integrated into the new 7.0 “Cross-Files” feature):,28.0.html

