General XMetaL Discussion

  • bradley

    Search results not complete for CCJK

    Participants 4
    Replies 5
    Last Activity 9 years, 8 months ago

    I found this issue while testing Japanese WebHelp output: when I copy a word from a page and search for it, I get no results, although the same search works for Western languages. When I tested the Chinese version, the same issue appeared. I am not sure why some words fail in the search function. Does anyone know whether this is a bug, or have a technical solution? If you need more material, I can provide the DITA files and the WebHelp output.

    Version: XMetaL 7.0

    Derek Read

    The code that generates the search feature for WebHelp cannot deal with any content that does not separate words using spaces. So languages such as Chinese and Japanese (if spaces are not used, which is most common) will not be searchable.

    When WebHelp is generated, the content of each DITA topic is split into words using spaces as the delimiter, and a JavaScript array is built from these words. Without spaces that cannot be done, so the array will be empty (or mostly empty).
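    To illustrate the effect, here is a minimal sketch in Python. It is not the actual WebHelp generator code (which is not shown in this thread); `build_search_index` and the sample sentences are hypothetical, and the real generator may produce an empty array rather than one long entry. The point is only that splitting on whitespace yields no usable words when the text contains no spaces.

```python
def build_search_index(topic_text: str) -> list[str]:
    """Split topic text on whitespace, as a space-delimited indexer would.

    Hypothetical stand-in for the indexing step described above.
    """
    return topic_text.split()


# Sample sentences (assumed for illustration only).
english = "Click the Save button to store your changes"
japanese = "保存ボタンをクリックして変更を保存します"

english_terms = build_search_index(english)
japanese_terms = build_search_index(japanese)

# English yields one entry per word; the Japanese sentence stays in one piece,
# so a search for an individual word such as "保存" finds no matching entry.
print(english_terms)
print(japanese_terms)
print("保存" in japanese_terms)
```

    With English input each word becomes its own entry, while the Japanese sentence is never broken up, so a lookup for a single word inside it fails.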

    There are no plans to attempt to improve this, as we are not in the business of creating search engines (we primarily concentrate on the XML authoring portion of our software). So I think a third-party solution would need to be found. You may wish to look into a fairly inexpensive product called ZoomSearch (http://www.wrensoft.com/zoom/). One of our clients has implemented a solution based on it, as they found the search feature in our WebHelp to be too simplistic: http://forums.xmetal.com/index.php/topic,1080.msg3420.html

    I believe all of their content is currently in English, so I don't know if ZoomSearch can support Chinese / Japanese. If not, then you will need to look elsewhere. I suspect that any software that can differentiate between words in Chinese / Japanese would need to go a step further and implement a look-up table / dictionary of words, and possibly even some understanding of grammar, in order to figure out where one word starts and another ends. This makes me suspect that ZoomSearch may not cut it and that you would need to look at some more advanced software.
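    As a rough idea of what a dictionary-based approach involves, the sketch below uses forward maximum matching, one simple and well-known technique for this. It is not how ZoomSearch or XMetaL work; `segment_max_match` and the toy dictionary are assumptions made up for the example, and a real segmenter would need a far larger dictionary and better handling of ambiguous cases.

```python
def segment_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching against a word dictionary.

    At each position, take the longest dictionary word that matches;
    fall back to a single character when nothing matches.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until there is a hit.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Toy dictionary (assumed); real dictionaries hold many thousands of entries.
toy_dictionary = {"我们", "喜欢", "搜索", "搜索引擎", "引擎"}
segments = segment_max_match("我们喜欢搜索引擎", toy_dictionary)
print(segments)
```

    The greedy choice prefers 搜索引擎 over the shorter 搜索, which shows why the quality of the dictionary (and, ideally, some grammatical knowledge) matters so much for the result.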

    I'm sure there are lots of other products out there that specialize in search and are constantly improving what they do so ZoomSearch is not your only option. Depending on what your exact needs are (where you are deploying your WebHelp and how it is used) you might also look into using Google or some other search implementation depending on whether the WebHelp will be deployed for offline use or on a website.

    Derek Read

    Information on ZoomSearch's limitations for Chinese / Japanese is listed here:

    http://www.wrensoft.com/zoom/support/languages.html#asian

    Basically, it uses the same logic as we do for detecting words, although it does a few additional things that we don't (these are still simplistic checks).

    Derek Read

    In order for our current implementation of WebHelp to support Chinese we would likely need to implement an extra step to break Chinese sentences into words, something along these lines: http://nlp.stanford.edu/software/segmenter.shtml

    Or one of various other solutions one might find searching http://www.google.com/search?q=中文分词

    Since the code that generates our JavaScript array is written in Java, the first option has some chance of being implemented (as it is also written in Java), but I don't see that happening anytime soon given current priorities.
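    To show where such an extra step would sit, here is a hedged sketch in Python (not Java, and not the real generator). The `segment` parameter is the hook where a proper word segmenter would be plugged in. The default used here, overlapping character bigrams, is a deliberately simple stand-in and is not the Stanford segmenter; `build_index_terms`, `bigrams`, and the character ranges in `CJK_RUN` are all assumptions made for this example.

```python
import re
from typing import Callable

# Rough, assumed ranges: hiragana, katakana, and CJK unified ideographs.
CJK_RUN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff]+")


def bigrams(run: str) -> list[str]:
    """Stand-in segmenter: overlapping two-character tokens."""
    if len(run) < 2:
        return [run]
    return [run[i:i + 2] for i in range(len(run) - 1)]


def build_index_terms(
    text: str, segment: Callable[[str], list[str]] = bigrams
) -> list[str]:
    """Split on whitespace first, then pass any CJK run to `segment`."""
    terms = []
    for token in text.split():
        pos = 0
        for match in CJK_RUN.finditer(token):
            if match.start() > pos:
                # Keep any non-CJK text that precedes the run.
                terms.append(token[pos:match.start()])
            terms.extend(segment(match.group()))
            pos = match.end()
        if pos < len(token):
            # Keep any non-CJK text that follows the last run.
            terms.append(token[pos:])
    return terms


terms = build_index_terms("XMetaL 中文分词 test")
print(terms)
```

    Space-separated words pass through unchanged, so existing behavior for Western languages would be preserved, and only the runs without spaces go through the additional step.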

    That leaves out Japanese, which would require an alternative solution.

    I think the most robust solution would be to implement a full-blown search solution. Google certainly seems to be able to handle any Chinese and Japanese you throw at it, so that is one very strong option.

    bradley

    Derek Read,

    Many thanks for your kind answer. I am interested in http://nlp.stanford.edu/software/segmenter.shtml

    I am not a DITA writer or a JavaScript developer; I am just a simple user of XMetaL who localizes clients' English PDF and WebHelp output into CCJK from their English DITA files. I am not sure how to use the Stanford Word Segmenter together with XMetaL. Can you give some instructions?

    Derek Read

    This is not something you could do. It would take quite a bit of investigation by our development team to figure out if it could be used with the product, then further work to implement a solution that uses it (or something similar). I was merely pointing out that people are working on or have created some pieces to this puzzle that might help.

    If you need an immediate solution I think you should seriously consider something existing, such as integrating Google search or similar into your website.
