General XMetaL Discussion
bradley January 9, 2013 at 9:21 am
search result not complete for CCJK
I found this issue while testing a Japanese WebHelp output: I copied some words from the pages and searched for them, but got no results. The same search works for Western languages. I then tested the Chinese version and the same issue appeared. I am not sure why some words fail in the search function. Does anyone know whether this is a bug, or have a technical solution? If you need more material I can provide the DITA files and the WebHelp output.
Version: XMetaL 7.0
Derek Read January 9, 2013 at 10:01 pm
The code that generates the search feature for WebHelp cannot deal with any content that does not separate words using spaces. So languages such as Chinese and Japanese (if spaces are not used, which is most common) will not be searchable.
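To illustrate the limitation, here is a minimal sketch of a whitespace-based indexer. It is not the actual WebHelp code; the function names `build_index` and `search` and the sample pages are assumptions made up for this example. It only shows why an unspaced Japanese sentence ends up as a single token that a shorter query cannot match.

```python
# Illustrative only: a whitespace-based indexer, NOT the real WebHelp code.
# build_index, search and the sample pages are hypothetical.

def build_index(pages):
    """Map each whitespace-separated token to the set of page ids containing it."""
    index = {}
    for page_id, text in pages.items():
        for token in text.lower().split():
            index.setdefault(token, set()).add(page_id)
    return index


def search(index, query):
    """Return the pages that contain the query as an exact token."""
    return index.get(query.lower(), set())


pages = {
    "en": "Open the file menu",
    "ja": "ファイルメニューを開きます",  # no spaces between words
}
index = build_index(pages)

print(search(index, "menu"))      # the English page is found
print(search(index, "メニュー"))  # empty: the whole sentence was indexed as one token
```

Because the Japanese sentence contains no spaces, the only indexed token for that page is the full sentence, so a query for a word inside it finds nothing.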
There are no plans to improve this, as we are not in the business of creating search engines (we concentrate primarily on the XML authoring portion of our software). So I think a 3rd-party solution would need to be found. You may wish to look into a fairly inexpensive product called ZoomSearch (http://www.wrensoft.com/zoom/). One of our clients has implemented a solution based on it because they found the search feature in our WebHelp too simplistic: http://forums.xmetal.com/index.php/topic,1080.msg3420.html
I believe all of their content is currently in English, so I don't know whether ZoomSearch can support Chinese / Japanese. If not, you will need to look elsewhere. I suspect that any software that can differentiate between words in Chinese / Japanese needs to go further and implement a look-up table / dictionary of words, and possibly even some understanding of grammar, in order to figure out where one word starts and another ends. This makes me suspect that ZoomSearch may not cut it and that you would need to look at more advanced software.
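As a rough illustration of the look-up table / dictionary idea, the sketch below uses greedy forward maximum matching, one simple dictionary-based segmentation technique. It is not what XMetaL or ZoomSearch does; the function name `max_match`, the `max_len` parameter and the three-word dictionary are assumptions for the example, and a real dictionary would need many thousands of entries.

```python
# Hypothetical sketch of dictionary-based word segmentation
# (greedy forward maximum matching). Not taken from any product.

def max_match(text, dictionary, max_len=4):
    """Split text by taking, at each position, the longest dictionary word.

    Falls back to a single character when no dictionary word matches.
    """
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


# Tiny made-up dictionary: "open", "file", "menu".
dictionary = {"打开", "文件", "菜单"}

print(max_match("打开文件菜单", dictionary))
print(max_match("打开新文件", dictionary))
```

The first call splits the sentence into the three dictionary words. The second shows the fallback: the character 新 is not in the dictionary, so it is emitted on its own. Greedy matching can pick the wrong boundary when words overlap, which is why more advanced segmenters also use statistics or grammar.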
I'm sure there are lots of other products out there that specialize in search and are constantly improving what they do, so ZoomSearch is not your only option. Depending on your exact needs (whether the WebHelp will be deployed for offline use or on a website, and how it is used), you might also look into using Google or some other search implementation.
Derek Read January 9, 2013 at 10:05 pm
Information on the limitations of ZoomSearch for Chinese / Japanese is listed here:
Basically they have the same logic as we do for detecting words, although it does do a few additional things we don't do (but these are still simplistic checks).
Derek Read January 10, 2013 at 12:09 am
In order for our current implementation of WebHelp to support Chinese we would likely need to implement an extra step to break Chinese sentences into words, something along these lines: http://nlp.stanford.edu/software/segmenter.shtml
Or one of various other solutions one might find searching http://www.google.com/search?q=中文分词
That leaves out Japanese which would require an alternative solution.
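For comparison, one dictionary-free approach that some search engines use for both Chinese and Japanese is to index overlapping character pairs (bigrams) instead of words. The sketch below is only an illustration of that idea under stated assumptions: the names `bigrams`, `build_bigram_index` and `search_bigrams` and the sample pages are hypothetical, and none of this is part of XMetaL WebHelp.

```python
# Hypothetical sketch of a character-bigram index for unspaced text.
# Not part of any product; names and sample data are made up.

def bigrams(text):
    """Return overlapping two-character sequences, ignoring whitespace."""
    chars = [c for c in text if not c.isspace()]
    if len(chars) < 2:
        return ["".join(chars)] if chars else []
    return [chars[i] + chars[i + 1] for i in range(len(chars) - 1)]


def build_bigram_index(pages):
    """Map each bigram to the set of page ids that contain it."""
    index = {}
    for page_id, text in pages.items():
        for gram in bigrams(text):
            index.setdefault(gram, set()).add(page_id)
    return index


def search_bigrams(index, query):
    """Return pages that contain every bigram of the query."""
    grams = bigrams(query)
    if not grams:
        return set()
    results = None
    for gram in grams:
        hits = index.get(gram, set())
        results = set(hits) if results is None else results & hits
    return results


pages = {
    "ja": "ファイルメニューを開きます",
    "zh": "打开文件菜单",
}
index = build_bigram_index(pages)

print(search_bigrams(index, "メニュー"))  # matches the Japanese page
print(search_bigrams(index, "文件"))      # matches the Chinese page
print(search_bigrams(index, "保存"))      # no page contains this bigram
```

This trades precision for simplicity: a page matches whenever it contains all of the query's bigrams, even if they are not adjacent, and single-character queries generally find nothing because only pairs are indexed.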
I think the most robust solution would be to implement a full-blown search solution. Google certainly seems to be able to handle any Chinese and Japanese you throw at it, so that is one very strong option.
bradley January 11, 2013 at 2:13 am
Many thanks for your kind answer. I am interested in http://nlp.stanford.edu/software/segmenter.shtml
January 11, 2013 at 2:21 am
This is not something you could do. It would take quite a bit of investigation by our development team to figure out if it could be used with the product, then further work to implement a solution that uses it (or something similar). I was merely pointing out that people are working on or have created some pieces to this puzzle that might help.
If you need an immediate solution I think you should seriously consider something existing, such as integrating Google search or similar into your website.