Topic: DITA: Troubleshooting Japanese and Simplified Chinese PDF output
Su-Laine Yeo
Solutions Consultant
Member

Posts: 260


« on: August 10, 2010, 01:25:42 PM »

Products: XMetaL Author Enterprise 4.6 and later, or DITA Open Toolkit 1.4.1 with RenderX XEP.

By default, the DITA Open Toolkit (DITA OT) includes configuration files for producing HTML output in 47 locales and PDF output in 7 locales. As of DITA OT 1.5, the 7 preconfigured locales for PDF output are:

- English (en-us)
- French (fr-fr)
- German (de-de)
- Italian (it-it)
- Japanese (ja-jp)
- Simplified Chinese (zh-cn)
- Spanish (es-es)

You can configure the DITA OT to work with additional locales, such as Traditional Chinese and Russian, but this article covers only the preconfigured locales.

HTML output from the DITA OT usually appears correctly in any language. When creating PDF output, however, Simplified Chinese and Japanese characters often do not appear at all. This article explains how to make Simplified Chinese and Japanese characters appear. It is assumed that you are using RenderX XEP as your XSL-FO processor. RenderX XEP is installed automatically with XMetaL Author Enterprise Edition.

Background
When creating a PDF file, you must indicate what font(s) to use for displaying text. If the font is not available on your system, the text will appear either incorrectly or not at all. Most fonts that are widely used for Western languages cannot display Chinese or Japanese characters, and the fonts which are aesthetically optimal for Western languages are not optimal for Asian ones.

You can configure the DITA Open Toolkit to use any font for any language. By default, it is preconfigured to use a font called "Adobe Song Std Light" for Simplified Chinese, and a font called "KozMinProVI" for Japanese. Both of these fonts are free, but if you don't regularly read Japanese or Chinese documents on your computer, you probably don't have them installed. To create PDF files which use these fonts, you must do the following:

1) Download and install the fonts.
2) Configure RenderX XEP so that it can find the font files.
3) Apply a patch to the DITA OT so that it will use the correct fonts in headings.

Prerequisites

Downloading and Installing the Fonts
1) Download and install this: Adobe Reader 9 Font Packs - Chinese Simplified
2) Download and install this: Adobe Reader 9 Font Packs - Japanese
3) Open your Windows Fonts folder. This is typically C:\WINDOWS\Fonts.
4) Open the folder in which the font packs have been installed. This is typically C:\Program Files\Adobe\Reader 9.0\Resource\CIDFont. You should see at least four files, including AdobeSongStd-Light.otf and KozMinPr6N-Regular.otf. Copy those two files to your Windows Fonts folder.
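
If you want to confirm that both files ended up in the Fonts folder, a short script can check for them. This is only a sketch: the function name missing_fonts is made up for illustration, the two file names come from step 4, and the folder path is the typical location from step 3, which may differ on your system.

```python
from pathlib import Path

# File names are taken from step 4 above.
REQUIRED_FONTS = ["AdobeSongStd-Light.otf", "KozMinPr6N-Regular.otf"]


def missing_fonts(font_dir, names=REQUIRED_FONTS):
    """Return the font file names from `names` that are not in `font_dir`."""
    folder = Path(font_dir)
    return [name for name in names if not (folder / name).is_file()]


if __name__ == "__main__":
    # Typical Windows Fonts folder (an assumption; adjust for your system).
    missing = missing_fonts(r"C:\WINDOWS\Fonts")
    if missing:
        print("Missing font files:", ", ".join(missing))
    else:
        print("Both font files are present.")
```

An empty result from missing_fonts means both files were found in the folder you passed in.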

Configuring RenderX XEP
1) Locate your xep.xml file. For most installations of XMetaL Author Enterprise, it is in the following folder: C:\Documents and Settings\<username>\Application Data\SoftQuad\XMetaL Shared\renderx

2) Do ONE of the following:
- Download and unzip the file that is attached to this article. Back up the existing xep.xml file and replace it with the copy of the xep.xml file that you just downloaded.
- Open the xep.xml file in a text editor. Copy the following lines of code and paste them inside the <fonts> element of the xep.xml file, then save the file:

Code:
<!-- Simplified Chinese and Japanese fonts -->
<font-group xml:base="file:/C:/Windows/Fonts/">
  <font-family name="AdobeSongStd-Light">
    <font><font-data otf="AdobeSongStd-Light.otf"/></font>
  </font-family>
  <font-family name="KozMinProVI-Regular">
    <font><font-data otf="KozMinPr6N-Regular.otf"/></font>
  </font-family>
</font-group>
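
After editing xep.xml by hand, it can help to confirm that both font-family entries are really there and that the file is still well-formed XML. The following is a rough sketch, not part of XEP or XMetaL: the helper names local_name and font_entries are invented here, and SAMPLE is a cut-down stand-in for a real xep.xml, which contains much more. Element names are compared by local name so that the check does not depend on whatever namespace the real file declares.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]


def font_entries(xep_xml_text):
    """Map each font-family name to the OTF file names declared under it."""
    root = ET.fromstring(xep_xml_text)  # raises ParseError if not well-formed
    entries = {}
    for elem in root.iter():
        if local_name(elem.tag) != "font-family":
            continue
        files = [
            data.get("otf")
            for data in elem.iter()
            if local_name(data.tag) == "font-data" and data.get("otf")
        ]
        entries[elem.get("name")] = files
    return entries


# Hypothetical stand-in for xep.xml, holding only the lines from this article.
SAMPLE = """
<fonts>
  <font-group xml:base="file:/C:/Windows/Fonts/">
    <font-family name="AdobeSongStd-Light">
      <font><font-data otf="AdobeSongStd-Light.otf"/></font>
    </font-family>
    <font-family name="KozMinProVI-Regular">
      <font><font-data otf="KozMinPr6N-Regular.otf"/></font>
    </font-family>
  </font-group>
</fonts>
"""

entries = font_entries(SAMPLE)
for family in ("AdobeSongStd-Light", "KozMinProVI-Regular"):
    print(family, "->", entries.get(family, "NOT FOUND"))
```

To check a real installation, pass the text of your own xep.xml to font_entries instead of SAMPLE.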


Patching the DITA OT to use custom fonts for headings
There is a known issue in one of the default stylesheet files which causes headings to appear in Helvetica rather than the correct font. To fix this issue, see the post, "Patch for making headings use the correct font in PDF output".

Adding an "xml:lang" attribute to your content

The "xml:lang" attribute indicates what language your document is in, so that the publishing system can display it appropriately. The XMetaL Enhanced PDF output format uses the xml:lang attribute that is set at the root of the primary map file. To set the xml:lang attribute:

1) Open the DITA map file in the XMetaL map editor pane.
2) Click the map title to select it.
3) Click the Properties button.
4) Click the Other Attributes tab.
5) In the Language field, type either ja-jp for Japanese, or zh-cn for Simplified Chinese.
6) Click OK.
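
The steps above should leave an xml:lang attribute on the root element of the map. If you would like to verify that outside XMetaL, something like the following sketch can read the value back. The function name root_language and the sample map (its title and topic file name) are placeholders made up for this example.

```python
import xml.etree.ElementTree as ET

# The "xml" prefix is bound to this namespace by the XML specification.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def root_language(map_text):
    """Return the xml:lang value on the root element, or None if unset."""
    return ET.fromstring(map_text).get(XML_LANG)


# Hypothetical minimal map; the title and topic file name are placeholders.
SAMPLE_MAP = """<map xml:lang="ja-jp">
  <title>Sample</title>
  <topicref href="topic1.dita"/>
</map>"""

print(root_language(SAMPLE_MAP))  # ja-jp
```

A result of None would mean that the attribute was not saved on the map root.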

To generate output, click File > Generate Output for DITA Map. Select "XMetaL Enhanced PDF for RenderX XEP" as the deliverable type. After output is generated, you should now see your Japanese or Simplified Chinese content appearing in the PDF.

For testing, you can download a set of sample DITA files in Japanese and Simplified Chinese, which is attached to this article.

Legal:
* Licensed Materials - Property of JustSystems, Canada, Inc.
*
* (c) Copyright JustSystems Canada, Inc. 2010
* All rights reserved.
*
*-------------------------------------------------------------------
* The sample contained herein is provided to you "AS IS".
*
* It is furnished by JustSystems Corporation as a simple example and has not been
* thoroughly tested under all conditions. JustSystems Canada, Inc., therefore, cannot
* guarantee its reliability, serviceability or functionality.
*
* This sample may include the names of individuals, companies, brands and products
* in order to illustrate concepts as completely as possible. All of these names are
* fictitious and any similarity to the names and addresses used by actual persons or
* business enterprises is entirely coincidental.
*---------------------------------------------------------------------

* Asian_Languages_Troubleshooting_Jan_2011.zip (1277.27 KB - downloaded 501 times.)
« Last Edit: January 19, 2012, 01:38:57 PM by Derek Read »

Su-Laine Yeo
Solutions Consultant
JustSystems Canada, Inc.
Su-Laine Yeo
Solutions Consultant
Member

Posts: 260


« Reply #1 on: January 07, 2011, 06:58:02 PM »

Update: In the latest edition of the attachment (January 2011), a typo in the xep.xml file has been fixed.

Su-Laine Yeo
Solutions Consultant
JustSystems Canada, Inc.
rde
Member

Posts: 8


« Reply #2 on: December 31, 2012, 07:41:22 PM »

In the "Adding an "xml:lang" attribute to your content" section, it says "5) In the Language field, type either ja-jp for Japanese, or zh-cn for Simplified Chinese."

I would highly recommend replacing "ja-jp" with "ja_JP" and "zh-cn" with "zh_CN". Although ja-jp and zh-cn appear to work fine when creating PDFs in XMetaL, they won't work if you try something like creating RTFs with Oxygen. You need "ja_JP" and "zh_CN". I know this is an XMetaL forum, but you might as well make your files compatible with as many tools as possible.
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2484



« Reply #3 on: January 07, 2013, 01:47:31 PM »

@rde: This raises the question: "should one recommend following specifications, or should one recommend doing what works?"

The underscore is not the format recommended by the W3C (see: http://www.w3.org/TR/xml/#sec-lang-tag). However, if it works in some tools then I suppose you have no choice unless you can implement a change to the code in the tool itself (I'm actually talking about the DITA Open Toolkit here, as that's what really matters to you I think, i.e. not oXygen but the fact that your version of oXygen is using a version of the DITA OT that supports, or prefers[?], the underscore).

If only the underscore form works for a particular transtype (output format), then I would consider the version of the DITA OT that you are using to have an issue. Early versions of the DITA OT definitely had this issue. I remember that in some cases only values containing an underscore were supported. Later on, the DITA OT was altered to support both, presumably because people had been putting "wrong" xml:lang values in and it was nice to continue to support them.
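
To illustrate what "supporting both" can look like in practice, here is a minimal sketch of normalizing a locale value before comparing it. This is not the DITA OT's actual code; the function names normalize_lang and same_locale are invented for the example. It relies on language tags being case-insensitive, and it folds everything to the lowercase hyphenated form used earlier in this thread.

```python
def normalize_lang(value):
    """Return `value` lowercased with '_' replaced by '-' ('ja_JP' -> 'ja-jp')."""
    return value.strip().replace("_", "-").lower()


def same_locale(a, b):
    """Compare two locale values, ignoring case and the separator used."""
    return normalize_lang(a) == normalize_lang(b)


print(normalize_lang("ja_JP"))        # ja-jp
print(same_locale("zh_CN", "zh-cn"))  # True
```

A tool that compares values this way would treat "ja_JP" and "ja-jp" as the same locale, so either spelling in a file would select the same configuration.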

If you find that this is still the case for some transtypes, I would recommend submitting details to the DITA OT project (after checking that the issue has not yet been addressed) https://github.com/dita-ot/dita-ot/issues?state=open
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2484



« Reply #4 on: January 07, 2013, 02:01:57 PM »

If you follow the link from http://www.w3.org/TR/xml/#sec-lang-tag to ftp://ftp.isi.edu/in-notes/bcp/bcp47.txt, you will see that it is broken. Ultimately, this is the most relevant document: http://www.ietf.org/rfc/rfc4647.txt

See the following page as well. It describes how to use xml:lang values to control spell checking in XMetaL (and not just for DITA documents but any document). The subsequent discussion includes some interesting points by Richard Ishida (W3C, with follow-up by me): http://forums.xmetal.com/index.php/topic,539.0.html