Topic: DITA-OT publishing German characters with HTML symbols
pmasal
Member

Posts: 86


« on: March 29, 2013, 01:50:46 PM »

When editing German content, XMetaL seems to insert HTML entities for German characters with umlauts and other special characters, as shown:

<p>Wie lange Sie einen Artikel zur&uuml;ckgeben k&ouml;nnen, h&auml;ngt vom R&uuml;ckgabegrund ab. Wenn Sie einen Artikel aufgrund von Nichtgefallen retournieren m&ouml;chten, haben Sie 30 Tage Zeit. Sollte der Artikel jedoch defekt oder besch&auml;digt sein, k&ouml;nnen Sie diesen innerhalb von zwei Jahren reklamieren.</p>

In search results on our help system, Chrome has trouble with the HTML entities: it substitutes the Unicode replacement character &#65533; for each of them, as follows:

<div>&#65533;ber R&#65533;ckgabefristen Wie lange Sie einen Artikel zur&#65533;ckgeben k&#65533;nnen, h&#65533;ngt vom R&#65533;ckgabegrund ab...</div>
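One common way the replacement character U+FFFD (&#65533;) shows up is when bytes written in one encoding are decoded as another. The sketch below assumes, purely for illustration, that the text was stored as ISO-8859-1 and later read as UTF-8; nothing in this thread establishes that this is what happened in the search software.

```python
# Illustration only: assumes text saved as ISO-8859-1 is later decoded as UTF-8.
text = "Über Rückgabefristen"

latin1_bytes = text.encode("iso-8859-1")  # Ü -> 0xDC, ü -> 0xFC (one byte each)

# 0xDC and 0xFC are not valid UTF-8 here, so each is replaced with U+FFFD.
misread = latin1_bytes.decode("utf-8", errors="replace")

# Decoding with the encoding that was actually used round-trips cleanly.
correct = text.encode("utf-8").decode("utf-8")

print(ascii(misread))  # ascii() keeps console output ASCII-safe
```

The damaged string starts with U+FFFD in place of "Ü" and has another in place of "ü", which matches the pattern in the search result above.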

Is there any way to configure XMetaL to insert numeric entities instead of named HTML entities? I realize the issue could be with the interaction between the search engine and Chrome, but I'd like to explore possibilities in our editing environment as well.
Paul
Derek Read
Program Manager (XMetaL)
Administrator
Member

Posts: 2621



« Reply #1 on: March 29, 2013, 02:51:31 PM »

XMetaL actually defaults to using numeric character references in the XML and will use those unless a named entity is defined in your DTD (though various settings can also come into play). However, it looks like you are authoring DITA, and the only named entity in DITA (which people should now be avoiding anyway) is nbsp. That means that if XMetaL is inserting entities into your documents (and normally it should not be), they would be inserted as numeric character references in hex form.
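If changing the editor is not an option, named HTML entities such as &uuml; could be rewritten as hexadecimal numeric character references in a pre-processing step. This is a minimal sketch under that assumption: `named_to_hex_refs` is a hypothetical helper, not an XMetaL or DITA-OT feature. It relies on Python's `html.entities.name2codepoint` table (HTML 4 entity names) and leaves XML's five predefined entities untouched.

```python
import re
from html.entities import name2codepoint

# XML predefines these five entities, so they never need converting.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

# Matches named references like &uuml; (numeric ones like &#252; are not matched).
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);")


def named_to_hex_refs(xml_text: str) -> str:
    """Replace HTML named entities (e.g. &uuml;) with hex references (&#xFC;)."""
    def repl(match: re.Match) -> str:
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # keep predefined or unknown entities as-is
        return f"&#x{name2codepoint[name]:X};"
    return ENTITY_RE.sub(repl, xml_text)


sample = "<p>zur&uuml;ckgeben k&ouml;nnen, h&auml;ngt &amp; besch&auml;digt</p>"
print(named_to_hex_refs(sample))
# <p>zur&#xFC;ckgeben k&#xF6;nnen, h&#xE4;ngt &amp; besch&#xE4;digt</p>
```

Converting to numeric references rather than to literal characters keeps the files safe even if they are later saved in a narrow encoding such as US-ASCII.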

In either case, your XML files would need to use an encoding with a smaller character repertoire than UTF-8 (the default) or UTF-16 for entities to be inserted into the XML at all. For the characters in question that would most likely be US-ASCII, since LATIN1 / ISO-8859-1 supports German characters. If an encoding supports a character (and UTF-8 supports all of these), XMetaL saves the character as itself rather than as a numeric character reference (and named entities need to be defined, as previously stated).
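The rule described above (keep a character if the file's encoding can represent it, otherwise write a numeric character reference) can be sketched as follows. `serialize_for_encoding` is a hypothetical illustration of that rule, not XMetaL's actual serializer, and it ignores markup escaping of `&` and `<`. Python's built-in `errors="xmlcharrefreplace"` handler does something similar but emits decimal rather than hex references.

```python
def serialize_for_encoding(text: str, encoding: str) -> str:
    """Keep characters the encoding can represent; emit &#x..; for the rest."""
    out = []
    for ch in text:
        try:
            ch.encode(encoding)  # raises if the encoding cannot represent ch
            out.append(ch)
        except UnicodeEncodeError:
            out.append(f"&#x{ord(ch):X};")
    return "".join(out)


word = "zurückgeben"
for enc in ("utf-8", "iso-8859-1", "us-ascii"):
    result = serialize_for_encoding(word, enc)
    print(f"{enc}: {'unchanged' if result == word else result}")
# utf-8: unchanged
# iso-8859-1: unchanged
# us-ascii: zur&#xFC;ckgeben
```

Only the US-ASCII case produces a reference for "ü", which is why entities in the saved file would point to an unusually narrow encoding setting.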

I think we'll need to see some test files to find out whether we can reproduce this. The more information you can provide about the setup, the better. The most likely source of problems here would be wherever the files are being written to or stored. If a CMS or anything other than a Windows file system is involved, I'd start looking there first.

Or perhaps the issue is with the DITA Open Toolkit. I'm not aware of anything that matches this description, but it is possible. In that case the XML files themselves (and XMetaL Author Enterprise) are likely not the cause; the XML itself might be fine, but the DITA OT, or modifications made to it, might be doing this.
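To help narrow down where the characters are being damaged (storage, CMS, DITA OT output, or the search index), a quick scan of the generated files for invalid UTF-8 or for replacement characters can be useful. This is a hypothetical diagnostic sketch: it assumes the output is meant to be UTF-8 HTML, and `check_bytes` and `scan` are illustrative names, not part of any toolkit.

```python
from __future__ import annotations

from pathlib import Path


def check_bytes(data: bytes) -> list[str]:
    """Return a list of encoding problems found in one file's raw bytes."""
    try:
        text = data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Likely written in another encoding (e.g. ISO-8859-1).
        return [f"not valid UTF-8 (first bad byte at offset {exc.start})"]
    problems = []
    if "\ufffd" in text:
        problems.append("contains U+FFFD replacement character")
    lowered = text.lower()
    if "&#65533;" in lowered or "&#xfffd;" in lowered:
        problems.append("contains numeric reference to U+FFFD")
    return problems


def scan(output_dir: Path, pattern: str = "*.html") -> dict[str, list[str]]:
    """Check every matching file under output_dir; report only files with problems."""
    report = {}
    for path in sorted(output_dir.rglob(pattern)):
        problems = check_bytes(path.read_bytes())
        if problems:
            report[str(path.relative_to(output_dir))] = problems
    return report
```

For example, `scan(Path("out"))` would return a mapping of relative file paths to the problems found, where `out` stands in for whatever output directory your DITA-OT build uses. Clean results there would point further downstream, toward the search index.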
« Last Edit: March 29, 2013, 02:55:08 PM by Derek Read »
pmasal
Member

Posts: 86


« Reply #2 on: April 09, 2013, 09:22:55 AM »

Thanks for the great advice as always, Derek. We have isolated this to a likely issue with our internal search software. Thanks again; we will keep everyone posted if anything crops up with XMetaL or the DITA toolkit.