DITA and XMetaL Discussion


  • pmasal

    DITA-OT publishing German characters with HTML symbols

    Participants 1
    Replies 2
    Last Activity 9 years, 10 months ago

    When editing German content, XMetaL seems to insert HTML entities for German characters with umlauts and similar, as shown:

    Wie lange Sie einen Artikel zurückgeben können, hängt vom Rückgabegrund ab. Wenn Sie einen Artikel aufgrund von Nichtgefallen retournieren möchten, haben Sie 30 Tage Zeit. Sollte der Artikel jedoch defekt oder beschädigt sein, können Sie diesen innerhalb von zwei Jahren reklamieren.

    In search results on our help system, Chrome is having trouble rendering the HTML entities in the results. It substitutes the Unicode replacement character � for the HTML entities, as follows:

    �ber R�ckgabefristen Wie lange Sie einen Artikel zur�ckgeben k�nnen, h�ngt vom R�ckgabegrund ab…

    Is there any way to configure XMetaL to insert numeric entities instead of named HTML entities? I realize the issue could lie in the interaction between the search engine and Chrome, but I'd like to explore possibilities in our editing environment as well.
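
    For context, one common way the � character ends up in output is an encoding mismatch rather than an entity problem. The sketch below is only an illustration of that general mechanism; nothing in this thread confirms it is what the search software is doing, and the choice of ISO-8859-1 as the "wrong" encoding is an assumption made for the example.

```python
# Illustration only: bytes written as ISO-8859-1 but read back as UTF-8.
# The encodings chosen here are assumptions, not facts from the thread.
text = "Über Rückgabefristen"

# Each umlaut becomes a single byte above 0x7F in ISO-8859-1.
latin1_bytes = text.encode("iso-8859-1")

# Those single bytes are not valid UTF-8 sequences, so a lenient decoder
# substitutes U+FFFD (the replacement character) for each of them.
misread = latin1_bytes.decode("utf-8", errors="replace")

print(misread)
```

    If the published HTML and the search index disagree about encoding in this way, the symptom would look like the sample above even when the source XML is fine.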


    Derek Read

    Reply to: DITA-OT publishing German characters with HTML symbols

    XMetaL actually defaults to using numbered entities in the XML and will use those unless a named entity is defined in your DTD (though various settings can also come into play). However, it looks like you are authoring DITA, and the only named entity in DITA (which people should actually be avoiding now) is nbsp. That means that if XMetaL is inserting entities into your documents (and normally it should not be), they would be inserted as numbered character references in hex form.

    In either case, the encoding for your XML files needs to be something less robust than UTF-8 (the default) or UTF-16 in order for entities to be inserted into the XML (most likely US-ASCII for the characters in question as LATIN1 / ISO-8859-1 supports German characters). If an encoding supports a character (and UTF-8 supports these) then it will save the character as the character and not as a numbered entity (and named entities need to be defined, as previously stated).
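
    The rule described above (a character is written as itself when the encoding supports it, and as a numbered reference when it does not) can be sketched in Python. This is not XMetaL's implementation; `serialize` is a hypothetical helper, and Python's `xmlcharrefreplace` handler emits decimal references, whereas XMetaL is described above as using hex form.

```python
def serialize(text: str, encoding: str) -> bytes:
    """Encode text, turning characters the encoding cannot represent
    into numeric character references (hypothetical helper)."""
    return text.encode(encoding, errors="xmlcharrefreplace")


sample = "Rückgabe"

# UTF-8 and ISO-8859-1 both support "ü", so it is stored as the character.
utf8_bytes = serialize(sample, "utf-8")
latin1_bytes = serialize(sample, "iso-8859-1")

# US-ASCII cannot represent "ü", so a numeric reference is written instead.
ascii_bytes = serialize(sample, "us-ascii")

print(utf8_bytes)
print(latin1_bytes)
print(ascii_bytes)
```

    Only the US-ASCII result contains a reference, which matches the point that entities should not appear in files saved as UTF-8.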

    I think we'll need to see some test files to find out whether we can reproduce this. The more information you can provide about the setup, the better. The most likely cause for issues here would be where the files are being written to or stored. If a CMS or anything other than a Windows file system is involved, I'd start looking there first.

    Or perhaps the issue is with the DITA Open Toolkit. I'm not aware of anything that sounds like this, but it is possible. In that case the XML files themselves (and XMetaL Author Enterprise) are likely not the cause: the XML itself might be fine, but the DITA OT or modifications to it might be doing this.
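
    To narrow down which stage introduces entities, it may help to check whether the source XML contains any references at all before it reaches the DITA OT. The following is a rough sketch, not an XMetaL or DITA OT feature; `find_references` and the sample string are made up for illustration, and a regular expression is only an approximation of real XML parsing.

```python
import re

# Matches decimal references (&#252;), hex references (&#xFC;) and
# named references (&uuml;). Approximate; not a full XML parser.
REFERENCE_RE = re.compile(r"&(#[0-9]+|#x[0-9A-Fa-f]+|[A-Za-z_][A-Za-z0-9._-]*);")

# The five entities predefined by XML itself are expected and ignored.
XML_PREDEFINED = {"amp", "lt", "gt", "apos", "quot"}


def find_references(xml_text: str) -> list[str]:
    """Return every entity or character reference that is not predefined."""
    return [
        match.group(0)
        for match in REFERENCE_RE.finditer(xml_text)
        if match.group(1) not in XML_PREDEFINED
    ]


# Hypothetical sample mixing references with a literal character.
sample = "<p>zur&uuml;ck, R&#xFC;ckgabe, k&#246;nnen, A &amp; B, hängt</p>"
found = find_references(sample)
print(found)
```

    An empty result for the real topic files would suggest the characters are stored literally and that any entities or replacement characters are introduced later in the pipeline.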



    Reply to: DITA-OT publishing German characters with HTML symbols

    Thanks for the great advice as always, Derek. We have isolated this to a potential issue with our internal search software. Thanks again and will keep everyone posted if anything crops up with XMetaL/DITA toolkit.

