General XMetaL Discussion


  • LeeHart

    xml property of ActiveDocument object and UTF-8

    Participants 7
    Replies 8
    Last Activity 12 years, 8 months ago

    Using XMetaL 5.5 Essential SP1 and XMetaL 6.0 Essential on most supported versions of Windows

    The default document preview garbles the DOCTYPE line (replacing the reference to the DTD with a subset of the DTD in the file), so our documents no longer render. To work around this, we have an On_Before_Document_Preview macro that writes ActiveDocument.xml to the BrowserURL file. (This is not the problem; we've been doing this since at least XMetaL 2.1.) The problem is that UTF-8 sequences in the file are not properly handled in the xml or xmlWithCT properties.

    For example, the “Smart Quotes” automatically inserted by Office (“”), U+201C and U+201D, are E2 80 9C and E2 80 9D when encoded as UTF-8. However, in the xml or xmlWithCT properties they are represented as 93 and 94 (??). As a result, the preview fails.

    How can I get a copy of the current XML in UTF-8 format (or whatever encoding format is specified in the xml)? I could force the preview to save the file and then copy the saved file to the BrowserURL but that seems to be a hack; I'd rather work with the UTF-8.

    Thanks,

    Lee


    dcramer

    Reply to: xml property of ActiveDocument object and UTF-8

    I'd be interested in seeing your On_Before_Document_Preview. We've never used preview because of its DOCTYPE-related problems. We never use Office “Smart Quotes” in our XML so that won't be a problem.

    Thanks,
    David


    Derek Read

    Reply to: xml property of ActiveDocument object and UTF-8

    I'm not sure there is an “encoding” per se when you use Document.xml; the script engine and Windows handle that, as far as I know. I suspect a particular script engine will do some automatic conversion, based on XMetaL treating everything as UTF-16 in memory and when passing values around. I believe this is true because a few years ago we had reports of a specific version of PerlScript (ActiveState) not handling some characters.

    Is it really just these two characters that are the issue?

    If you have something in a string and save it out using script, I think it is up to whatever is doing the saving to specify an encoding. Are you using FSO for this? If so, FSO only writes out ASCII or UTF-16 (and you have to tell it which). What are you using to write the XML? Although it is unlikely, your script engine version may also have an effect here. Did you recently (perhaps unknowingly, if this is JScript or VBScript) upgrade the script engine?
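
    For reference, here is a minimal JScript sketch of what "telling" FSO the format looks like. OpenTextFile takes its arguments in the order filename, iomode, create, format, where format is -1 for Unicode (UTF-16), 0 for ASCII, and -2 for the system default; there is no UTF-8 option. The helper name `writeWithFso` and its parameters are made up for illustration, and the `fso` object is passed in so that it can be replaced with a stub outside Windows.

```javascript
// Hypothetical helper, not part of XMetaL or FSO itself.
// fso: a Scripting.FileSystemObject, for example
//      new ActiveXObject("Scripting.FileSystemObject") under Windows.
var FOR_WRITING = 2;      // IOMode: ForWriting
var TRISTATE_TRUE = -1;   // Format: open the file as Unicode (UTF-16)
var TRISTATE_FALSE = 0;   // Format: open the file as ASCII

function writeWithFso(fso, path, text, asUnicode) {
  var format = asUnicode ? TRISTATE_TRUE : TRISTATE_FALSE;
  // Arguments: filename, iomode, create-if-missing, format
  var file = fso.OpenTextFile(path, FOR_WRITING, true, format);
  try {
    file.Write(text);
  } finally {
    file.Close();
  }
  return format;
}
```

    Either choice leaves you without UTF-8 output, which is why the format argument alone cannot fix a UTF-8 preview file.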


    LeeHart

    Reply to: xml property of ActiveDocument object and UTF-8

    We're just using FSO. Here's the macro:

    g_Preview = 0

    Sub runCode()
      g_Preview = 1
      Dim strFilename
      Dim fso
      strFilename = ActiveDocument.Path & Chr(92) & Mid(ActiveDocument.BrowserURL, InStrRev(ActiveDocument.BrowserURL, "/") + 1)
      Set fso = CreateObject("Scripting.FileSystemObject")
      ' Force save, copy saved file to temporary BrowserURL file
      ActiveDocument.Save
      fso.CopyFile ActiveDocument.FullName, strFilename, True
      Set fso = Nothing
    End Sub

    Previously the CopyFile line was similar to this:
    Set file = fso.OpenTextFile(strFilename, 2, -2) ' for writing, tristate-default

    ' replace contents of temporary file with the current document
    file.Write ActiveDocument.xml

    (I'm not including the declaration or cleanup of the file variable.)

    Thanks,

    Lee


    dcramer

    Reply to: xml property of ActiveDocument object and UTF-8

    Thanks Lee, I'll give that a try.


    Derek Read

    Reply to: xml property of ActiveDocument object and UTF-8

    I'll try your script. My testing shows that JScript thinks the values are (in decimal) 8220 and 8221, which is correct. It also specifically does not find the values 93 or 94 (not sure if those were hex or decimal so I'm checking both). [Sorry, I wrote it in JScript before I saw your VBScript]

    Here's my test script:

    [code]//XMetaL Script Language JScript:
    //run on any document containing “smart quotes”
    //best to run on a document with only two of them
    var x = ActiveDocument.xml;
    Application.Alert(x);
    for (var i = 0; i < x.length; i++) {
      var c = x.charCodeAt(i);
      if (c == 93) {
        Application.Alert("Found a U+005D (dec 93) at offset:" + i);
      }
      else if (c == 94) {
        Application.Alert("Found a U+005E (dec 94) at offset:" + i);
      }
      else if (c == 147) {
        Application.Alert("Found a U+0093 (dec 147) at offset:" + i);
      }
      else if (c == 148) {
        Application.Alert("Found a U+0094 (dec 148) at offset:" + i);
      }
      else if (c == 8220) { //left double quotation mark
        Application.Alert("found U+201C (dec 8220) at offset:" + i);
      }
      else if (c == 8221) { //right double quotation mark
        Application.Alert("found U+201D (dec 8221) at offset:" + i);
      }
    }[/code]


    LeeHart

    Reply to: xml property of ActiveDocument object and UTF-8

    Change the ActiveDocument.Save line to this:

    Set rng = ActiveDocument.Range
    rng.SelectAll

    If (rng.ReadOnlyContainer = false) Then ActiveDocument.Save

    (Again, the declaration and cleanup of the rng variable are not included.)

    This avoids the Save As dialog if the file is read-only.

    Lee


    LeeHart

    Reply to: xml property of ActiveDocument object and UTF-8

    Derek, I was looking at the file that was written out, not the state in memory so the xml and xmlWithCT properties are probably fine. The values were hex. I didn't know about the UTF-8 limitation of FSO.

    I'll look around for alternatives.

    Thanks,

    Lee


    Derek Read

    Reply to: xml property of ActiveDocument object and UTF-8

    Yeah, so FSO is definitely the problem here.
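
    For anyone checking the byte values: hex 93 and 94 are the single-byte Windows-1252 codes for U+201C and U+201D, which fits FSO writing the file in a single-byte code page rather than UTF-8. The sketch below hand-encodes a string to UTF-8 to confirm the E2 80 9C and E2 80 9D sequences quoted at the top of the thread. The function name `utf8Hex` is made up for illustration, and it only handles characters up to U+FFFF.

```javascript
// Encode a string as UTF-8 and return the bytes as upper-case hex pairs.
// Handles code points up to U+FFFF only (no surrogate pairs), which is
// enough for the two quotation marks discussed in this thread.
function utf8Hex(str) {
  var out = [];
  for (var i = 0; i < str.length; i++) {
    var c = str.charCodeAt(i);
    var bytes;
    if (c < 0x80) {
      bytes = [c];                                    // 1-byte sequence
    } else if (c < 0x800) {
      bytes = [0xC0 | (c >> 6), 0x80 | (c & 0x3F)];   // 2-byte sequence
    } else {
      bytes = [0xE0 | (c >> 12),                      // 3-byte sequence
               0x80 | ((c >> 6) & 0x3F),
               0x80 | (c & 0x3F)];
    }
    for (var j = 0; j < bytes.length; j++) {
      var h = bytes[j].toString(16).toUpperCase();
      out.push(h.length < 2 ? "0" + h : h);
    }
  }
  return out.join(" ");
}

utf8Hex("\u201C"); // "E2 80 9C"
utf8Hex("\u201D"); // "E2 80 9D"
```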

    One way around this issue is to use ADODB.Stream, which does support UTF-8, but what you need to get that working is pretty convoluted in my opinion. If you Google for “FSO createfile UTF-8”, many of the hits will point you to ADODB.Stream or other non-FSO solutions.
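
    A rough sketch of the ADODB.Stream route, in JScript. This is not tested inside XMetaL and not taken from our scripts. The helper name `saveUtf8` and its optional third argument are made up for illustration (the argument lets you pass in a stub instead of the real COM object); the properties and methods called on the stream are the standard ADODB.Stream ones.

```javascript
// Hypothetical helper: "saveUtf8" and the optional "stream" argument are
// illustrative names. ADODB.Stream itself is the real COM object.
function saveUtf8(text, path, stream) {
  var AD_TYPE_TEXT = 2;             // StreamTypeEnum: adTypeText
  var AD_SAVE_CREATE_OVERWRITE = 2; // SaveOptionsEnum: adSaveCreateOverWrite
  // In an XMetaL macro or under Windows Script Host, create the real object.
  var s = stream || new ActiveXObject("ADODB.Stream");
  s.Type = AD_TYPE_TEXT;
  s.Charset = "utf-8";
  s.Open();
  try {
    s.WriteText(text);
    s.SaveToFile(path, AD_SAVE_CREATE_OVERWRITE);
  } finally {
    s.Close();
  }
}
```

    In the preview macro this would be called along the lines of saveUtf8(ActiveDocument.xml, strFilename). As far as I know, ADODB.Stream writes a UTF-8 byte order mark at the start of the file, so check that whatever reads the preview file accepts one.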

    We have some undocumented code that you might try that may be easier to use than ADODB.Stream. However, because it is undocumented you'd need to figure out how we use it by looking at our scripts. It would essentially go something like this:

    var cwUtil = new ActiveXObject("CWUtil.StringUtil");
          if (!cwUtil) return false;
          cwUtil.StringToUTF8File(xmlStr, tempFileName);

    CWUtil is a DLL that we install and register, so it is available to all XMetaL Author installations (both Essential and Enterprise). However, a JS file that demonstrates it is only included as part of the DITA solution that comes with Enterprise, here: DITA\XACs\shared\dita\js\dita_utils.js

    Here is a similar example that might actually be easier to understand. Make particular note of the (somewhat ironic) warnings in the comments, and sorry — JScript again as most of our internals are standardized on that script language (this is written as a JScript prototype as the DLL is somewhat limited in functionality w.r.t. file cleanup, etc):

    [code]XMAULocalFileSystemService.prototype.SaveXMLToUTF8File = function(xmlStr, filePath)
    {
          // PROD00032235: XMXML would help out here since Xerces wants to
          // resolveExternals, MSXML doesn't like the DITA DTDs, FSO's unicode-mode
          // write UTF16 plus a BOM (which Documentum doesn't like)….so, we do
          // the following:
          //
          //  1. Create temp filename that uses only ascii chars
          //  2. Use CWUtil which can write temp using UTF8 but StringToUTF8File()
          //      only works if file name is ascii
          //  3. Use FSO to copy temp file to filename with unicode chars
          //  4. Delete temp file
         
          if (!xmlStr || !filePath) return false;
          var folderName = this.GetParentFolderName(filePath);
          var tempFileName = Application.UniqueFileName(folderName, "__");
         
          var cwUtil = new ActiveXObject("CWUtil.StringUtil");
          if (!cwUtil) return false;
          cwUtil.StringToUTF8File(xmlStr, tempFileName);
         
          this.CopyFile(tempFileName, filePath, true);
          this.DeleteFile(tempFileName, true);
          return true;                 
    }[/code]
