DITA and XMetaL Discussion

  • kwag_myers

    Spaces in the URL

    Participants 4
    Replies 5
    Last Activity 11 years, 10 months ago

    I'm using Version 6 with a webhelp deliverable. The browser address bar shows the name of my output followed by the topic title with %20 replacing spaces:

    file:///C/.../webhelp_out/Help.html#Welcome%20to%20KwagHelp

    The project manager says that won't fly – they don't allow spaces.

    I'm surprised that the browser doesn't use the file name. Anyway, is there a way to change this? Or, an attribute I can set?


    Derek Read

    Reply to: Spaces in the URL

    WebHelp generates HTML files for each topic using the standard DITA OT 'XHTML' transtype, which is the same as you get if you select our deliverable called “Multiple HTML”. In both cases the HTML filename will be the same as the DITA topic filename, so if your topic filenames have spaces in them then those are passed through to the output files.

    Easiest solution: Remove the spaces in your topic files or replace them with some other acceptable character (check with your web people).

    Hardest solution: Modify the DITA OT so that it renames the HTML files it produces, deleting spaces or replacing them with some other character. This may require some Java programming.

    If you need to quickly rename a bunch of files in a specific folder you may find a “freeware” tool called “CKRename” useful. I'm sure there are other tools available for batch renaming of files. This one lets you remove/change/add characters to filenames for selected files in a specific folder and has other features like autonumbering, case changing, etc.
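    If a script is preferred over a GUI tool, the same batch rename can be sketched in a few lines of Node.js. This is only an illustration: `withoutSpaces` and `renameInFolder` are hypothetical names, the underscore is an assumed replacement character, and the script does not update references (maps, xrefs, conrefs) that point at the renamed topics.

```javascript
const fs = require("fs");
const path = require("path");

// Replace every run of whitespace in a file name with a single underscore.
function withoutSpaces(fileName) {
    return fileName.replace(/\s+/g, "_");
}

// Rename each entry in `folder` whose name contains whitespace.
// References to the renamed files elsewhere in the project are NOT updated.
function renameInFolder(folder) {
    for (const name of fs.readdirSync(folder)) {
        const cleaned = withoutSpaces(name);
        if (cleaned !== name) {
            fs.renameSync(path.join(folder, name), path.join(folder, cleaned));
        }
    }
}

console.log(withoutSpaces("c Overview.xml")); // c_Overview.xml
```

    Calling `renameInFolder` on a copy of the topic folder first is the safer way to try it.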


    kwag_myers

    Reply to: Spaces in the URL

    Thanks Derek. But I think you lost me. Both XML and HTML file names have underscores in place of spaces. The only places where spaces exist are the TOC and the topic titles. Also, some of my file names do not match the title, yet it's the title that displays.

    Example: c_Overview.xml is actually titled “Welcome to Kwag Help” and displays in the address bar as “.../webhelp_out/KwagHelp.html#Welcome%20to%20Kwag%20Help”.

    I did a little noodling around and found that the .js script takes the TOC title for the url. The only place I can find spaces replaced with “%20” is in the webhelp.js (…DITA_OTdemowebhelpcustomizationcommonxmwebhelpscript). I thought I was on to something when I found this (about line 127):

                // Check for unsafe characters as per http://www.ietf.org/rfc/rfc1738.txt
                // " ", "<", ">", "#", "'", "{", "}", "|", "\", "^", "~","[", "]", and "`"
                // cannot test for space using "\s" as some browsers encode/decode space chars in the address bar
                // so replace any spaces with "%20"
                innerLink = innerLink.replace(/ /g, "%20");
                var unsafeHash = /[<>#'{}|\\\^~\[\]`]+/;
                var badHash = unsafeHash.test(innerLink);
                if (!badHash) {
                    var frameTitle = decodeURI(innerLink);
                    // Return first (if any) title that matches from ToC
                    var matchFromHash = $("#whTocTree a[title = '" + frameTitle + "']:first");
                    if (matchFromHash.length == 1) {
                        whTocUpdate(matchFromHash.attr("id"));
                        matchFromHash.addClass("current");
                        matchFromHash.click();
                        var hashHREF = matchFromHash.attr("href");
                        whContentUpdateFromHash(hashHREF);
                    }
                }

    I deleted the “%20” in (/ /g, "%20") to no avail. I tried putting in a couple characters, too.


    Derek Read

    Reply to: Spaces in the URL

    Ah, now I think I understand.

    Yes, this part is generated by JavaScript: the anchor portion of the URL (everything after the # sign) is based on document titles, which often have spaces in them. Because the IETF spec for URLs (referenced in the code comments) says you should not use actual spaces in URLs, the code escapes them as %20 so that the URLs are legal.

    I'm still a little confused because I'm not sure why having the %20 in the URL is a concern. Did your website people give a reason? You say the project manager doesn't allow spaces. So, because there are no spaces in the URL, I'm not sure what the issue is. Or are they saying the %20 and space are identical (i.e., it is not that the URL is illegal, it is that they think %20 is ugly)?

    The thing that might be confusing everyone here is that different browsers display %20 differently in the address bar. Some browsers render a %20 in a URL as %20 (IE is one), and some “help” you by rendering it as a space (presumably because this is easier to read and understand). Our code is telling the browser to use %20; it is the browser that is rendering a space, not us (this is true for Firefox). If you have both browsers installed and open the same output in each you should see what I mean.

    To show that this is a browser behavior you can visit any website that has %20 in the URL (or set up your own test). You will see different rendering behavior in different browsers. In Firefox (which shows %20 as a space), if you copy such a URL and paste it into a text editor, you will see the %20 appear there, because under the covers Firefox knows it is there and puts the %20 on the clipboard; it is just rendering it as a space in the address bar.
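    The same distinction between the stored URL and its display can be checked outside a browser. The following is a minimal sketch, assuming Node.js (whose `URL` class follows the same URL standard browsers use); `example.com` and the file name are placeholders, not a real deployment.

```javascript
// example.com is a placeholder host; only the fragment matters here.
const typed = new URL("http://example.com/KwagHelp.html#Welcome to Kwag Help");

// The parser percent-encodes the literal spaces, so the stored fragment contains %20.
console.log(typed.hash); // #Welcome%20to%20Kwag%20Help

// A "friendly" address bar is equivalent to decoding the fragment for display only.
const displayed = decodeURIComponent(typed.hash);
console.log(displayed); // #Welcome to Kwag Help
```

    Either way the underlying URL carries %20; only the presentation differs.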

    If the website people do have an issue with what the JavaScript is doing then I think as part of their code review (assuming they are reviewing the JavaScript before they put it on the website) they could make changes to it so that it does what they want it to do.
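    As an illustration of the kind of change a reviewer might make, here is a hedged sketch that uses underscores in the hash instead of %20. It is not the shipped webhelp.js behavior: `titleToSlug` and `findTitleForHash` are hypothetical names, the plain array stands in for the ToC links, and “Getting Started” is an invented second title.

```javascript
// Hypothetical helper: turn a ToC title into a hash with no spaces and no %20.
function titleToSlug(title) {
    return title.trim().replace(/\s+/g, "_");
}

// Find the first ToC title whose slug equals the hash; returns null if none match.
// Comparing slugs avoids having to reverse the underscore substitution.
function findTitleForHash(hash, tocTitles) {
    const wanted = hash.replace(/^#/, "");
    const match = tocTitles.find((title) => titleToSlug(title) === wanted);
    return match === undefined ? null : match;
}

const tocTitles = ["Welcome to Kwag Help", "Getting Started"];
console.log(titleToSlug(tocTitles[0]));                            // Welcome_to_Kwag_Help
console.log(findTitleForHash("#Welcome_to_Kwag_Help", tocTitles)); // Welcome to Kwag Help
```

    One trade-off of this design is that two titles differing only in spaces versus underscores would produce the same hash, so the first match wins.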

    The alternative might be to simply generate output using the standard DITA OT transtype 'XHTML' (which we expose in our Generate Output dialog as “Multiple HTML”) and then let the website people integrate those HTML files into the website, creating their own framework around them (if they want frames or not, TOC, searching, etc).


    kwag_myers

    Reply to: Spaces in the URL

    I'm still a little confused because I'm not sure why having the %20 in the URL is a concern. Did your website people give a reason? You say the project manager doesn't allow spaces. So, because there are no spaces in the URL, I'm not sure what the issue is. Or are they saying the %20 and space are identical (i.e., it is not that the URL is illegal, it is that they think %20 is ugly)?

    I'm not sure that I understand it either. Today, when I asked about it, the answer was “it's not a problem”. Perhaps I misunderstood. So, maybe I'll just put this issue on hold for now and see if it comes up again. If so, I'll have to defer to someone who knows JavaScript.

    The thing that might be confusing everyone here is that different browsers display %20 differently in the address bar. Some browsers render a %20 in a URL as %20 (IE is one), and some “help” you by rendering it as a space (presumably because this is easier to read and understand). Our code is telling the browser to use %20; it is the browser that is rendering a space, not us (this is true for Firefox). If you have both browsers installed and open the same output in each you should see what I mean.

    I do, I did, and I do.

    One thing that bothers me about the portion of code I posted before is that it checks for spaces before it takes the title from the TOC. At least, that's the way it appears to me. But I don't know enough about JS to even begin making revisions.

    Anyway, if it does come up again as a concern, I have your post to refer to, so thanks for the info.


    Derek Read

    Reply to: Spaces in the URL

    Sort of good to hear, as I really can't see this being an issue.

    Regarding the code, it pulls the title from the TOC file on the fly (as in, when the pages and script are actually running in the browser). The TOC itself is generated during output generation, so the content in that file is fixed. It contains spaces if the titles in the topics contain spaces. The titles are displayed with real spaces in the TOC (because that's just HTML), but the code replaces those spaces with %20 to stay in line with RFC 1738 for the links that are created, and when you click on one of those, that's what ends up in the address bar. Hope that makes sense.
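    The flow described here can be sketched without jQuery or a browser. This is an approximation under stated assumptions: `matchTitleFromHash` is a hypothetical name, the `tocTitles` array stands in for the title attributes of the links in #whTocTree, and the character class assumes the posted excerpt originally carried backslash escapes.

```javascript
// Unsafe-character class from the posted excerpt, with backslash escapes written out.
const unsafeHash = /[<>#'{}|\\\^~\[\]`]+/;

// tocTitles stands in for the titles of the ToC links generated at output time.
function matchTitleFromHash(innerLink, tocTitles) {
    // 1. Escape literal spaces so the hash is a legal URL fragment.
    const escaped = innerLink.replace(/ /g, "%20");
    // 2. Give up if the hash contains any other unsafe character.
    if (unsafeHash.test(escaped)) {
        return null;
    }
    // 3. Decode back to the display title and return the first exact match.
    const frameTitle = decodeURI(escaped);
    const match = tocTitles.find((title) => title === frameTitle);
    return match === undefined ? null : match;
}

const titles = ["Welcome to Kwag Help"];
console.log(matchTitleFromHash("Welcome%20to%20Kwag%20Help", titles)); // Welcome to Kwag Help
console.log(matchTitleFromHash("Welcome to Kwag Help", titles));       // Welcome to Kwag Help
console.log(matchTitleFromHash("Welcome<to>", titles));                // null
```

    Under this reading, the space replacement only normalizes the incoming hash before it is compared against the fixed ToC titles, so changing the replacement string alone breaks the match rather than changing what appears in the address bar.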

