DITA and XMetaL Discussion
kwag_myers August 10, 2010 at 8:25 pm
Spaces in the URL
I'm using Version 6 with a webhelp deliverable. The browser address bar shows the name of my output followed by the topic title, with %20 replacing the spaces:
file:///C/.../webhelp_out/Help.html#Welcome%20to%20KwagHelp
The project manager says that won't fly – they don't allow spaces.
I'm surprised that the browser doesn't use the file name. Anyway, is there a way to change this? Or, an attribute I can set?
Derek Read August 10, 2010 at 9:07 pm
WebHelp generates HTML files for each topic using the standard DITA OT 'XHTML' transtype, which is the same as you get if you select our deliverable called “Multiple HTML”. In both cases the HTML filename will be the same as the DITA topic filename, so if your topic filenames have spaces in them then those are passed through to the output files.
Easiest solution: Remove the spaces in your topic files or replace them with some other acceptable character (check with your web people).
Hardest solution: Modify the DITA OT so that it renames the HTML files it produces, deleting spaces or replacing them with some other character. This may require some Java programming.
If you need to quickly rename a bunch of files in a specific folder you may find a “freeware” tool called “CKRename” useful. I'm sure there are other tools available for batch renaming of files. This one lets you remove/change/add characters to filenames for selected files in a specific folder and has other features like autonumbering, case changing, etc.
kwag_myers August 11, 2010 at 6:15 pm
Thanks Derek. But I think you lost me. Both XML and HTML file names have underscores in place of spaces. The only places where spaces exist are the TOC and the topic titles. Also, some of my file names do not match the title, yet it's the title that displays.
Example: c_Overview.xml is actually titled “Welcome to Kwag Help” and displays in the address bar as “.../webhelp_out/KwagHelp.html#Welcome%20to%20Kwag%20Help”.
I did a little noodling around and found that the .js script takes the TOC title for the URL. The only place I can find spaces replaced with “%20” is in webhelp.js (…\DITA_OT\demo\webhelp\customization\common\xmwebhelp\script). I thought I was on to something when I found this (about line 127):
// Check for unsafe characters as per http://www.ietf.org/rfc/rfc1738.txt
// " ", "<", ">", "#", "'", "{", "}", "|", "\", "^", "~", "[", "]", and "`"
// cannot test for space using "\s" as some browsers encode/decode space chars in the address bar
// so replace any spaces with "%20"
innerLink = innerLink.replace(/ /g, "%20");
var unsafeHash = /[<>#'{}|\\^~\[\]`]+/;
var badHash = unsafeHash.test(innerLink);
if (!badHash) {
    var frameTitle = decodeURI(innerLink);
    // Return first (if any) title that matches from ToC
    var matchFromHash = $("#whTocTree a[title = '" + frameTitle + "']:first");
    if (matchFromHash.length == 1) {
        whTocUpdate(matchFromHash.attr("id"));
        matchFromHash.addClass("current");
        matchFromHash.click();
        var hashHREF = matchFromHash.attr("href");
        whContentUpdateFromHash(hashHREF);
    }
I deleted the “%20” in (/ /g, "%20") to no avail. I tried putting in a couple of characters, too.
Derek Read August 11, 2010 at 7:48 pm
Ah, now I think I understand.
Yes, this is generated by JavaScript: the anchor portion of the URL (the part that comes after the # sign) is based on document titles, which often have spaces in them. Because the IETF spec for URLs (referenced in the code comments) says you should not use actual spaces in URLs, the code escapes them as %20 so that the URLs are legal.
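To illustrate the escaping step in isolation, here is a minimal sketch. The function name escapeSpaces is made up for this example and is not part of webhelp.js; only String.prototype.replace and decodeURI are standard JavaScript.

```javascript
// Hypothetical helper, not taken from webhelp.js: mirrors the
// innerLink.replace(/ /g, "%20") line quoted earlier in the thread.
function escapeSpaces(title) {
  return title.replace(/ /g, "%20");
}

const title = "Welcome to Kwag Help";
const escaped = escapeSpaces(title);

console.log(escaped);                      // Welcome%20to%20Kwag%20Help
console.log(decodeURI(escaped) === title); // true
```

The round trip through decodeURI is what lets the script compare the value from the address bar against the plain TOC title.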
I'm still a little confused because I'm not sure why having the %20 in the URL is a concern. Did your website people give a reason? You say the project manager doesn't allow spaces. So, because there are no spaces in the URL, I'm not sure what the issue is. Or are they saying the %20 and space are identical (i.e. it is not that the URL is illegal, it is that they think %20 is ugly)?
The thing that might be confusing everyone here is that different browsers display %20 differently in the address bar. In some browsers a URL that contains %20 is rendered with the %20 visible (IE is one), and some browsers “help” you by rendering the %20 as a space (presumably because this is easier to read and understand). Our code is telling the browser to use %20; it is the browser that is rendering a space, not us (this is true for Firefox). If you have both browsers installed and open the same output in each you should see what I mean.
To show that this is a browser behavior you can visit any website that has %20 in the URL (or set up your own test). You will see different rendering behavior in different browsers. In Firefox (which shows %20 as a space), if you copy such a URL and paste it into a text editor you will see the %20 magically appear there, because under the covers Firefox actually knows it is there and puts the %20 on the clipboard; it is just rendering it as a space in the address bar.
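The difference between the stored URL and the displayed form can also be seen outside the address bar. This is only a sketch: the example.com address is a placeholder, and it uses the standard URL class (available in current browsers and in Node.js) rather than anything from the WebHelp scripts.

```javascript
// Placeholder address for illustration only; not a real deployment.
const url = new URL("http://example.com/webhelp_out/KwagHelp.html#Welcome to Kwag Help");

// The URL parser percent-encodes the spaces in the fragment.
const storedHash = url.hash;

// Decoding gives the "friendly" form that some browsers choose to display.
const displayedHash = decodeURI(storedHash);

console.log(storedHash);    // #Welcome%20to%20Kwag%20Help
console.log(displayedHash); // #Welcome to Kwag Help
```

Both strings describe the same anchor; only the presentation differs.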
If the website people do have an issue with what the JavaScript is doing then I think as part of their code review (assuming they are reviewing the JavaScript before they put it on the website) they could make changes to it so that it does what they want it to do.
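As a rough idea of the kind of change they could make, the sketch below swaps spaces for underscores in the anchor and maps the anchor back to a title by comparing slugs. It is not how webhelp.js works; titleToSlug, findTitleBySlug, and the sample titles are all hypothetical.

```javascript
// Hypothetical: turn a TOC title into an anchor with no spaces and no %20.
function titleToSlug(title) {
  return title.trim().replace(/\s+/g, "_");
}

// Hypothetical: find the TOC title whose slug matches the anchor.
// Comparing slugs avoids guessing which underscores were originally spaces.
function findTitleBySlug(tocTitles, slug) {
  const match = tocTitles.find(function (t) {
    return titleToSlug(t) === slug;
  });
  return match === undefined ? null : match;
}

const tocTitles = ["Welcome to Kwag Help", "Getting Started"];
console.log(titleToSlug(tocTitles[0]));                          // Welcome_to_Kwag_Help
console.log(findTitleBySlug(tocTitles, "Welcome_to_Kwag_Help")); // Welcome to Kwag Help
```

Two titles that differ only in spaces versus underscores would collide under this scheme, so a real change would need to check for that.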
The alternative might be to simply generate output using the standard DITA OT transtype 'XHTML' (which we expose in our Generate Output dialog as “Multiple HTML”) and then let the website people integrate those HTML files into the website, creating their own framework around them (if they want frames or not, TOC, searching, etc).
kwag_myers August 12, 2010 at 6:39 pm
Derek Read wrote:
I'm still a little confused because I'm not sure why having the %20 in the URL is a concern. Did your website people give a reason? You say the project manager doesn't allow spaces. So, because there are no spaces in the URL, I'm not sure what the issue is. Or are they saying the %20 and space are identical (i.e. it is not that the URL is illegal, it is that they think %20 is ugly)?
I'm not sure that I understand it either. Today, when I asked about it, “it's not a problem”. Perhaps I misunderstood. So, maybe I'll just put this issue on hold for now and see if it comes up again. If so, I'll have to defer to someone who knows JavaScript.
Derek Read wrote:
The thing that might be confusing everyone here is that different browsers display %20 differently in the address bar. In some browsers a URL that contains %20 is rendered with the %20 visible (IE is one), and some browsers “help” you by rendering the %20 as a space (presumably because this is easier to read and understand). Our code is telling the browser to use %20; it is the browser that is rendering a space, not us (this is true for Firefox). If you have both browsers installed and open the same output in each you should see what I mean.
I do, I did, and I do.
One thing that bothers me about the portion of code I posted before is that it checks for spaces before it takes the title from the TOC. At least, that's the way it appears to me. But I don't know enough about JS to even begin making revisions.
Anyway, if it does come up again as a concern, I have your post to refer to, so thanks for the info.
Derek Read August 12, 2010 at 11:34 pm
Sort of good to hear, as I really can't see this being an issue.
Regarding the code, it pulls the title from the TOC file on the fly (as in, when the pages and script are actually running in the browser). The TOC itself is generated during output generation, so the content in that file is fixed. It contains spaces if the titles in the topics contain spaces. The titles are displayed with real spaces in the TOC (because that's just HTML), but the code replaces those spaces with %20 to stay in line with RFC 1738 for the links that are created, and when you click on one of those, that's what ends up in the address bar. Hope that makes sense.
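For anyone who wants to follow that flow without jQuery or a browser, here is a simplified sketch of the same steps: escape spaces, reject unsafe characters, decode, then look the title up in the TOC. The resolveHash function and the sample TOC entries are assumptions made for illustration; the real script queries the #whTocTree links instead of an array.

```javascript
// Same unsafe-character set as the RFC 1738 comment in the quoted code.
const unsafeHash = /[<>#'{}|\\^~\[\]`]+/;

// Hypothetical stand-in for the TOC lookup; tocEntries replaces the
// jQuery query against the generated TOC.
function resolveHash(innerLink, tocEntries) {
  const escaped = innerLink.replace(/ /g, "%20");
  if (unsafeHash.test(escaped)) {
    return null; // hash contains characters the script refuses to handle
  }
  let frameTitle;
  try {
    frameTitle = decodeURI(escaped); // "%20" becomes a real space again
  } catch (e) {
    return null; // malformed percent-encoding
  }
  const match = tocEntries.find(function (entry) {
    return entry.title === frameTitle;
  });
  return match || null;
}

// Made-up TOC data for the example.
const toc = [
  { id: "t1", title: "Welcome to Kwag Help", href: "c_Overview.html" },
  { id: "t2", title: "Getting Started", href: "t_Start.html" }
];

console.log(resolveHash("Welcome%20to%20Kwag%20Help", toc).href); // c_Overview.html
console.log(resolveHash("Welcome to Kwag Help", toc).href);       // c_Overview.html
console.log(resolveHash("bad<hash", toc));                        // null
```

Whether the address bar holds spaces or %20, the lookup lands on the same TOC entry, which is why the two forms behave identically.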