Topic: spaces in file names
JustSystems Partner

Posts: 80

« on: January 22, 2015, 07:18:28 PM »

I've been investigating some problems with using OT version 2.0.

I created the following map:

<?xml version="1.0"?>
<!DOCTYPE map PUBLIC "-//OASIS//DTD DITA Map//EN" "map.dtd">
<!-- Created with XMetaL ( -->
<map id="map_499546E9D3CE4C7DA963BA51695EBAD5">
<topicref format="dita" href="first_concept.xml" navtitle="First concept" scope="local"/>
<topicref format="dita" href="second concept.xml" navtitle="Second concept" scope="local"/>
</map>

If I try to process it with OT 2.0 I get a URISyntaxException because of the space in the second href attribute. I don't have any problems with OT 1.8.

Regardless of whether this is a regression in the OT, since the value of href is a URI, shouldn't XMetaL set it to "second%20concept.xml" when the topic is inserted?
Derek Read
Program Manager (XMetaL)

Posts: 2570

« Reply #1 on: January 23, 2015, 05:51:58 PM »

The DITA specification states this:

Quote from: The href attribute (DITA Language Reference for DITA 1.2)
The value of a DITA href attribute must be a valid URI reference [RFC 3986]. It is an error if the value is not a valid URI reference. An implementation may (but need not) give an error message, and may (but need not) recover from this error condition by attempting to convert the value to a valid URI reference. Note that the path separator character in a URI is always the forward slash (“/”); the backward slash character (“\”) is not permitted unescaped within URIs.

XMetaL Author Enterprise (up to version 9, which is the current release) does not check that values follow that rule, nor does it automatically replace any characters (including spaces) or try to fix up the URI. When working with the local file system, the @href values set through dialogs typically come from a file browsing dialog (exceptions include direct entry into the Attribute Inspector or Plain Text view, or values being pasted in), and as such are file paths returned by Windows (which of course allows values that are not valid URIs). In the case of a CMS integration, the value is set by the integration itself and could be any string the CMS specifies. In neither case is any checking or fixing performed.
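To illustrate what such a fix-up could look like (this is only a sketch, not something XMetaL does), here is a small Python example that turns a relative file path into a percent-encoded URI reference. The function name path_to_href is hypothetical, and the sketch assumes a plain relative path with no drive letter, no fragment identifier, and no characters that are already percent-encoded.

```python
from urllib.parse import quote


def path_to_href(path: str) -> str:
    """Turn a relative file path into a percent-encoded URI reference.

    Hypothetical helper for illustration; assumes the input is a plain
    relative path that has not been percent-encoded already.
    """
    # URIs always use forward slashes, so normalize Windows separators first.
    forward = path.replace("\\", "/")
    # Leave "/" untouched; spaces and other unsafe characters become %XX.
    return quote(forward, safe="/")


print(path_to_href("second concept.xml"))           # second%20concept.xml
print(path_to_href("topics\\second concept.xml"))   # topics/second%20concept.xml
```

Running this on a value that already contains "%20" would encode the percent sign a second time, so a real implementation would need to detect already-encoded input.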

We might consider doing some of that in a future release, but such a feature would need to work with third-party software (in particular, taking into account the fact that several of the CMSs we integrate with have their own specific requirements for this value, some of which do not follow the DITA specification and likely cannot be made to do so). That would complicate things, and it means we would likely need to make it a user-configurable setting.

In the version of the DITA Open Toolkit that ships with XMetaL Author Enterprise 9.0 (DITA OT 1.8), if you include a space in an @href value, the DITA OT will terminate processing with this error appearing in the log file:

[DOTJ054E][ERROR] Unable to parse invalid href attribute value "this file name has spaces.xml", using invalid value.
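One way to catch such values before publishing is to scan the map yourself. The following is a minimal sketch in Python using the standard library XML parser. The function name find_suspect_hrefs is hypothetical, and it only flags spaces and backslashes rather than validating the value fully against RFC 3986.

```python
import xml.etree.ElementTree as ET


def find_suspect_hrefs(map_xml: str) -> list:
    """Return href values that contain a space or a backslash.

    Hypothetical pre-publish check; it is not a full URI validator.
    """
    root = ET.fromstring(map_xml)
    suspects = []
    # iter() walks the root element and all of its descendants.
    for element in root.iter():
        href = element.get("href")
        if href is not None and (" " in href or "\\" in href):
            suspects.append(href)
    return suspects


sample = """<map>
  <topicref href="first_concept.xml"/>
  <topicref href="second concept.xml"/>
</map>"""

print(find_suspect_hrefs(sample))  # ['second concept.xml']
```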

Older versions of the DITA OT had trouble processing files (in general) if their names contained spaces or punctuation. For this reason we added a "Troubleshooting publishing issues" section to the help file that discusses various things to look out for.

I don't want to get too far off topic, but all versions of the DITA OT (going back to the first release) are known to work well if file names are limited to the characters [a-z][A-Z][0-9], with a single full stop separating the file name from the extension. For our Japanese release we had a number of clients (understandably) wanting to use Japanese characters in file names. The DITA OT cannot handle this at all, so we implemented a special sandboxing feature for that release that copies every file, replaces the file name with one using only alphanumeric characters, alters every link so that files are still found, and then passes the result along to the DITA OT. It works well, with the one major drawback being that nicely naming your files for HTML output (which many people do) cannot be done for Japanese. The same would be true for any other language that does not use these characters, and in those cases the same feature needs to be enabled. The setting is cmd_fs_sandboxing = yes.
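For anyone who wants to check their own file names against that rule, or to see roughly what the renaming step involves, here is a rough Python sketch. It is not the XMetaL implementation. The names is_safe_name and plan_renames and the generated "topicNNNN" naming scheme are all made up for illustration, and the sketch only builds the rename mapping (it does not copy files or rewrite links).

```python
import re

# ASCII letters and digits, one full stop, then an ASCII alphanumeric extension.
SAFE_NAME = re.compile(r"[A-Za-z0-9]+\.[A-Za-z0-9]+")


def is_safe_name(filename: str) -> bool:
    """True if the name uses only [a-z][A-Z][0-9] plus a single full stop."""
    return SAFE_NAME.fullmatch(filename) is not None


def plan_renames(filenames):
    """Map every file name to a generated alphanumeric-only name.

    Hypothetical scheme: topic0001.xml, topic0002.xml, and so on,
    keeping whatever follows the last full stop as the extension.
    """
    renames = {}
    for index, name in enumerate(filenames, start=1):
        _stem, dot, extension = name.rpartition(".")
        suffix = "." + extension if dot else ""
        renames[name] = f"topic{index:04d}{suffix}"
    return renames


print(is_safe_name("concept1.xml"))        # True
print(is_safe_name("second concept.xml"))  # False
print(plan_renames(["概念.xml", "second concept.xml"]))
```

Every link in the copied files would then have to be rewritten using the same mapping so that the references still resolve.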
« Last Edit: January 23, 2015, 05:53:50 PM by Derek Read »