Reply to: Sharepoint file size limitation
November 30, 2009 at 10:08 am
If you can find the exact link at MSDN that would be very helpful. If this limitation is either due to SOAP or the SharePoint server itself then it would very likely affect the SharePoint connector.
However, I do not think a 50MB limit should be a concern, because very few people (if anyone) would be working with documents of this size. If I am incorrect, please provide some examples of large documents you might work with that would approach this size (perhaps they are very data-centric). I believe that on all but the most robust of computers the real issue in working with documents this size would not be storing them, but finding any XML editor of a similar caliber to XMetaL (one that is designed for creating documents and not manipulating data) that could open them and allow you to edit them in a reasonable manner.
In real-world usage I don't believe many people would seriously contemplate creating a DITA topic that is more than a few thousand KB in size. As part of my troubleshooting duties with the XMetaL Support Team I am regularly sent DITA documents to troubleshoot, and I have yet to see (or hear of) a DITA topic over 2MB (and that size is very rare). That isn't to say that there are none, as the DITA specification places no restrictions on file size. However, the DITA philosophy as I understand it is to keep each topic to one idea that is easily covered within a single printed page, or maybe two at the outside. It is also possible to place many topics within a single DITA file, but that goes against what I understand to be the “good” DITA practice of storing each topic in a separate file so that topics can be mixed, matched and combined using maps.
Of course, this does not mean that nobody puts all of their topics in one file, and there may be a good reason to do so that I am not aware of. Still, I think you would likely have to be producing a large encyclopedia or the entire Oxford English Dictionary to reach 50MB, and even then, if you are working with DITA, the philosophy of keeping topics in separate files may apply even more strongly, for additional reasons.
Even if 50MB is the limit for SOAP and/or SharePoint, I also have to tell you honestly that I have never seen or heard of an XML document larger than 5MB being edited with XMetaL. There are definitely some very large documents out there (we have some clients that include huge lists of tabular data within their documents), but even these are far smaller than 50MB.
Consider the following 16 “word” paragraph. It is 100 characters long including the markup. This is a fairly average length for a sentence (seems fairly representative to me anyway, but also a convenient size for the math).
<p>12345 12345 12345 12345 12345 12345 12345 12345 12345 12345 12345 12345 12345 12345 12345 123</p>
Let's assume the worst case and that the “words” above (12345) are all in a language that requires the standard XML encoding of UTF-8 to use two bytes per character (which is not true for English). We need almost 200 bytes to store that string in a file.
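To double-check that figure, here is a minimal Python sketch. It assumes the sample paragraph is wrapped in <p> tags (which is what brings 93 characters of text up to 100), and it uses the Cyrillic letter "д" as a purely hypothetical stand-in for any character that takes two bytes in UTF-8. Markup and spaces stay at one byte each, which is why the real number lands a little under the flat 200-byte estimate.

```python
# Sample paragraph: 15 five-character "words" plus one three-character "word".
# The <p> wrapper is an assumption made so the total comes to 100 characters.
paragraph = "<p>" + " ".join(["12345"] * 15 + ["123"]) + "</p>"

char_count = len(paragraph)                     # characters, markup included
ascii_bytes = len(paragraph.encode("utf-8"))    # digits are one byte each in UTF-8

# Worst case from the post: every "letter" needs two bytes in UTF-8.
# "д" is just an illustrative two-byte character, not anything from the post.
two_byte_text = "".join("д" if ch.isdigit() else ch for ch in paragraph)
two_byte_bytes = len(two_byte_text.encode("utf-8"))

# Flat upper bound used in the post: two bytes for every character.
upper_bound = 2 * char_count

print(char_count, ascii_bytes, two_byte_bytes, upper_bound)
```

Rounding up to 200 bytes per paragraph keeps the later math simple and slightly understates how much text would fit in the file.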
Next, for simplicity (and rounding down to be conservative), let's just say that
50MB =
50 * 1MB =
50 * 1,000,000 Bytes =
50,000,000 Bytes
Now consider a single DITA topic file consisting only of similar paragraphs, ignoring a few hundred or so bytes for the XML declaration, DOCTYPE declaration and some additional markup (topic, body, attributes, etc). Such a file would need to contain almost 250,000 such 16 word paragraphs in order to reach the 50MB file size (50,000,000 / 200). That's about 4 million words (250,000 * 16).
People seem to like to compare these kinds of things to the King James Bible for some reason (I guess lots of people don't get past all the “begats” at the beginning so it probably seems pretty daunting, and the pages are usually really thin and dense). According to [url=http://wiki.answers.com/Q/How_many_words_in_the_King_James_Bible]a page at WikiAnswers[/url], that book contains fewer than 824,000 words.
Translated to printed output, and assuming the commonly used average of about 300 words per page, our 50MB XML file with the 16-word paragraphs would translate to about 13,333 pages (4,000,000 / 300).
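For anyone who wants to plug in their own numbers, the whole estimate can be sketched in a few lines of Python. The constants are the rough assumptions used in this post (200 bytes per paragraph, 16 words per paragraph, 300 words per page, and the WikiAnswers upper figure of 824,000 words); the variable names are mine and nothing here is measured from real documents.

```python
# Back-of-the-envelope estimate using the assumptions from this post.
FILE_SIZE_BYTES = 50 * 1_000_000   # 50MB, rounded down to decimal megabytes
BYTES_PER_PARAGRAPH = 200          # 100 characters at two bytes per character
WORDS_PER_PARAGRAPH = 16
WORDS_PER_PAGE = 300               # rough printed-page average (assumption)
KJV_WORDS = 824_000                # upper figure quoted from WikiAnswers

paragraphs = FILE_SIZE_BYTES // BYTES_PER_PARAGRAPH
words = paragraphs * WORDS_PER_PARAGRAPH
pages = words // WORDS_PER_PAGE
bibles = words / KJV_WORDS         # how many times the quoted word count fits

print(f"{paragraphs:,} paragraphs, {words:,} words, {pages:,} pages")
print(f"roughly {bibles:.1f} times the quoted King James Bible word count")
```

Changing BYTES_PER_PARAGRAPH to 100 (plain English text in UTF-8) doubles every result, so the figures above are on the low side for English content.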
Granted, you might stick in more markup (thereby slightly reducing the number of words of actual textual / human-meaningful content), but even then, we're still talking about an amazingly large single DITA topic or even any other document type.