General XMetaL Discussion
edporterIII January 12, 2016 at 3:20 pm
Reply to: Makro Search Entity and Replace it
It's worth clarifying that I'm not searching for entities. I'm trying to find spaces surrounding certain tags, something like ” ” or “ “. Just using .find in plain-text view almost works, except when the space is separated from the tag by a line break. I wanted to use a regular expression to capture any \s.
Derek Read January 12, 2016 at 7:17 pm
I'm not sure why Selection.Find.Execute() wouldn't work here. I'd need more information to say why.
If it isn't working, it could be that there is an issue with the version you are running. That API has had a disproportionate number of defects over the years, flip-flopping back and forth between fully working and mostly working, primarily because of all the other APIs and document display functionality that interact with it.
If you feel that a JScript regular expression would be best, I think you would need to go to every element (unless you can limit that somehow), extract the text node portion of the content into a variable, run the regular expression on that, and, if a match is found, remove that portion of the text from the variable and put the text back into the document. That will be ugly. Alternatively, if the spaces are always at a specific location, you might be able to use a Range to get to them (after finding them with a JScript RegExp) by positioning the Range at the start or end of the element and counting a specific number of characters right or left. That will also be pretty ugly.
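To make the Range-counting idea a little more concrete, here is a minimal sketch of only the string part: given the text already pulled out of one element, it reports how many white space characters sit at its start and end, which is the count a Range would have to be moved by. Extracting the text and moving the Range are not shown, and `edgeWhitespace` is a hypothetical helper name. It is written in ES3-style JavaScript on the assumption that it could then also be used from XMetaL JScript.

```javascript
// Sketch: count white space at the start and end of an element's text, i.e.
// how many characters a Range would need to move from the element's start or
// end to get past it. String logic only; extracting the text is not shown.
function edgeWhitespace(text) {
  // \s also matches CR, LF, tab and no-break space
  var lead = /^\s+/.exec(text);
  var trail = /\s+$/.exec(text);
  return {
    leading: lead ? lead[0].length : 0,
    trailing: trail ? trail[0].length : 0
  };
}

// edgeWhitespace("Centered text \r\n") returns { leading: 0, trailing: 3 }
```

Note that a space followed by CR/LF counts as three characters here, which is exactly the case a plain-text Find search struggles with.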
It might be better to concentrate on these things:
1) What problem is this causing? If it is causing an issue for another piece of software, perhaps there is a way to adjust its behaviour. I would make this the priority if possible. There are many ways to handle spaces in XSLT, for example (if this is a transformation issue).
2) How are the spaces getting into the documents? If it is through the XMetaL pretty printing feature, then there are many ways to adjust that on an element-by-element basis, or you can turn it off entirely. If something else is putting them in, perhaps that can be adjusted as well.
edporterIII January 15, 2016 at 3:01 pm
Perhaps Selection.Find.Execute() would work in this instance, but I haven't been able to make it do so. There isn't an example of using this method to search with pattern matching in the documentation. I've tried:
Selection.Find.Execute("\s+
Derek Read January 15, 2016 at 7:28 pm
The pattern matching is the same as that described in the XMetaL Author help under “pattern matching”.
However, the API will also function differently in Plain Text view vs Normal and Tags On view.
Have you been able to use the Find and Replace dialog to get a document into the state that you need it to be in? If not, then this API will not help as the API actually uses that dialog (and associated logic).
Do you have a requirement for which view this script should run in? Given what you are trying, it looks like you want to run it in Plain Text view?
Also, let's figure out whether this is something you are designing into a customization that you will give to users to run on a regular basis. If that is not the case, then I would suggest it might be easier to batch process your existing content (if that is the issue), assuming that once it is fixed up you would no longer need this.
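If batch processing turns out to be the route, a cleanup pass could be as small as the following sketch, which trims white space (including CR/LF) just inside the start and end tags of one element type in raw XML text. It assumes the element name passed in (for example a hypothetical `entry`) never appears inside comments or CDATA sections; `trimInsideElement` and `escapeRegExp` are illustrative helpers, not XMetaL APIs. A regular expression is not an XML parser, so treat this as a starting point to test against real files.

```javascript
// Sketch of a batch cleanup over raw XML text (not a real XML parser).
// ASSUMPTIONS: the element name passed in (e.g. a hypothetical "entry") does
// not appear inside comments or CDATA, and empty-element tags like <entry/>
// are left alone.
function escapeRegExp(s) {
  // escape characters that are special inside a RegExp
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function trimInsideElement(xml, elementName) {
  var name = escapeRegExp(elementName);
  // start tag (with or without attributes, not self-closing) + following white space
  var afterOpen = new RegExp("(<" + name + "(?:\\s[^>]*[^/>])?>)\\s+", "g");
  // white space (including CR/LF) directly before the end tag
  var beforeClose = new RegExp("\\s+(</" + name + ">)", "g");
  return xml.replace(afterOpen, "$1").replace(beforeClose, "$1");
}

// trimInsideElement('<entry>\n  42 \r\n</entry>', "entry") returns '<entry>42</entry>'
```

Only the white space touching the element's own tags is removed; white space between words inside the element is left as it is.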
I probably wouldn't have so many questions if I could see your current customization. If you want to submit a support case through XMetaL Support and include your customization plus a sample XML file, that would likely make things go a lot faster. Concentrating on using the wrong tool for the job might be wasting your time. As I said before, it might actually make more sense to avoid having these spaces get into the document in the first place. Or it might make even more sense to not worry about them at all and look into why they are an issue. It would be nice to know all these things so that I can help you resolve this issue in the best way.
edporterIII January 18, 2016 at 1:56 pm
No, you're right: the Find dialog will not find all of the spaces I am after. This is in Plain Text view. The issue is spaces like the one indicated in the attached image.
For us, the problem with these spaces is that they end up being composed in DL Pager when we compose for print. Thus, when there are centered cells in a table, lines that end with a space will not be perfectly centered.
Ideally, we would search for this in Tags On view, but the users are used to a macro switching to Plain Text view and returning them to the previous view after it is complete. This is something we would have a user run once before file submission.
You make a good point, though, about accounting for them outside of XMetaL. I might be able to clean them up in preprocessing before we compose the PDFs. If you don't have any great ideas for catching these in XMetaL, I can pursue that route.
Derek Read January 18, 2016 at 8:45 pm
So, if you copy the character from the document (from Plain Text view) and paste it into the Find dialog, it is not found when you do a find in Plain Text view? I would be surprised if that is the case, but it is perhaps possible for some characters I'm not aware of.
The following Unicode space characters might all appear similar (depending on the font being used to display them) but would all need to be searched for independently:
U+0020 space (what you get when you press the spacebar)
U+00A0 no-break space
U+2002 en space
U+2003 em space
U+2004 three-per-em space
U+2005 four-per-em space
U+2006 six-per-em space
U+2007 figure space
U+2008 punctuation space
U+2009 thin space
U+200A hair space
U+200B zero width space
U+3000 ideographic space
There might be other characters that appear as a space depending on how the font creator chose to draw them, but those are all the Unicode characters I am aware of that appear as a “blank” area under normal circumstances and that someone might use to separate other characters (usually separating words).
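Because each of these characters has to be searched for separately, a small script can help identify which one is actually in a document. The sketch below is plain JavaScript operating on a string (`findSpaceChars` and `SPACE_NAMES` are illustrative names, not part of XMetaL); it scans for the code points in the list above and reports their positions. Note that JavaScript's `\s` class does not include U+200B, so a `\s`-based pattern alone would miss a zero width space.

```javascript
// Sketch: list every space-like character from the table above found in a
// string, with its index, code point and name. Each one is checked on its own.
var SPACE_NAMES = {
  "\u0020": "space",
  "\u00A0": "no-break space",
  "\u2002": "en space",
  "\u2003": "em space",
  "\u2004": "three-per-em space",
  "\u2005": "four-per-em space",
  "\u2006": "six-per-em space",
  "\u2007": "figure space",
  "\u2008": "punctuation space",
  "\u2009": "thin space",
  "\u200A": "hair space",
  "\u200B": "zero width space",
  "\u3000": "ideographic space"
};

function findSpaceChars(text) {
  var found = [];
  for (var i = 0; i < text.length; i++) {
    var ch = text.charAt(i);
    if (SPACE_NAMES.hasOwnProperty(ch)) {
      // format the code point as U+XXXX (all entries above fit in 4 hex digits)
      var hex = ch.charCodeAt(0).toString(16).toUpperCase();
      found.push({
        index: i,
        code: "U+" + ("0000" + hex).slice(-4),
        name: SPACE_NAMES[ch]
      });
    }
  }
  return found;
}

// findSpaceChars("a\u00A0b c") reports U+00A0 at index 1 and U+0020 at index 3
```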
In addition to those listed above, the carriage return and line feed characters might appear as a space depending on where they fall in an XML file and how the display is configured (in XMetaL that's done using CSS) and how the XML rules for collapsing white space are applied for a given element in a given document type. These characters can be inserted automatically by the pretty printing feature.
edporterIII January 19, 2016 at 1:35 pm
No, the issue is specific to the instance I included in the screenshot, namely, the space is separated from the
by a carriage return. Searches using Find.Execute() are not picking up space[carriage return]. There doesn't appear to be a pattern matching variable in XMetaL's implementation, like \s in regular expressions, that will include carriage returns/line breaks as a space.
Derek Read January 20, 2016 at 1:00 am
Right. No, you can't find carriage returns using the Find feature. That's a control character you just can't match on, even in Plain Text view. Matching it shouldn't be necessary under normal circumstances, so it is unimplemented.
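For comparison, outside XMetaL's Find feature a JavaScript regular expression has no trouble with this: `\s` matches CR and LF as well as the ordinary space, so a space followed by a line break is a single run. The sketch below works on raw markup text (as it might look in Plain Text view); `entry` is a hypothetical element name, and `countWhitespaceBeforeClose` is an illustrative helper, not an XMetaL API.

```javascript
// Sketch: compare a literal search for " </entry>" with a \s+ pattern when the
// space is separated from the end tag by a line break.
// "entry" is a hypothetical element name; this is plain string logic, not XMetaL's Find.
function escapeRegExp(s) {
  // escape characters that are special inside a RegExp
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

function countWhitespaceBeforeClose(markup, tagName) {
  // \s matches space, tab, CR and LF, so "space + CR/LF" counts as one run
  var re = new RegExp("\\s+</" + escapeRegExp(tagName) + ">", "g");
  var matches = markup.match(re);
  return matches ? matches.length : 0;
}

var sample = "<entry>Total \r\n</entry>";
var literalFound = sample.indexOf(" </entry>") !== -1; // false: CR/LF sits in between
var runsFound = countWhitespaceBeforeClose(sample, "entry"); // 1
```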
The sequence you indicate (carriage return + space) should be handled by any XML-aware processor without issue. The normal behaviour for multiple white space characters is to treat them as a single space (there are specific circumstances where the white space in an element should be treated as significant, but that has to be specifically indicated to the XML processor). It sounds like whatever you are using has problems dealing with this very common sequence.
If you cannot get that software to deal with the sequence then changing the XML would seem to be the only option. If you already use XSLT to modify or transform documents then I would see if you can take advantage of the XSLT normalize-space() function. That could possibly fix all of this up magically with almost no effort. I would *highly* recommend this over the ugly JScript hack example I list below.
[code=example /hack/ to remove duplicate white spaces]
//XMetaL Script Language JScript:
ActiveDocument.FormattingUpdating = false;
var doctxt = Selection.TextWithRM;
//modify the following regex as needed if it is matching too much
//I don't think you should try to replace the match with nothing (deleting it)
//as I think that definitely has the potential to break markup
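doctxt = doctxt.replace(/\s+/g, " ");
//NOTE: doctxt still has to be written back over the selection for this to
//change anything; that step is not shown here, only the failure message
Application.Alert("Unable to perform operation on this document.");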
ActiveDocument.FormattingUpdating = true;[/code]
Test that *a lot* on your documents first (and then probably don't use it anyway).
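For reference, the XPath/XSLT normalize-space() function recommended above strips leading and trailing white space and collapses each internal run to a single space. A rough JavaScript equivalent is sketched below (`normalizeSpace` is an illustrative name). Unlike a `\s+` pattern, it only treats the four XML white space characters (space, tab, CR, LF) as white space, so a no-break space (U+00A0) is left alone.

```javascript
// Sketch of XPath normalize-space(): collapse runs of XML white space
// (space, tab, CR, LF only) to one space, then drop a leading/trailing space.
// U+00A0 and other Unicode spaces are deliberately left untouched.
function normalizeSpace(s) {
  return s.replace(/[ \t\r\n]+/g, " ").replace(/^ | $/g, "");
}

// normalizeSpace(" Total \r\n ") returns "Total"
// normalizeSpace("a \r\n  b") returns "a b"
```

Restricting the character class to those four characters mirrors the XPath definition of white space; using `\s` instead would also collapse no-break spaces, which an author may have inserted on purpose.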
Note that if pretty printing is enabled for this document type, then switching to Plain Text view or saving the document will trigger that feature, possibly reintroducing characters you do not want to be in the document. If the unwanted white space characters are getting into the document through pretty printing, you'll need to modify the pretty printing settings in the CTM file. There is a global setting, and you can control it on a per-element basis as well.
I'm beginning to think that pretty printing is the most likely cause and that you may want to disable it outright. Keep in mind that doing so will not remove carriage returns from existing files, nor will it stop an author from inserting them in Plain Text view, nor stop them from being typed into elements where white space is treated as significant, nor stop someone from pasting them into a document.