Community
Participate
Working Groups
Build ID: Zend Studio for Eclipse 6.1.0 Steps To Reproduce: Create .xml file with one of the following forms: <?xml version="1.0" encoding="<?php echo MY_CHARSET; ?>"; ?> or <?php echo '<?xml version="1.0" encoding="' . MY_CHARSET . '"; ?>'; ?> In the first case, Eclipse won't save the file reporting: "Character encoding "<?php echo MY_CHARSET; ?>" is not a legal character encoding. In the second case, when the file is already saved, Eclipse reports "Content is not allowed in prolog." In either case, PHP embedded in XML files is allowed. Eclipse doesn't need to know the "coded" character encoding to save my file that only contains the following characters: [a-zA-Z<>/" \n] More information:
FYI - this was marked "Major" because Eclipse won't save XML when it can't determine the encoding format.
Verified in 3.4 M2
I can reproduce the first case but not the second (Content is not allowed in prolog) which looks like some other (not Text) editor was used. The editor is doing the right thing here: if "encoding=" is used then that encoding has to be used to save the file and if that encoding is invalid we must not save the file. Eclipse SDK only knows about XML and not PHP and according to that the behavior is correct. Now, since you use a PHP dev tool, that tool should register its own content type that is aware of PHP when reading the encoding. Please file that bug against Zend Studio for Eclipse.
FYI: as a workaround you can set your desired encoding manually on the file (via Properties). This will disable encoding auto-detection.
Okay - I can accept that. Please consider this as a feature request then - if Eclipse is not able to determine the proper encoding, offer the user a dialog box giving him/her the ability to choose the encoding with the ability to save that preference in the project settings. I will follow-up with Zend by filing a bug in their issue tracker.
(In reply to comment #5) > ... if > Eclipse is not able to determine the proper encoding, offer the user a dialog > box giving him/her the ability to choose the encoding with the ability to save > that preference in the project settings. ... I think some better specification of what's being attempted here would be helpful. I think Eclipse is not able to determine the proper encoding in these cases because the files are not well formed XML. I'd have to check the spec's to know for sure, but even if it was, the encoding is not specified in these files! I suspect you are attempting a "short cut" and specify the encoding of the file, and the encoding of an eventually http output stream at the same time. True, that often works, but not in "dynamic" cases like this where you want to specify the output charset at some future run time. I'd have to look-up or search for examples where I've done this in the past, but suspsect some use of escape characters (for '<') and CDATA sections are the correct way to specify what you want to happen at runtime. HTH
> offer the user a dialog >box giving him/her the ability to choose the encoding Simply hit Alt+Enter and you have the dialog. >I'd have to check the spec's to know >for sure, but even if it was, the encoding is not specified in these files! As far as I know this is considered a fatal error (almost all encoding related issues are defined to be fatal errors by the XML spec). I'm moving this to Platform Resource because their handling of broken encoding attribute is different for IFile.getCharset() and for IContentTypeMatcher.getDescriptionFor(Reader, String, QualifiedName[]). The former defaults to 'UTF-8' in case of error while the second simply returns the encoding attribute value "<?php echo MY_CHARSET; ?>". They should be in sync. Personally, I think defaulting back to 'UTF-8' is not a good idea: the user should be informed that he is going to open a file with corrupt encoding value.
(In reply to comment #7) > I'm moving this to Platform Resource because their handling of broken encoding > attribute is different for IFile.getCharset() and for... So we have two issues here. One is IFile#getCharset, another is the request from comment 5. (In reply to comment #7) > > offer the user a dialog > >box giving him/her the ability to choose the encoding > Simply hit Alt+Enter and you have the dialog. Right, but it doesn't work well for the following steps: 1) have .xml file open 2) paste the invalid content 3) try to save --> error prompt 4) go to properties and change the encoding to UTF-8, Apply and OK 5) try to save --> still error prompt I have to close the editor and then it works.
>I have to close the editor and then it works. No you don't. It does work as long as the editor isn't dirty. Once the editor is dirty we don't pick up the new encoding (known bug 73648). We won't add code to make it easy to override a bug/typo in the encoding attribute value. Now, if you write PHP code and the PHP plug-in doesn't support dynamic encoding attribute values you have to manually set the encoding to whatever encoding you want. And the best time to do this is before even opening that file in an editor.
So, if there are no objections, I'll change the summary to better reflect the real issue described in comment 7 and accept it as a valid bug.
(In reply to comment #9) > >I have to close the editor and then it works. > No you don't. It does work as long as the editor isn't dirty. Once the editor > is dirty we don't pick up the new encoding (known bug 73648). > > We won't add code to make it easy to override a bug/typo in the encoding > attribute value. Now, if you write PHP code and the PHP plug-in doesn't support > dynamic encoding attribute values you have to manually set the encoding to > whatever encoding you want. And the best time to do this is before even opening > that file in an editor. > Glad to hear about bug 73648. OTOH - the last thing I want to see is my editor preventing me from saving my work as my desktop's UPS is dying when I can't hibernate. I'd rather be given the option to override the "warning" and capture my work than to loose it due to Eclipse being finicky about the encoding. Eclipse is able to display the file to me, so it ought to have enough information to at least save the data in binary format. Worst case, warn the user, and give him/her a dialog box of encoding types supported for the file to be saved with a "best guess" based on the content.
(In reply to comment #5) > Okay - I can accept that. Please consider this as a feature request then - if > Eclipse is not able to determine the proper encoding, offer the user a dialog > box giving him/her the ability to choose the encoding with the ability to save > that preference in the project settings. I will follow-up with Zend by filing > a bug in their issue tracker. > FYI all - When I filed a support request with Zend, they told me it was not a bug in Zend, rather a bug in Eclipse. I told them I was not interested in getting into blame wars, rather I encouraged them to work it out with you guys. Just passing it on... KB
>Eclipse is able to display the >file to me, so it ought to have enough information to at least save the data in >binary format. That's the point: it shouldn't allow you to open it in the first place if it already has a header like in comment 0. If this bug here is fixed then you will get an editor with a button that allows you to change the encoding already when trying to open the file.
The xml from comment 0 is not valid. Should such an xml be recognized as of the xml content type or null content type?
I would expect that the encoding is "<?php echo MY_CHARSET; ?>". Note that the bug only appears with files that have a content type and hence a default encoding: for those files it wrongly defaults to that encoding. Test Case: 1. create file b 2. change encoding to something specific 3. in the settings file change it to "foo" 4. open the properties for b ==> the dialog correctly says the encoding is invalid 5. open b ==> the editor tells you the problem and you have a button to change the encoding Now, do repeat the same steps with b.xml or b.properties. Note that not all VM support all encodings and hence defaulting to some encoding can corrupt data.
>Now, do repeat the same steps with b.xml or b.properties. Actually this works too as expected. Only when detection fails it falls back to UTF-8.
(In reply to comment #15) > I would expect that the encoding is "<?php echo MY_CHARSET; ?>". Right, but the content is not a correct xml content, so why we should parse for the encoding? We could consider another option which is that the file is of null content type with the encoding inherited from its parent container. Then if we change its encoding to a proper one, it would be recognized as an xml with some encoding. I think that "not supported encoding" and "invalid xml caused by using <,> in encoding" are two different cases.
>Right, but the content is not a correct xml content There are two variants here: 1) the broken encoding value as seen in comment 0 2) an encoding value that is correct i.e. part of ISO/IEC 10646-1:2000 but is not known to the VM We can decide what to do on 1) but 2) definitely must not default to UTF-8. Having said that we must ensure that IFile.getCharset() IContentTypeMatcher.getDescriptionFor(Reader, String, QualifiedName[]) return the same value.
Created attachment 114658 [details] Proposal v01
Created attachment 114675 [details] Proposal v02
Created attachment 114683 [details] Proposal v02 + tests
Proposal v02 released to HEAD.
Created attachment 114763 [details] Some changes to Proposal v02
Additional changes released to HEAD.
Adding Nitin, as we should be sure to try this out in WTP and see if it changes behavior in any of the special cases we have JUnit's for.
Verified in I20081014-1600: looks good!