Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 249214 - Eclipse reports errors in XML with embedded PHP in prolog.
Summary: Eclipse reports errors in XML with embedded PHP in prolog.
Status: VERIFIED FIXED
Alias: None
Product: Platform
Classification: Eclipse Project
Component: Resources (show other bugs)
Version: 3.4   Edit
Hardware: PC Windows XP
: P3 major (vote)
Target Milestone: 3.5 M3   Edit
Assignee: Szymon Brandys CLA
QA Contact:
URL:
Whiteboard:
Keywords:
Depends on:
Blocks: 250392
  Show dependency tree
 
Reported: 2008-09-30 16:35 EDT by Kevin Benton CLA
Modified: 2008-10-15 04:23 EDT (History)
5 users (show)

See Also:


Attachments
Proposal v01 (2.37 KB, patch)
2008-10-09 06:52 EDT, Szymon Brandys CLA
no flags Details | Diff
Proposal v02 (3.03 KB, patch)
2008-10-09 09:35 EDT, Szymon Brandys CLA
no flags Details | Diff
Proposal v02 + tests (8.51 KB, patch)
2008-10-09 10:18 EDT, Szymon Brandys CLA
no flags Details | Diff
Some changes to Proposal v02 (7.83 KB, patch)
2008-10-10 04:12 EDT, Szymon Brandys CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description Kevin Benton CLA 2008-09-30 16:35:14 EDT
Build ID: Zend Studio for Eclipse 6.1.0

Steps To Reproduce:
Create .xml file with one of the following forms:

<?xml version="1.0" encoding="<?php echo MY_CHARSET; ?>"; ?>

or

<?php echo '<?xml version="1.0" encoding="' . MY_CHARSET . '"; ?>'; ?>

In the first case, Eclipse won't save the file reporting: "Character encoding "<?php echo MY_CHARSET; ?>" is not a legal character encoding.

In the second case, when the file is already saved, Eclipse reports "Content is not allowed in prolog."

In either case, PHP embedded in XML files is allowed.  Eclipse doesn't need to know the "coded" character encoding to save my file that only contains the following characters: [a-zA-Z<>/" \n]


More information:
Comment 1 Kevin Benton CLA 2008-09-30 16:36:28 EDT
FYI - this was marked "Major" because Eclipse won't save XML when it can't determine the encoding format.
Comment 2 Kevin McGuire CLA 2008-10-02 17:52:47 EDT
Verified in 3.4 M2
Comment 3 Dani Megert CLA 2008-10-03 03:42:00 EDT
I can reproduce the first case but not the second (Content is
not allowed in prolog) which looks like some other (not Text) editor was used.

The editor is doing the right thing here: if "encoding=" is used then that encoding has to be used to save the file and if that encoding is invalid we must not save the file. Eclipse SDK only knows about XML and not PHP and according to that the behavior is correct. Now, since you use a PHP dev tool, that tool should register its own content type that is aware of PHP when reading the encoding.

Please file that bug against Zend Studio for Eclipse.
Comment 4 Dani Megert CLA 2008-10-03 05:23:37 EDT
FYI: as a workaround you can set your desired encoding manually on the file (via Properties). This will disable encoding auto-detection.
Comment 5 Kevin Benton CLA 2008-10-03 11:12:35 EDT
Okay - I can accept that.  Please consider this as a feature request then - if Eclipse is not able to determine the proper encoding, offer the user a dialog box giving him/her the ability to choose the encoding with the ability to save that preference in the project settings.  I will follow-up with Zend by filing a bug in their issue tracker.
Comment 6 David Williams CLA 2008-10-03 11:33:13 EDT
(In reply to comment #5)
> ... if
> Eclipse is not able to determine the proper encoding, offer the user a dialog
> box giving him/her the ability to choose the encoding with the ability to save
> that preference in the project settings.  ...

I think some better specification of what's being attempted here would be helpful. 

I think Eclipse is not able to determine the proper encoding in these cases because the files are not well formed XML. I'd have to check the spec's to know for sure, but even if it was, the encoding is not specified in these files! 

I suspect you are attempting a "short cut" and specify the encoding of the file, and the encoding of an eventually http output stream at the same time. True, that often works, but not in "dynamic" cases like this where you want to specify the output charset at some future run time. 

I'd have to look-up or search for examples where I've done this in the past, but suspsect some use of escape characters (for '<') and CDATA sections are the correct way to specify what you want to happen at runtime. 

HTH





Comment 7 Dani Megert CLA 2008-10-03 11:47:00 EDT
> offer the user a dialog
>box giving him/her the ability to choose the encoding
Simply hit Alt+Enter and you have the dialog.

>I'd have to check the spec's to know
>for sure, but even if it was, the encoding is not specified in these files! 
As far as I know this is considered a fatal error (almost all encoding related issues are defined to be fatal errors by the XML spec).

I'm moving this to Platform Resource because their handling of broken encoding attribute is different for IFile.getCharset() and for IContentTypeMatcher.getDescriptionFor(Reader, String, QualifiedName[]). The former defaults to 'UTF-8' in case of error while the second simply returns the encoding attribute value "<?php echo MY_CHARSET; ?>". They should be in sync. Personally, I think defaulting back to 'UTF-8' is not a good idea: the user should be informed that he is going to open a file with corrupt encoding value.
Comment 8 Szymon Brandys CLA 2008-10-06 08:17:27 EDT
(In reply to comment #7)
> I'm moving this to Platform Resource because their handling of broken encoding
> attribute is different for IFile.getCharset() and for...

So we have two issues here. One is IFile#getCharset, another is the request from comment 5. 

(In reply to comment #7)
> > offer the user a dialog
> >box giving him/her the ability to choose the encoding
> Simply hit Alt+Enter and you have the dialog.

Right, but it doesn't work well for the following steps:
1) have .xml file open
2) paste the invalid content 
3) try to save --> error prompt
4) go to properties and change the encoding to UTF-8, Apply and OK
5) try to save --> still error prompt

I have to close the editor and then it works.
Comment 9 Dani Megert CLA 2008-10-06 09:08:37 EDT
>I have to close the editor and then it works.
No you don't. It does work as long as the editor isn't dirty. Once the editor is dirty we don't pick up the new encoding (known bug 73648).

We won't add code to make it easy to override a bug/typo in the encoding attribute value. Now, if you write PHP code and the PHP plug-in doesn't support dynamic encoding attribute values you have to manually set the encoding to whatever encoding you want. And the best time to do this is before even opening that file in an editor.
Comment 10 Szymon Brandys CLA 2008-10-06 09:37:05 EDT
So, if there are no objections, I'll change the summary to better reflect the real issue described in comment 7 and accept it as a valid bug.
Comment 11 Kevin Benton CLA 2008-10-06 13:31:55 EDT
(In reply to comment #9)
> >I have to close the editor and then it works.
> No you don't. It does work as long as the editor isn't dirty. Once the editor
> is dirty we don't pick up the new encoding (known bug 73648).
> 
> We won't add code to make it easy to override a bug/typo in the encoding
> attribute value. Now, if you write PHP code and the PHP plug-in doesn't support
> dynamic encoding attribute values you have to manually set the encoding to
> whatever encoding you want. And the best time to do this is before even opening
> that file in an editor.
> 

Glad to hear about bug 73648.

OTOH - the last thing I want to see is my editor preventing me from saving my work as my desktop's UPS is dying when I can't hibernate.  I'd rather be given the option to override the "warning" and capture my work than to loose it due to Eclipse being finicky about the encoding.  Eclipse is able to display the file to me, so it ought to have enough information to at least save the data in binary format.  Worst case, warn the user, and give him/her a dialog box of encoding types supported for the file to be saved with a "best guess" based on the content.
Comment 12 Kevin Benton CLA 2008-10-06 17:54:37 EDT
(In reply to comment #5)
> Okay - I can accept that.  Please consider this as a feature request then - if
> Eclipse is not able to determine the proper encoding, offer the user a dialog
> box giving him/her the ability to choose the encoding with the ability to save
> that preference in the project settings.  I will follow-up with Zend by filing
> a bug in their issue tracker.
> 

FYI all - When I filed a support request with Zend, they told me it was not a bug in Zend, rather a bug in Eclipse.  I told them I was not interested in getting into blame wars, rather I encouraged them to work it out with you guys.  Just passing it on...

KB
Comment 13 Dani Megert CLA 2008-10-07 03:18:56 EDT
>Eclipse is able to display the
>file to me, so it ought to have enough information to at least save the data in
>binary format.
That's the point: it shouldn't allow you to open it in the first place if it already has a header like in comment 0. If this bug here is fixed then you will get an editor with a button that allows you to change the encoding already when trying to open the file.
Comment 14 Szymon Brandys CLA 2008-10-07 08:31:14 EDT
The xml from comment 0 is not valid. Should such an xml be recognized as of the xml content type or null content type?
Comment 15 Dani Megert CLA 2008-10-07 08:44:16 EDT
I would expect that the encoding is "<?php echo MY_CHARSET; ?>".

Note that the bug only appears with files that have a content type and hence a default encoding: for those files it wrongly defaults to that encoding.

Test Case:
1. create file b
2. change encoding to something specific
3. in the settings file change it to "foo"
4. open the properties for b
   ==> the dialog correctly says the encoding is invalid
5. open b
   ==> the editor tells you the problem and you have a button to change the
       encoding

Now, do repeat the same steps with b.xml or b.properties.


Note that not all VM support all encodings and hence defaulting to some encoding can corrupt data.
Comment 16 Dani Megert CLA 2008-10-07 08:57:33 EDT
>Now, do repeat the same steps with b.xml or b.properties.
Actually this works too as expected. Only when detection fails it falls back to UTF-8.
Comment 17 Szymon Brandys CLA 2008-10-07 09:28:24 EDT
(In reply to comment #15)
> I would expect that the encoding is "<?php echo MY_CHARSET; ?>".

Right, but the content is not a correct xml content, so why we should parse for the encoding? We could consider another option which is that the file is of null content type with the encoding inherited from its parent container.

Then if we change its encoding to a proper one, it would be recognized as an xml with some encoding.

I think that "not supported encoding" and "invalid xml caused by using <,> in encoding" are two different cases.
Comment 18 Dani Megert CLA 2008-10-07 11:15:18 EDT
>Right, but the content is not a correct xml content
There are two variants here:
1) the broken encoding value as seen in comment 0
2) an encoding value that is correct i.e. part of ISO/IEC 10646-1:2000 but is not
   known to the VM
We can decide what to do on 1) but 2) definitely must not default to UTF-8.

Having said that we must ensure that IFile.getCharset() IContentTypeMatcher.getDescriptionFor(Reader, String, QualifiedName[]) return the same value.
Comment 19 Szymon Brandys CLA 2008-10-09 06:52:22 EDT
Created attachment 114658 [details]
Proposal v01
Comment 20 Szymon Brandys CLA 2008-10-09 09:35:27 EDT
Created attachment 114675 [details]
Proposal v02
Comment 21 Szymon Brandys CLA 2008-10-09 10:18:41 EDT
Created attachment 114683 [details]
Proposal v02 + tests
Comment 22 Szymon Brandys CLA 2008-10-09 10:48:27 EDT
Proposal v02 released to HEAD.
Comment 23 Szymon Brandys CLA 2008-10-10 04:12:06 EDT
Created attachment 114763 [details]
Some changes to Proposal v02
Comment 24 Szymon Brandys CLA 2008-10-10 04:15:48 EDT
Additional changes released to HEAD.
Comment 25 David Williams CLA 2008-10-10 04:25:25 EDT
Adding Nitin, as we should be sure to try this out in WTP and see if it changes behavior in any of the special cases we have JUnit's for. 



Comment 26 Dani Megert CLA 2008-10-15 04:23:18 EDT
Verified in I20081014-1600: looks good!