Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.
Bug 208073 - [web connector] html entities should be decoded
Summary: [web connector] html entities should be decoded
Status: RESOLVED FIXED
Alias: None
Product: z_Archived
Classification: Eclipse Foundation
Component: Mylyn (show other bugs)
Version: dev   Edit
Hardware: PC Linux
: P2 normal (vote)
Target Milestone: 2.3   Edit
Assignee: George Lindholm CLA
QA Contact:
URL:
Whiteboard:
Keywords:
: 204154 (view as bug list)
Depends on:
Blocks:
 
Reported: 2007-10-30 14:19 EDT by George Lindholm CLA
Modified: 2008-01-02 22:13 EST (History)
4 users (show)

See Also:


Attachments
unescapeHtml() for HTML entities (5.22 KB, patch)
2007-10-30 14:27 EDT, George Lindholm CLA
no flags Details | Diff
mylyn/context/zip (1.05 KB, application/octet-stream)
2007-10-30 14:27 EDT, George Lindholm CLA
no flags Details
commons-lang-2.3 (239.53 KB, application/octet-stream)
2007-10-30 14:28 EDT, George Lindholm CLA
no flags Details
JUnit test (5.71 KB, patch)
2007-10-30 16:31 EDT, George Lindholm CLA
no flags Details | Diff
mylyn/context/zip (4.16 KB, application/octet-stream)
2007-10-30 16:32 EDT, George Lindholm CLA
no flags Details
Unified patch (9.75 KB, patch)
2007-10-31 15:02 EDT, George Lindholm CLA
no flags Details | Diff
mylyn/context/zip (10.85 KB, application/octet-stream)
2007-10-31 15:02 EDT, George Lindholm CLA
no flags Details
Detailed error description and examples (1.12 KB, text/plain)
2007-11-06 05:13 EST, Henrik LEion CLA
no flags Details
Modified patch (8.99 KB, patch)
2007-11-06 16:50 EST, George Lindholm CLA
no flags Details | Diff
updated patch to consume commons-lang from orbit (9.21 KB, text/plain)
2007-12-17 19:03 EST, Steffen Pingel CLA
no flags Details
Updated patch to not consume from Orbit (10.78 KB, patch)
2008-01-02 17:23 EST, Steffen Pingel CLA
no flags Details | Diff

Note You need to log in before you can comment on or make changes to this bug.
Description George Lindholm CLA 2007-10-30 14:19:08 EDT
Task descriptions with special characters in them should be decode from the HTML entity name to the
character actually used.

   E.g. A task description like:
   
        "Redirect" error
        
    comes across as:
    
        ""Redirect" error"
Comment 1 George Lindholm CLA 2007-10-30 14:27:39 EDT
Created attachment 81624 [details]
unescapeHtml() for HTML entities

Use the commons-lang StringEscapeUtils.unescapeHtml() method
to decode HTML entities.

I'm not familiar with the PDE environment so I don't think I've attached
the .jar file to the plugin build process properly. I managed to get it
hooked up enough that I can debug the code, but I don't think I've
changed all the right files so that the I can create a fully complete plugin
environment.
Comment 2 George Lindholm CLA 2007-10-30 14:27:42 EDT
Created attachment 81625 [details]
mylyn/context/zip
Comment 3 George Lindholm CLA 2007-10-30 14:28:54 EDT
Created attachment 81626 [details]
commons-lang-2.3

Probably not needed but here for completion
Comment 4 Eugene Kuleshov CLA 2007-10-30 14:57:57 EDT
George, thank you for the patch, but we really need some test coverage for this functionality. 

Can you please add a new test case at /org.eclipse.mylyn.tasks.tests/src/org/eclipse/mylyn/tasks/tests/web and assert that calls to WebRepositoryConnector.performQuery() return expected results. Please include test cases for corner cases, such as double quote and text like "foo & boo", "foo&boo poo". Thanks.
Comment 5 George Lindholm CLA 2007-10-30 15:03:54 EDT
I'll see if I can figure out how to do that. How do I run the test(s)? Like I said, I'm rather green
with the PDE environment :-(
Comment 6 Eugene Kuleshov CLA 2007-10-30 15:34:49 EDT
There isn't really much special about PDE. We have some documentation at http://wiki.eclipse.org/Mylyn_Contributor_Reference#Workspace
Basically, if test have any dependency on workbench classes you will need to run it as "JUnit Plugin test". But for feature you contributing it may work as a regular JUnit test, i.e. if you won't use repository manager to create the repository instance (other parameters of the performQuery() method are easier to construct).
Comment 7 George Lindholm CLA 2007-10-30 16:31:58 EDT
Created attachment 81644 [details]
JUnit test

I think I got this right. The test case runs fine under Eclipse. However the
AllTest case crashes my JVM (1.6.0_03), go figure.
Comment 8 George Lindholm CLA 2007-10-30 16:32:00 EDT
Created attachment 81645 [details]
mylyn/context/zip
Comment 9 Eugene Kuleshov CLA 2007-10-31 00:28:54 EDT
George, this is really great! The only minor refactoring I'd like to ask you to do is to extract assertQuery() method in your test case, so it will be possible to write something like that:

  assertQuery("1:A quote "", "({Id}\\d+?):({Description}.+)", "A quote \"");

I've also tested this with commons-lang-2.1.jar because 2.3 haven't been approved by Eclipse IP process. Can you please cut another patch that will include this refactoring and merge both pieces into a asingle patch.

Mik, I need your approval on this. This contribution requires commons-lang-2.1.jar to be added to org.eclipse.mylyn.web.core project (200k). I am usually not really big fan of commons-* packages, but in this case it probably justify (even just for this single method). Potentially we can also use commons-lang instead of some other code in Mylyn (i.e. XmlStringConverter).
Comment 10 George Lindholm CLA 2007-10-31 01:19:06 EDT
Will do. Should I leave out the references to the jar file from patch since I really have no idea what I'm doing there?
Comment 11 Eugene Kuleshov CLA 2007-10-31 02:09:47 EDT
You are doing everything right. But please use commons-lang-2.1.jar and refer to it in the manifest.
Comment 12 George Lindholm CLA 2007-10-31 15:02:37 EDT
Created attachment 81760 [details]
Unified patch

Switched to commons-lang-2.1 and created a unified patch
Comment 13 George Lindholm CLA 2007-10-31 15:02:40 EDT
Created attachment 81761 [details]
mylyn/context/zip
Comment 14 Mik Kersten CLA 2007-11-01 12:09:16 EDT
Eugene: commons-lang is approved for all projects, but Mylyn will need to submit an IP approval request to use it.  Each project needs to be individually approved to use third party libraries.  This is a much shorter process than a new approval, but still takes some time (hopefully less than 2 weeks, but not clear).  Let me know if you want me to submit the CQ, or whether you want to work around this limitation.  If you would like me to proceed with the CQ please write up a two sentence summary of how Mylyn benefits from this library.
Comment 15 Eugene Kuleshov CLA 2007-11-01 13:17:05 EDT
Mik, the main reason is to get html decoding functionality and unless George want to implement the same functionality from scratch or Rob would suggest any workaround (maybe from bugzilla connector), I don't see how else we could get this fixed. I looked at the implementation in commons-lang and it is not really a small method. Here is the summary for CQ:

Html decoding functionality provided by commons-lang is useful for Mylyn connectors (i.e. non-english user names in buglilla are shown with &xxxx; entries right now).
Comment 16 Eugene Kuleshov CLA 2007-11-01 16:06:14 EDT
*** Bug 204154 has been marked as a duplicate of this bug. ***
Comment 17 Eugene Kuleshov CLA 2007-11-02 14:08:38 EDT
Rob pointed out that there is HtmlStreamTokenizer.unescape() that meant to do the same, however it is failing on the test cases that George collected. See bug 208600
Comment 18 George Lindholm CLA 2007-11-03 03:10:46 EDT
Also, HtmlStreamTokenizer has not been updated in 5 years and recognizes ~114 entity names.

StringEscapeUtils (Entities) was changed 3 months ago and currently recognizes ~250 entity names.
Comment 19 Henrik LEion CLA 2007-11-06 05:13:27 EST
Created attachment 82180 [details]
Detailed error description and examples
Comment 20 Henrik LEion CLA 2007-11-06 05:14:03 EST
Comment on attachment 82180 [details]
Detailed error description and examples

Hi
I'm guessing my problems are related to this bug.

I'm trying to connect to a GForge server using the Mylyn WebConnector and have used the GForge Query URL which is already included in Mylyn.

The problem is that Mylyn seem to strip all '&'-signs from the Query URL. Using & instead causes a transformation to '%26amp%3'.


I've browsed through user docs and bug reports trying to find out if there is something wrong with the query syntax, but on the other hand, the query was given to me by the Mylyn project, I've just set the variables. So if the query is missing an escape sequence for the '&'-signs, the Mylyn template query is malformed.


Please see attachment gforge_problems for more details.

Best regards,
Henrik Leion
Comment 21 Henrik LEion CLA 2007-11-06 05:18:53 EST
Sorry, cut'n'pasted away my version info.

Eclipse: Version: 3.3.0, Build id: I20070621-1340
Mylyn: 2.1.0v20070927-0900
Comment 22 George Lindholm CLA 2007-11-06 16:50:50 EST
Created attachment 82262 [details]
Modified patch

A couple of minor changes since the last version:

   Moved the entity test to after named group test since it used named groups

   Added test for &#Xnn;

   Decode Owner and Type values as well
Comment 23 George Lindholm CLA 2007-11-06 16:56:42 EST
Henrik,
  I don't think this bug is related to your problem since it deals with the receiving end of the query, rather
  than the issuing side.
  
  For what it is worth, I'm using '&' in my Query against a IIS server with no problems.
  
  Are you able to look at the TCP packet stream to see what the query looks like going over the network to the GForge server?
Comment 24 Eugene Kuleshov CLA 2007-11-06 17:50:07 EST
(In reply to comment #20)
> The problem is that Mylyn seem to strip all '&'-signs from the Query URL. Using
> & instead causes a transformation to '%26amp%3'.

If you need "&" in the params of values in the query url, you should use %26. 
Alternatively, you can use a variable substitution and specify variable value in the query properties, then value will be urlencoded for you.

Anyways, I agree with George that your issue is not related to this bug.
Comment 25 Henrik LEion CLA 2007-11-07 07:21:04 EST
Thanks for your help. I searched for the problem elsewhere and found it in my proxy settings.

I was misled at first because the HTTPS URL displayed in the Mylyn alert box is stripped from every '&'. This must simply be a bug in the construction of the error message, not in the construction of the actual HTTP request. 
I'm using ordinary '&'-signs in my query URL (not %26 or anything) now, and it works fine.

Best regards,
Henrik
Comment 26 Eugene Kuleshov CLA 2007-11-17 22:55:26 EST
Mik, Rob, what is the plan for this? BTW, I have feeling that you'll also need this to decode emails for bugzilla. See for example how reporter name look in the task editor for bug 207200
Comment 27 Robert Elves CLA 2007-11-17 23:09:55 EST
Seems like we should start the request process to use the commons lib.
Benefits:
* Home grown library lacks complete coverage of the encoding spec
* Reuse of a maintained, accepted standard lib for this purpose
* ?
Comment 28 Steffen Pingel CLA 2007-11-19 14:29:24 EST
+1 for getting this library (e.g. bug 207567)
Comment 29 George Lindholm CLA 2007-11-19 20:41:44 EST
Steffen, the library won't help with that bug, but it will make it possible to clean
up the mylyn code.

+1 but I'm biased.
Comment 30 Mik Kersten CLA 2007-12-04 14:07:38 EST
Submitted IPZilla CQ 1890 for the commons-lang request.
Comment 31 Mik Kersten CLA 2007-12-04 14:13:47 EST
 (In reply to comment #30)
> Submitted IPZilla CQ 1890 for the commons-lang request.

We have had this approved (thanks to their new streamlined workflow for things previous approved for all projects).

Steffen: please add this library and review George's patch.
Comment 32 George Lindholm CLA 2007-12-15 23:03:26 EST
Any chance that this one will make into 2.2?
Comment 33 Steffen Pingel CLA 2007-12-16 00:34:48 EST
Thanks for the reminder. I'll look into this first thing on Monday morning.
Comment 34 Steffen Pingel CLA 2007-12-17 19:01:36 EST
Unfortunately we won't be able to do this for 2.2. The dependency on commmons-lang needs to be part of a release review which is already scheduled for the 2.2 release. 

Thank you for the patch George! It looks great and we will merge it early in the 2.3 cycle.
Comment 35 Steffen Pingel CLA 2007-12-17 19:03:00 EST
Created attachment 85426 [details]
updated patch to consume commons-lang from orbit
Comment 36 Eugene Kuleshov CLA 2007-12-17 19:36:40 EST
Can we please keep "moving to orbit" theme separate from this issue? I think it doesn't make sense to do that one by one, but it should be done for all dependencies at the same time, because it affects dev workspace, contributors docs and psfs on the wiki, so it does make sense to minimize number of iterations for those.
Comment 37 Mik Kersten CLA 2007-12-17 22:13:40 EST
I would prefer that every new dependency we add be consumed via Orbit because Orbit is the right way of doing things and the more we do with it the better, and because some dependencies (e.g. httpclient) will be non trivial to move to Orbit.
Comment 38 Eugene Kuleshov CLA 2007-12-17 23:44:47 EST
Mik, I am just saying that no one looked trough the whole process yet. For instance, I think that using dependencies from Orbit can cause some weird issues with the update manager. From what I've seen when installing other features and tools, if user install an orbit bundle into some Extension Location, that will cause update issues if the same bundle will be installed into other Extension Location. That will create lots of weird warnings in the logs and maybe also prevent working plugins from starting. I am not sure how to address this kind of stuff (except maybe to create separate features for each orbit bundle), so user could skip them if they already installed.

Also, maybe you know something I don't, but project wiki page for orbit consumers [1] still don't have clear picture how these bundles should be packaged. All in all I still think that it would be better to do the whole nine yards in a separate issue. It doesn't look like trivial change to me, because you'll need to update build, maybe add new features and reorganize update sites between extras, sandbox and the main one, but maybe I am missing something and things are simpler then they seem.

[1] http://wiki.eclipse.org/Orbit_Builds#Orbit_Builds_for_Consumers
Comment 39 Steffen Pingel CLA 2008-01-02 15:21:20 EST
That's a good point. I don't expect that it is going to be a big deal but we should make sure our build process is updated when we make the switch to Orbit. As far as I understand the process we need to redistribute the Orbit bundles from our update site. In favor of making progress I would like to add the commons-lang library the old way until we have done that.
Comment 40 Steffen Pingel CLA 2008-01-02 17:23:32 EST
Created attachment 86032 [details]
Updated patch to not consume from Orbit

I have made a slight modifications to the patch: After unescaping type and owner the results were not assigned to variables.
Comment 41 Mik Kersten CLA 2008-01-02 20:35:27 EST
While it's a good thing that we're not holding this up on the extra build process work that Orbit would take, we do need to keep in mind that cannot continue punting on the Orbit work much longer because it is a Ganymede requirement.  In case we need to provide input or need changes on how we consume bundles, we should do this sooner rather than later.  I'll comment further on bug 214159.
Comment 42 Eugene Kuleshov CLA 2008-01-02 21:53:14 EST
 (In reply to comment #41)
> While it's a good thing that we're not holding this up on the extra build
> process work that Orbit would take...

Mik, is that a "go ahead" for resolving this issue?
Comment 43 Steffen Pingel CLA 2008-01-02 22:07:35 EST
I have committed commons-lang to the web core plug-in and applied the patch.
Comment 44 Steffen Pingel CLA 2008-01-02 22:13:09 EST
Thanks for your contributions George! A build with the changes will be available from the weekly update site later today:

 http://www.eclipse.org/mylyn/downloads/
 
It would be great if you could verify that the problem is fixed and reopen this report if not.