+++ This bug was initially created as a clone of Bug #346362 +++
We get reports of 500 errors on help.eclipse.org every so often. In all cases it looks like an OOME event has hosed the helpcentre. Based on suggestions from other bugs we've cranked up the memory limits on the help center but it still falls over every so often.
Here is the startup command:
/usr/bin/java -Xmx1024m -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -classpath /home/data/httpd/help.eclipse.org/indigo/eclipse/plugins/org.eclipse.help.base_3.6.0.v201106030909.jar org.eclipse.help.standalone.Infocenter -clean -command start -eclipsehome /home/data/httpd/help.eclipse.org/indigo/eclipse -nl en -locales en -plugincustomization /home/data/httpd/help.eclipse.org/indigo/eclipse/plugin_customization.ini
You can grab a compressed copy of the heapdump the last OOME event generated here: http://build.eclipse.org/heapdump.20110809.100805.31640.0011.phd.gz
-M.
I would just connect something like the YourKit profiler to the process and watch it for a bit; that ought to show you the culprit. I would take a closer look, but from what I can tell I would have to install special IBM heap dump analysis software.
Clearly this needs to be fixed. Targeting 3.8 but if possible I would like to get this into a service release.
Created attachment 201617 [details] Picture showing over one million Jetty sessions
I opened the heap dump with Memory Analyzer (MAT). As this is an IBM dump, one first needs to install a feature from IBM's site in order to parse the dumps: http://wiki.eclipse.org/index.php/MemoryAnalyzer#System_Dumps_and_Heap_Dumps_from_IBM_Virtual_Machines
What I found inside is a huge number (over a million) of Jetty sessions (see attachment). We'll have to figure out why they are there - no timeout set, some leak, etc. Unfortunately the .phd dump does not contain field values, otherwise we could look at some of the attributes of the Session objects and see if the sessions were already invalidated, when they were created, etc. From IBM VMs one can also get a system dump (core file) and convert it to a format MAT can understand. System dumps contain more information, including field values. It could be helpful if you can get such a dump. See the link above for some more information. I hope this helps as a first analysis.
Created attachment 201618 [details] Jetty Session Manager
All the paths to the Session objects look rather expected - all go through the SessionManager and through a java.util.HashMap$ValueIterator on the Map that contains them. I don't see any object outside Jetty itself which is holding the sessions.
So the help center hosts the helps of various Eclipse versions. Given the default session duration of 30 minutes is it reasonable to say that the help center gets such a huge amount of visits per day? Do we have access logs for the help center? It could also be that bots are accessing the help center which do not handle cookies. Thus, every request from such a bot would be a new session. On the other hand, does the help center actively store data in the session? I thought it was stateless.
I am with Gunnar on that. It seems like something that ought to be stateless, in which case it could just be a Jetty misconfiguration, and this wouldn't be the first one we have seen. That being said, there have been a few issues fixed regarding session expiration over the last few years, so if sessions are really in play then it would be good to know what Jetty version you're working with.
> So the help center hosts the helps of various Eclipse versions. Given the > default session duration of 30 minutes is it reasonable to say that the help > center gets such a huge amount of visits per day? Do we have access logs for > the help center? help.eclipse.org serves between 1.2 million and 2.1 million hits/day. Google and other bots account for some, but most come from individuals. If you're logged into the Portal, you can access the webalizer stats here: https://dev.eclipse.org/committers/webstats/help.eclipse.org/usage_201108.php We use Apache and a ProxyPass to incorporate each version of the help into help.eclipse.org. RewriteRule ^/indigo/$ http://help.eclipse.org/indigo/index.jsp [R,L] ProxyPass /indigo http://localhost:8085/help ProxyPassReverse /indigo http://localhost:8085/help
I assume that the Jetty version is the one shipped in Equinox 3.7 which would be Jetty 6.1.23. Jetty is configured melodramatically this way: http://git.eclipse.org/c/equinox/rt.equinox.bundles.git/tree/bundles/org.eclipse.equinox.http.jetty6/src/org/eclipse/equinox/http/jetty/internal/HttpServerManager.java Please have a look at #createHttpContext for the session configuration.
(In reply to comment #8) > Jetty is configured melodramatically this way: Ok, I don't know what my dictionary did. But I swear I typed 'programatically' (with a missing 'm').
(In reply to comment #9) > (In reply to comment #8) > > Jetty is configured melodramatically this way: > > Ok, I don't know what my dictionary did. But I swear I typed 'programatically' > (with a missing 'm'). It was far more entertaining the way you wrote it the first time. I had this image of someone madly hacking on config scripts with Ride of the Valkyries blaring in the background ;)
I was just impressed by the novel choice of words :) So...is there even a need for a session manager for this usage? If not I recommend just not using one. I looked over the issues resolved with session manager since 6.1.23 and I really only see one that showed up with session expiration where there was an IllegalStateException that was keeping sessions from being removed. http://jira.codehaus.org/browse/JETTY-1214 I would recommend updating your jetty to something a bit more current, 6.1.26 would be good, we are likely releasing a 6.1.27 in November as a maintenance release since it will be a year. Ideally I would like to see you all updated to 7.x releases here at eclipse, 7.4.5 is stable and 7.5.0 is coming out the first of Sept. cheers, jesse
> I would recommend updating your jetty to something a bit more current But at first glance, is there anything we can reconfigure and/or disable to make current installs more stable?
Do you need sessions? If not, then disable them.. past that, lemme dig out the source from waaaay back then and take a look :) jesse
Also, I can't piece out if you are actually setting a session scavenge period or not in the code, it's lots of indirection to properties... default is 30000ms if you're not. You could also call setMaxInactiveInterval with something small if you're not already. If this is triggering the IllegalStateException issue though, I don't know whether this will fix it or not. Still wondering if you need sessions in this case or not. I checked and I don't see a purge option on the JMX session interface, so I don't think there is a way to force it from outside the code.
The Help system does not store any state related to a session. The only data structures created are shared across all sessions. Is turning off sessions as simple as passing an extra parameter to the JettyConfigurator? It seems worth a try.
Denis, you might try starting the service with -Dorg.eclipse.equinox.http.jetty.context.sessioninactiveinterval=60 (after the -vmargs option) This should set the session inactive interval to 60 seconds. However, I'm afraid that you'll have to raise the memory again (for now). What other start options are you setting?
(In reply to comment #15) > The Help system does not store any state related to a session. The only data > structures created are shared across all sessions. Is turning off sessions as > simple as passing an extra parameter to the JettyConfigurator? It seems worth a > try. I think it's worth opening an enhancement request to add such a parameter to JettyConfigurator.
We have Indigo SR1 coming up quite soon. So, if there is a code change that can be made in Platform UA to disable sessions that sounds promising. We could then upgrade help.eclipse.org to use SR1. In the meantime, the session timeout property sounds like a potential workaround. It still doesn't add up to me that we have > 1 million session instances, even though there is an average of ~20,000 unique visitors a day according to the server logs. With a 30 minute timeout we shouldn't be seeing such an accumulation of sessions, unless a significant number of visitors have cookies disabled. Since the Jetty version we are using is so old, it might not be worth a great deal of investigation though.
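A rough back-of-the-envelope check of the numbers above, as a minimal sketch: it assumes sessions are created at a steady rate and that an expiring session lives for exactly the timeout, so the live count is simply rate times timeout. The class and method names are made up for illustration; the visitor and hit figures are the ones quoted in this bug.

```java
public class SessionEstimate {
    /** Steady-state live sessions = creation rate x timeout (integer minutes per day). */
    static long steadyStateSessions(long newSessionsPerDay, long timeoutMinutes) {
        return newSessionsPerDay * timeoutMinutes / (24 * 60);
    }

    /** With no expiry at all, sessions only accumulate: days until a target count. */
    static double daysToReach(long targetSessions, long newSessionsPerDay) {
        return (double) targetSessions / newSessionsPerDay;
    }

    public static void main(String[] args) {
        long visitorsPerDay = 20_000;   // unique visitors/day quoted in this bug
        long hitsPerDay = 2_100_000;    // upper bound of hits/day quoted in this bug
        // One session per visitor, 30 minute timeout
        System.out.println(steadyStateSessions(visitorsPerDay, 30));
        // Worst case: every hit opens a new session, 30 minute timeout
        System.out.println(steadyStateSessions(hitsPerDay, 30));
        // No timeout, one session per visitor: days to reach a million sessions
        System.out.println(daysToReach(1_000_000, visitorsPerDay));
    }
}
```

Under these assumptions a working 30 minute timeout keeps the live count in the hundreds, or tens of thousands in the cookie-less worst case, so a million live sessions points at sessions that never expire rather than at traffic volume.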
John, if I read the code correctly, the default session timeout is '-1'. This means "no timeout" which would be a valid explanation for 1 million sessions after a certain lifetime. AbstractSessionManager, around line 74:
/* ------------------------------------------------------------ */
// Setting of max inactive interval for new sessions
// -1 means no timeout
protected int _dftMaxIdleSecs=-1;
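To illustrate why a negative default matters, here is a toy session store, not Jetty's actual implementation. ToySessionStore and its methods are invented for this sketch; the only thing taken from the code above is the convention that a negative max idle value means "no timeout".

```java
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

public class ToySessionStore {
    // session id -> last access time in milliseconds
    private final Map<String, Long> lastAccessMillis = new HashMap<>();
    // negative means "never expire", mirroring the -1 default discussed above
    private final int maxIdleSecs;

    ToySessionStore(int maxIdleSecs) {
        this.maxIdleSecs = maxIdleSecs;
    }

    void touch(String id, long nowMillis) {
        lastAccessMillis.put(id, nowMillis);
    }

    /** Removes sessions idle for longer than maxIdleSecs; returns how many were dropped. */
    int scavenge(long nowMillis) {
        if (maxIdleSecs < 0) {
            return 0; // immortal sessions: the scavenger has nothing to do
        }
        int removed = 0;
        Iterator<Map.Entry<String, Long>> it = lastAccessMillis.entrySet().iterator();
        while (it.hasNext()) {
            if (nowMillis - it.next().getValue() > maxIdleSecs * 1000L) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    int size() {
        return lastAccessMillis.size();
    }
}
```

With a store built as new ToySessionStore(-1), every session ever touched stays in the map until the process ends, which is the growth pattern seen in the heap dump.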
> Jetty is configured melodramatically this way: Quote of the year right there. My first thought was that you should book another trip to Toronto to refresh on your English vocabulary. > What other start options are you setting? /usr/bin/java -Xmx1024m -XX:MaxPermSize=256m -XX:+HeapDumpOnOutOfMemoryError -Dorg.eclipse.equinox.http.jetty.context.sessioninactiveinterval=60 -classpath /home/data/httpd/help.eclipse.org/indigo/eclipse/plugins/org.eclipse.help.base_3.6.0.v201106030909.jar org.eclipse.help.standalone.Infocenter -clean -command start -eclipsehome /home/data/httpd/help.eclipse.org/indigo/eclipse -port 8085 -nl en -locales en -plugincustomization /home/data/httpd/help.eclipse.org/indigo/eclipse/plugin_customization.ini &
(In reply to comment #20) > > What other start options are you setting? Hmm, I'm afraid it looks like that UA will have to prepare a fix which passes 'context.sessioninactiveinterval' when starting the server.
-1 is a pretty terrible default here. I've spent a bit of time going through the code and, using -1, the only way to remove sessions is to shut down the JettyServer instance. I'm game for setting a big default instead. Remember this is the default and for good or bad people may have got used to long sessions. Typically I would set this at 1 hour but I'll start the initial bidding at 6 hours -- any takers?
(In reply to comment #22) > Remember this is the default and for good or bad people may have got used to > long sessions. Typically I would set this at 1 hour but I'll start the initial > bidding at 6 hours -- any takers? I'm not sure. The default seems to be 30 minutes everywhere. I can imagine that only a few people recognized that the default isn't 30 for the Equinox Jetty HttpService.
and those being the most vocal that will cry out that it is the end of the world :)
It seems that we don't have a complete picture yet of why this is happening but we have some good theories. Based on the previous comments, here's what I think we know so far:
1. The reason we are running out of memory is that we have over a million Session objects on the heap.
2. According to Comment 19 the default session timeout is '-1', i.e. never time out; some previous comments had suggested that the timeout was 30 minutes.
3. Comment 16 suggested starting the infocenter with -Dorg.eclipse.equinox.http.jetty.context.sessioninactiveinterval=60
4. Comment 11 suggested using Jetty version 6.1.26
5. Comment 21 suggested that the help component set 'context.sessioninactiveinterval' when starting the server.
6. We don't understand why a million sessions would be created since there would not be that large a number of clients.
I think the next step is to see if we can confirm some of these assertions and also find an answer to question 6. I will start debugging the help infocenter and server and track the creation of Session objects and see if I can induce them to get disposed by setting sessioninactiveinterval.
> 4. Comment 11 suggested using Jetty version 6.1.26 fwiw, if this is in fact because the session is immortal then updating to 6.1.26 will not fix that, that update will only fix things if it is an IllegalStateException being tossed when they are being invalidated. cheers, jesse
30 minutes -- are you nuts? I will accept 5 hours and 30 minutes and not one minute less. ...jk-- 30 minutes is fine. What do you all think? That's the timeout most app servers give by default.
I have an update based on spending some time in the debugger and talking with Simon Kaegi:
The default timeout is -1.
When starting an infocenter, passing -Dorg.eclipse.equinox.http.jetty.context.sessioninactiveinterval=60 does not change anything. I think when the help system starts Jetty it bypasses the code that reads that argument.
I can however change the timeout to 1 second by adding this line in the JettyConfigurator:
d.put(JettyConstants.CONTEXT_SESSIONINACTIVEINTERVAL, new Integer(1));
Obviously 1 second is too short; we need to decide what a better value would be. The help system itself does not do anything with sessions (with one exception which could probably be coded away) so there is a case to be made for setting a relatively short timeout such as 30 minutes. The only way I could see this causing a problem would be if an application which extended the help system by adding servlets wanted the Session to be preserved even if it had been inactive for more than 30 minutes.
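For reference, a self-contained sketch of what a 30 minute setting could look like, given that the value is in seconds (the one-second experiment above passed 1). This is not the actual patch: the local constant is a stand-in for JettyConstants.CONTEXT_SESSIONINACTIVEINTERVAL, and its string value is an assumption based on the property name used earlier in this bug.

```java
import java.util.Dictionary;
import java.util.Hashtable;

public class SessionTimeoutSettings {
    // Stand-in for JettyConstants.CONTEXT_SESSIONINACTIVEINTERVAL (value assumed
    // from the 'context.sessioninactiveinterval' property name in this bug).
    static final String CONTEXT_SESSIONINACTIVEINTERVAL = "context.sessioninactiveinterval";

    /** Builds a settings dictionary with the session inactive interval given in minutes. */
    static Dictionary<String, Object> withSessionTimeout(int minutes) {
        Dictionary<String, Object> d = new Hashtable<>();
        // The interval is expressed in seconds, so 30 minutes becomes 1800.
        d.put(CONTEXT_SESSIONINACTIVEINTERVAL, Integer.valueOf(minutes * 60));
        return d;
    }

    public static void main(String[] args) {
        System.out.println(withSessionTimeout(30));
    }
}
```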
Created attachment 201756 [details] Patch to set the timeout interval to 30 minutes
At this stage we have a patch in the help system to set the timeout to a lower value. Based on the previous discussion I believe this will solve the problem. The other question is do we want to try to get this into 3.7 SR1? My feeling is that we need to verify that the fix does indeed solve the original problem before considering for a service release. I'll see if I can work with a webmaster tomorrow to get a replacement bundle installed on the Eclipse server. How long did it take for the infocenter to run out of memory?
(In reply to comment #30) > I'll see if I can work with a webmaster tomorrow to get a replacement bundle > installed on the Eclipse server. How long did it take for the infocenter to run > out of memory? Couple of days... max. 1 week with 1024m.
Enable JMX and monitor it for memory growth; that should show how it's going instead of having to wait for it to crash. jesse
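As a minimal sketch of the kind of number to watch, the standard java.lang.management API exposes heap usage through the same memory MXBean that a JMX console reads. The example below samples it from inside the JVM; HeapSampler and growthPerSample are illustrative names, and the five one-second samples are an arbitrary choice.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;

public class HeapSampler {
    /** Currently used heap in bytes, as reported by the platform memory MXBean. */
    static long usedHeapBytes() {
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();
        return memory.getHeapMemoryUsage().getUsed();
    }

    /** Average growth in bytes per interval between the first and last sample. */
    static long growthPerSample(long[] samples) {
        if (samples.length < 2) {
            return 0;
        }
        return (samples[samples.length - 1] - samples[0]) / (samples.length - 1);
    }

    public static void main(String[] args) throws InterruptedException {
        long[] samples = new long[5];
        for (int i = 0; i < samples.length; i++) {
            samples[i] = usedHeapBytes();
            Thread.sleep(1000);
        }
        System.out.println("avg growth per sample (bytes): " + growthPerSample(samples));
    }
}
```

Individual samples are noisy because of garbage collection, so a leak like this one would show up as a trend over hours rather than between two readings.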
Created attachment 201817 [details] jar which sets session inactive timeout to 30 minutes
This is a replacement jar file built with "Patch to set the timeout interval to 30 minutes". Here are the instructions for deployment.
1. Go to eclipse/plugins in the directory where the 3.7 infocenter resides.
2. Locate the org.eclipse.help.base plugin and move it out of the plugins directory to a safe place.
3. Take this attachment and move it to an empty folder.
4. Rename this attached jar to have exactly the same name as the version of org.eclipse.help.base which was there originally.
5. Copy the jar file you just renamed to the plugins directory.
6. Start the infocenter.
7. Wait a few days.
I apologize for the convoluted instructions but this is the sequence of steps that reliably works.
Thanks Chris. Anything that doesn't involve hacking assembler is a walk in the park. I'll let you know how it works.
Created attachment 201824 [details] Results of help centre startup
After following the instructions the help centre 'fails' in about 30s, no heap or stack trace. I've reverted for now. -M.
"Eclipse exited with status code 13" indicates that Eclipse should be restarted. Exit code 13 is the secret return code for the launcher to re-start the Java process.
(In reply to comment #37) > "Eclipse exited with status code 13" indicates that Eclipse should be > restarted. Exit code 13 is the secret return code for the launcher to re-start > the Java process. Ok, but every time I restarted it, it did the same thing. Based on something Denis just suggested I checked the patch jar and it appears to be unsigned. Could that be part of the problem? -M.
(In reply to comment #38) > Based on something Denis just suggested I checked the patch jar and it appears > to be unsigned. Could that be part of the problem? Don't think so. It may have something to do with a version change of the jar. Even if the name is the same, the version within the JAR might be different. Thus, the framework does some initialization and wants to restart. Chris, can the webmasters try a different I or M build with the fix?
Chris, Is the bundle version (in the manifest) and file name exactly the same as the old file? It has to be if you want to "replace" the existing jar file without using a p2 operation. -- My opinion is use a real build instead of this patching approach...
I've just tried the patch jar but without renaming it (I also updated our startup script), and now it seems to have started without issue. -M.
The fix is not yet available in an M-build, I-build or N-build. I wanted to test the fix before putting it into a maintenance build. I think that there should be some P2 friendly way of replacing a single bundle but I don't know what that is. There is not an I-build scheduled until next Tuesday and I was hoping to get some testing in before then. Does anyone know of a better way to update just one bundle without having to wait for a build?
(In reply to comment #42) > There is not an I-build scheduled until next Tuesday and I was hoping to get > some testing in before then. Does anyone know of a better way to update just > one bundle without having to wait for a build? I think a patched feature would do it. We did something similar to patch jdt.core and other jdt bundles to add the Java 7 support.
(In reply to comment #41) > I've just tried the patch jar but without re-naming it(I also updated our > startup script), and now it seems to have started without issue. > > -M. If the 3.7 jar is still there in addition to the new one it will use the 3.7 version of the jar file and not the new one.
Matt, don't forget to disable the midnight cronjob on pebbles that kicks the Indigo help.
Ok, now I'm really confused. I followed the instructions in comment 33 and it didn't work. Then after seeing comment 39 I tried the replacement again but without changing the 'new' jar's name. And it seemed to be happy (after I updated the startup script). Then when I checked a few minutes ago the process was gone, and trying to restart it generated the same 'status code 13'. I've once again reverted things. -M.
Created attachment 201833 [details] Version 2 of jar which sets session inactive timeout to 30 minutes It was the version number that caused the failure last time. I made the mistake of testing on a 3.8 build rather than against 3.7. Can you try one more time with the attached jar and the same instructions as before. I have tested this myself on a 3.7 infocenter and the infocenter does start and does use the new bundle. You will be able to tell if the new version is installed by opening http://host/context/about.html ( i.e. http://help.eclipse.org/indigo/about.html ), which will show all the bundle versions, org.eclipse.help.base will have a version of 3.6.0.201108191228 if the attached jar is installed. If this does not work I will look into creating a feature patch.
I have the same issue with the new jar. On the plus side I understand that it wasn't working previously either, I'd just turned the debug option off and didn't wait long enough. We're back to the original setup. -M.
(In reply to comment #42) > The fix is not yet available in an M-build, I-build or N-build. I wanted to > test the fix before putting it into a maintenance build. I think that there > should be some P2 friendly way of replacing a single bundle but I don't know > what that is. Ensure the bundle has the same three part version number as the one in 3.7. Then do File > Export > Deployable plugins and fragments. On the Options tab, select "Qualifier Replacement", and enter the qualifier that was used in R3.7.0. Click Finish. You should now have a jar with the exact same name as the one that was deployed in R3.7. You can now "swap" this jar in the plugins directory without any renames, and it should get picked up on restart.
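A small sketch of the check implied here: the jar's file name should line up with the Bundle-SymbolicName and Bundle-Version headers in its manifest. BundleJarCheck is a hypothetical helper, and the <symbolic name>_<version>.jar pattern is inferred from the jar names in this bug rather than from a specification.

```java
import java.io.File;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.Manifest;

public class BundleJarCheck {
    /** File name a plugin jar is expected to carry: <symbolic name>_<version>.jar */
    static String expectedFileName(String symbolicNameHeader, String version) {
        // Bundle-SymbolicName may carry directives, e.g. "x.y; singleton:=true"
        String name = symbolicNameHeader.split(";")[0].trim();
        return name + "_" + version.trim() + ".jar";
    }

    /** True if the jar's file name matches the headers in its own manifest. */
    static boolean matchesManifest(String jarPath) throws IOException {
        try (JarFile jar = new JarFile(jarPath)) {
            Manifest manifest = jar.getManifest();
            if (manifest == null) {
                return false; // no manifest, so this is not a usable bundle
            }
            Attributes main = manifest.getMainAttributes();
            String symbolicName = main.getValue("Bundle-SymbolicName");
            String version = main.getValue("Bundle-Version");
            if (symbolicName == null || version == null) {
                return false;
            }
            return new File(jarPath).getName().equals(expectedFileName(symbolicName, version));
        }
    }
}
```

Running matchesManifest on both the original jar and the replacement before swapping them would catch a version or qualifier mismatch before the infocenter is restarted.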
The version replacement sounds like the way to go. I will not be able to get to this until this evening as it is the day the UA sources migrate to Git.
Created attachment 201955 [details] Version 3 of jar This jar file has the same name and version so it should be a direct replacement. Please try this version to see if it stays up. If the infocenter crashes please send a pointer to the core dump so I can try to figure out what is happening.
I'm still getting the 'status code 13' message. There is no core or heap file for me to provide, the updated jars simply 'fail' to start (or fail quietly). The version on our help jar is: 3.6.0.v201106030909 , are we perhaps running a 'non-final' RC build? -M.
You probably are running a version of Eclipse that was not the final version. org.eclipse.help.base did not change but its version number did; it is one of a handful of plug-ins in Eclipse whose version qualifier changes with every build (because it is a branding plug-in). It's highly unlikely that anything in Eclipse changed between June 3 and June 13 2011 that would affect the infocenter. I will fire up another patch to match your version.
Created attachment 202013 [details] Version 4 of jar Let's try one more time, this has the version of v201106030909.
Ok that seems to have started correctly and is presently running. I've stopped the nightly restart. -M.
How is the help.eclipse.org infocenter doing, is it still up?
So far so good. I don't see any heap files, and the site has responded when I've checked randomly. If it makes it a few more days I'd say we've fixed the problem. -M.
I wonder if Orion is affected by the same issue, or if it can be worked around by specifying the system property.
re: Orion Indeed. We're supplying a timeout now.
(In reply to comment #57) > So far so good. I don't see any heap files, and the site has responded when > I've checked randomly. If it makes it a few more days I'd say we've fixed the > problem. > > -M. Did it make it a few more days?
Looks like it. Let's close this bug and we can re-open if something explodes. -M.
We can't set the state to fixed until the fix appears in a release of Eclipse so I am reopening the bug. I think that we should put the fix into 3.7.2, it is too late to add to 3.7.1. I also think that the fix should go into Juno ( 3.8/4.2 ) even though the problem may go away with a newer version of Jetty.
If I remember right, the issue was that the sessions were being declared immortal and would never be removed. When configured that way, Jetty will happily let you eventually doom yourself.
(In reply to comment #63) That is my understanding also.
Simon, I think that this fix should go into Eclipse 3.7.2. If you agree can you review and approve "Patch to set the timeout interval to 30 minutes" for inclusion in 3.7.2?
re: 3.7.2 +1 It's safe
I have cherry picked the fix from master to the R3_7_maintenance branch, updated the bundle version, adjusted the api filters for the bundle version and updated the map file. Fixed.
So, for my information ... is Eclipse.org now running 3.7.1 + patch ? Or a recent 3.7.2 build ?
(In reply to comment #68) We're running SR0 (3.7.0?) + the patched help jar from Chris. -M.