| Summary: | [DSTORE] Hang during debug source look up | | |
|---|---|---|---|
| Product: | [Tools] Target Management | Reporter: | Samuel Wu <samuelwu> |
| Component: | RSE | Assignee: | David McKnight <dmcknigh> |
| Status: | REOPENED --- | QA Contact: | Martin Oberhuber <mober.at+eclipse> |
| Severity: | normal | | |
| Priority: | P3 | CC: | dmcknigh |
| Version: | unspecified | | |
| Target Milestone: | 3.4 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Bug Depends on: | | | |
| Bug Blocks: | 361000 | | |
| Attachments: | | | |

Description
Samuel Wu
Created attachment 203711 [details]
Stack trace taken while the UI was locked
A few traces were taken and they all contain the following.
Thread[ModalContext,RUNNABLE,118]
java.util.HashMap.findNonNullKeyEntry(HashMap.java:525)
java.util.HashMap.putImpl(HashMap.java:622)
java.util.HashMap.put(HashMap.java:605)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFile(DStoreFileService.java:1375)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFiles(DStoreFileService.java:1401)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:2170)
It looks like the call never returned, and the UI was waiting for it.
The following trace was from another case which never returned as well. Since it was on a non-GUI thread, the UI was not blocked.
Thread[Worker-8,RUNNABLE,40]
java.util.HashMap.findNonNullKeyEntry(Unknown Source)
java.util.HashMap.getEntry(Unknown Source)
java.util.HashMap.containsKey(Unknown Source)
org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.cacheRemoteFile(RemoteFileSubSystem.java:1275)
org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.cacheRemoteFile(RemoteFileSubSystem.java:1313)
org.eclipse.rse.internal.subsystems.files.local.model.LocalFileAdapter.convertToRemoteFiles(LocalFileAdapter.java:59)
org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.list(FileServiceSubSystem.java:578)
org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.list(RemoteFileSubSystem.java:976)
Similar problem.
Thread[Worker-0,RUNNABLE,18]
java.util.HashMap.findNonNullKeyEntry(HashMap.java:526)
java.util.HashMap.putImpl(HashMap.java:622)
java.util.HashMap.put(HashMap.java:605)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFile(DStoreFileService.java:1375)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFiles(DStoreFileService.java:1401)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:2170)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.list(DStoreFileService.java:2030)
org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.internalList(FileServiceSubSystem.java:379)
org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.list(FileServiceSubSystem.java:571)
org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.list(RemoteFileSubSystem.java:976)
The user who ran into this problem was on RSE-runtime-M20110316-2215.zip and that worked fine for him.
Created attachment 203846 [details]
patch to synchronize on maps
Can you see if this patch helps?
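For readers following the thread: threads that stay RUNNABLE inside HashMap, as in the traces above, are consistent with a plain HashMap being modified from several threads at once, which can corrupt a bucket chain so that a lookup never terminates. The sketch below shows the general kind of guard a "synchronize on maps" patch might add. It is not the attached patch; `GuardedFileCache`, `GuardedFileCacheDemo` and the path strings are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical stand-in for a service that caches converted file objects by
 * path. Names are illustrative and not taken from the RSE sources.
 */
class GuardedFileCache {
    // A plain HashMap is not safe for concurrent use on its own.
    private final Map<String, String> cache = new HashMap<>();

    /** Stores an entry while holding the map's monitor. */
    void put(String path, String hostFile) {
        synchronized (cache) {
            cache.put(path, hostFile);
        }
    }

    /** Reads take the same lock, so they never observe a half-updated table. */
    String get(String path) {
        synchronized (cache) {
            return cache.get(path);
        }
    }

    int size() {
        synchronized (cache) {
            return cache.size();
        }
    }
}

class GuardedFileCacheDemo {
    /** Fills the cache from several threads and returns the final entry count. */
    static int fillConcurrently(GuardedFileCache cache, int threads, int perThread) {
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    // Keys are unique per thread, so no entry overwrites another.
                    cache.put("/dir" + id + "/file" + i, "hostFile-" + id + "-" + i);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for workers", e);
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        GuardedFileCache cache = new GuardedFileCache();
        System.out.println(fillConcurrently(cache, 4, 1000)); // prints 4000
    }
}
```

Every access, reads included, has to go through the same lock for this to help. A `java.util.concurrent.ConcurrentHashMap` would be another option, assuming the code never stores null keys or values.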
Thank you for the patch, Dave. I can't actually reproduce the problem myself. I tried a source look up in a directory which contains a lot of files. I got the following problem and the source look up didn't return.
Thread[Worker-11,TIMED_WAITING,47]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:196)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:372)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:288)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:236)
org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:129)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.getFile(DStoreFileService.java:1270)
org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.updateRemoteFile(FileServiceSubSystem.java:594)
org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.list(FileServiceSubSystem.java:575)
org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.list(RemoteFileSubSystem.java:977)
The connection was still active and I tried to expand a filter in RSE. But it didn't return either.
Thread[Worker-13,TIMED_WAITING,90]
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:196)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:372)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:288)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:236)
org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:129)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:2187)
I then tried to expand root and it didn't return.
java.lang.Object.wait(Native Method)
java.lang.Object.wait(Object.java:196)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:372)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:288)
org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:236)
org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:129)
org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:97)
org.eclipse.rse.internal.services.dstore.files.DStoreFileService.getRoots(DStoreFileService.java:1986)
org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.getRoots(FileServiceSubSystem.java:389)
Something seems to be wrong with the server. I'll attach the stack trace.
Created attachment 203865 [details]
Stacktrace
Samuel, do you see the stacks you're hitting as the same problem as the one your customer hit? Is there a way to reproduce this from pure RSE (i.e. without your source lookup mechanism)?
Hi Dave,
When I tried the Remote Folder source look up with the same directory, it simply returned quickly with the source not found message. But the source file was in a subdirectory of the remote folder. That's why the customer switched to our own source look up.
I also did a file search on the same file in the same directory in RSE. It ended up with a connection drop. I didn't see the out of memory message on the server side when I launched the server in the foreground.
(In reply to comment #9)
> Hi Dave,
> When I tried the Remote Folder source look up with the same directory, it
> simply returned quickly with the source not found message. But the source file
> was in a subdirectory of the remote folder. That's why the customer switched to
> our own source look up.
> I also did a file search on the same file in the same directory in RSE. It
> ended up with a connection drop. I didn't see the out of memory message on the
> server side when I launched the server in the foreground.
Samuel, it looks like you've described a few different problems. I'm not sure whether this bug is the place for each of these issues. For whatever is reproducible via RSE, could you provide me with an environment that I could use to hit the problem?
A bit of further investigation shows that a possible cause of the problem is that the RSE server had run out of memory but the RSE connection didn't drop. When the user tried to get anything from the GUI thread, it locked up the GUI. We may want to terminate the dstore server once it runs out of memory.
Created attachment 205210 [details]
patch to check for outofmemory errors
The attached patch will detect out of memory errors and attempt to exit. It's still possible that in some of those cases, an out of memory error will be hit during the exit.
I've committed the patch to the HEAD stream. Do you need this backported?
Bug 361000 was opened for backporting. Thanks.
Created attachment 205555 [details]
additional patch to deal with other out of memory cases
There are a couple more cases that can be handled.
Created attachment 206528 [details]
a couple more cases
I committed the updated patch.
Catching away the OutOfMemoryError seems an odd way of handling this.
Here are a couple thoughts:
1.) Has it ever been analyzed why the OOME occurs ? The Eclipse Memory Analyzer
makes it fairly easy to analyze a heap dump. On an Oracle VM, just launch
with "-vmargs -XX:+HeapDumpOnOutOfMemoryError". For other VM's see
http://wiki.eclipse.org/MemoryAnalyzer#Getting_a_Heap_Dump
2.) OutOfMemoryError is a subclass of "Error" for which the Java API Docs say:
"An Error is a subclass of Throwable that indicates serious problems
that a reasonable application should not try to catch."
My understanding is that an OOME should terminate the app automatically.
Unless the dstore server catches Error or Throwable somewhere else ?
I suggest checking whether dstore server catches away errors, since it
shouldn't do that. Similar errors (eg ThreadDeath) would otherwise likely
lead to the same problem we see here.
3.) Note that some VM's allow running a command on OutOfMemoryError. This could
eg be used to re-start the server ... see -
-XX:OnError="<cmd args>;<cmd args>"
-XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
here:
http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
This seems to make more sense than just doing a hardcoded exit...
I'm not going to enforce any of these suggestions (Wind River doesn't use dstore) but I'll reopen the bug for comments. Feel free to mark closed again if you think these comments are bogus.
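To make the trade-off discussed above concrete, here is a minimal sketch of a command loop that exits on OutOfMemoryError instead of swallowing it through a broad `catch (Throwable)`. It is not the dstore server code or the committed patch; `CommandLoop`, `runCommand` and the injectable `exitHook` are hypothetical names, and the exit hook is a parameter only so the behaviour can be exercised without ending the JVM.

```java
import java.util.function.IntConsumer;

/**
 * Hypothetical command loop illustrating "detect OOME and attempt to exit".
 * Not taken from the dstore sources.
 */
class CommandLoop {
    // In a real server this would typically be System::exit (an assumption).
    private final IntConsumer exitHook;

    CommandLoop(IntConsumer exitHook) {
        this.exitHook = exitHook;
    }

    /** Runs one command. Returns true if it completed, false if it failed. */
    boolean runCommand(Runnable command) {
        try {
            command.run();
            return true;
        } catch (OutOfMemoryError oom) {
            // Do as little as possible here: the heap is exhausted, so even
            // logging may fail with another OutOfMemoryError.
            exitHook.accept(1);
            return false;
        } catch (Exception e) {
            // Ordinary failures are reported and the loop keeps serving.
            // Errors other than OutOfMemoryError are not caught and propagate.
            System.err.println("command failed: " + e);
            return false;
        }
    }

    public static void main(String[] args) {
        int[] exitCode = { -1 };
        CommandLoop loop = new CommandLoop(code -> exitCode[0] = code);
        loop.runCommand(() -> { throw new OutOfMemoryError("simulated"); });
        System.out.println("exit requested with code " + exitCode[0]);
    }
}
```

The point of catching `Exception` rather than `Throwable` is that other `Error` subclasses are not hidden from the caller. Passing `System::exit` as the hook gives the hardcoded exit; a VM-level `-XX:OnOutOfMemoryError` command, where the VM supports it, would sit outside this code entirely.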
REOPENED doesn't look like a proper state for this, can you look at some of my thoughts and comment ?