Bug 358301

Summary: [DSTORE] Hang during debug source look up
Product: [Tools] Target Management
Reporter: Samuel Wu <samuelwu>
Component: RSE
Assignee: David McKnight <dmcknigh>
Status: REOPENED ---
QA Contact: Martin Oberhuber <mober.at+eclipse>
Severity: normal
Priority: P3
CC: dmcknigh
Version: unspecified   
Target Milestone: 3.4   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 361000    
Attachments (description: flags):
Stack trace during when UI was locked: none
patch to synchronize on maps: none
Stacktrace: none
patch to check for outofmemory errors: none
additional patch to deal with other out of memory cases: none
a couple more cases: none

Description Samuel Wu CLA 2011-09-20 16:19:07 EDT
Build Identifier: RSE 3.2 maintenance (RSE-runtime-M20110601-1650.zip)

When the source file was not found for a debug session, the user ran the action Change Text File to find a file on a remote host, and the UI locked up.

Reproducible: Sometimes

Steps to Reproduce:
1. Set the source look up to Default and start a debug session
2. In the Source_Not_Found debug editor, run the action Change Text File 
3. Select a remote host and expand its Root directory
4. The UI was locked up
Will attach the stacktrace
Comment 1 Samuel Wu CLA 2011-09-20 16:24:36 EDT
Created attachment 203711 [details]
Stack trace during when UI was locked

A few traces were taken and they all contain the following.

Thread[ModalContext,RUNNABLE,118]
		java.util.HashMap.findNonNullKeyEntry(HashMap.java:525)
		java.util.HashMap.putImpl(HashMap.java:622)
		java.util.HashMap.put(HashMap.java:605)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFile(DStoreFileService.java:1375)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFiles(DStoreFileService.java:1401)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:2170)

It looks like the call never returned and the UI was waiting for it.
Comment 2 Samuel Wu CLA 2011-09-20 16:28:04 EDT
The following trace was from another case which never returned as well. Since it was on a non-GUI thread, the UI was not blocked.

Thread[Worker-8,RUNNABLE,40]
		java.util.HashMap.findNonNullKeyEntry(Unknown Source)
		java.util.HashMap.getEntry(Unknown Source)
		java.util.HashMap.containsKey(Unknown Source)
		org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.cacheRemoteFile(RemoteFileSubSystem.java:1275)
		org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.cacheRemoteFile(RemoteFileSubSystem.java:1313)
		org.eclipse.rse.internal.subsystems.files.local.model.LocalFileAdapter.convertToRemoteFiles(LocalFileAdapter.java:59)
		org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.list(FileServiceSubSystem.java:578)
		org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.list(RemoteFileSubSystem.java:976)
Comment 3 Samuel Wu CLA 2011-09-20 16:33:12 EDT
Similar problem.

Thread[Worker-0,RUNNABLE,18]
		java.util.HashMap.findNonNullKeyEntry(HashMap.java:526)
		java.util.HashMap.putImpl(HashMap.java:622)
		java.util.HashMap.put(HashMap.java:605)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFile(DStoreFileService.java:1375)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.convertToHostFiles(DStoreFileService.java:1401)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:2170)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.list(DStoreFileService.java:2030)
		org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.internalList(FileServiceSubSystem.java:379)
		org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.list(FileServiceSubSystem.java:571)
		org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.list(RemoteFileSubSystem.java:976)
Comment 4 Samuel Wu CLA 2011-09-20 16:49:35 EDT
The user who ran into this problem had been on RSE-runtime-M20110316-2215.zip, and that build worked fine for him.
Comment 5 David McKnight CLA 2011-09-22 10:57:39 EDT
Created attachment 203846 [details]
patch to synchronize on maps

Can you see if this patch helps?
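
For readers following the traces in comments 1 to 3: every one of them shows a thread spinning inside java.util.HashMap, which is the usual symptom of a plain HashMap being modified from several threads without synchronization. The sketch below is not the attached patch. It only illustrates the general idea of guarding every access to a shared map with a single lock, and the names (FileCache, put, contains, fillConcurrently) are hypothetical and do not come from RSE.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical cache; illustrates guarding a shared HashMap with one lock. */
class FileCache {
    private final Map<String, Object> map = new HashMap<>();

    void put(String path, Object file) {
        synchronized (map) {
            map.put(path, file);
        }
    }

    boolean contains(String path) {
        synchronized (map) {
            return map.containsKey(path);
        }
    }

    int size() {
        synchronized (map) {
            return map.size();
        }
    }

    /** Fills one cache from several threads at once and returns its final size. */
    static int fillConcurrently(int threads, int perThread) {
        FileCache cache = new FileCache();
        Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                for (int i = 0; i < perThread; i++) {
                    cache.put("/dir" + id + "/file" + i, Boolean.TRUE);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            try {
                w.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return cache.size();
    }

    public static void main(String[] args) {
        // Every key is distinct, so the size equals threads * perThread.
        System.out.println(fillConcurrently(4, 1000));
    }
}
```

A java.util.concurrent.ConcurrentHashMap would be another option when each operation is independent; the explicit lock is shown here because it matches the "synchronize on maps" wording of the patch title.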
Comment 6 Samuel Wu CLA 2011-09-22 16:20:43 EDT
Thank you for the patch, Dave. I can't actually reproduce the problem myself. I tried a source look up in a directory which contains a lot of files, and I got the following problem: the source look up didn't return.

Thread[Worker-11,TIMED_WAITING,47]
		java.lang.Object.wait(Native Method)
		java.lang.Object.wait(Object.java:196)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:372)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:288)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:236)
		org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:129)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.getFile(DStoreFileService.java:1270)
		org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.updateRemoteFile(FileServiceSubSystem.java:594)
		org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.list(FileServiceSubSystem.java:575)
		org.eclipse.rse.subsystems.files.core.subsystems.RemoteFileSubSystem.list(RemoteFileSubSystem.java:977)

The connection was still active and I tried to expand a filter in RSE. But it did not return either.
Thread[Worker-13,TIMED_WAITING,90]
		java.lang.Object.wait(Native Method)
		java.lang.Object.wait(Object.java:196)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:372)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:288)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:236)
		org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:129)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.fetch(DStoreFileService.java:2187)

I then tried to expand root and it didn't return.
		java.lang.Object.wait(Native Method)
		java.lang.Object.wait(Object.java:196)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:372)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:288)
		org.eclipse.rse.services.dstore.util.DStoreStatusMonitor.waitForUpdate(DStoreStatusMonitor.java:236)
		org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:129)
		org.eclipse.rse.services.dstore.AbstractDStoreService.dsQueryCommand(AbstractDStoreService.java:97)
		org.eclipse.rse.internal.services.dstore.files.DStoreFileService.getRoots(DStoreFileService.java:1986)
		org.eclipse.rse.subsystems.files.core.servicesubsystem.FileServiceSubSystem.getRoots(FileServiceSubSystem.java:389)

Something seems to be wrong with the server. 

I'll attach the stack trace
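
The three traces above are all parked in DStoreStatusMonitor.waitForUpdate, waiting for a status change that a failed server may never send. As a rough illustration only (this is not the DStoreStatusMonitor implementation, and every name below is hypothetical), a wait loop can carry an overall deadline so that the caller gets control back even when no update arrives:

```java
/** Hypothetical status holder; not the RSE DStoreStatusMonitor. */
class StatusWaiter {
    private final Object lock = new Object();
    private boolean done;

    /** Called by the thread that receives the server's reply. */
    void markDone() {
        synchronized (lock) {
            done = true;
            lock.notifyAll();
        }
    }

    /**
     * Waits until markDone() is called or timeoutMillis elapses.
     * Returns true if the status completed, false on timeout or interrupt.
     */
    boolean waitForDone(long timeoutMillis) {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        synchronized (lock) {
            while (!done) {
                long remainingNanos = deadline - System.nanoTime();
                if (remainingNanos <= 0) {
                    return false; // give up instead of blocking forever
                }
                try {
                    // Round up so wait(0), which waits without limit, is never called.
                    lock.wait(remainingNanos / 1_000_000L + 1);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return false;
                }
            }
            return true;
        }
    }

    public static void main(String[] args) {
        StatusWaiter silent = new StatusWaiter();
        System.out.println(silent.waitForDone(50)); // nobody ever replies

        StatusWaiter answered = new StatusWaiter();
        new Thread(answered::markDone).start();
        System.out.println(answered.waitForDone(5000));
    }
}
```

Whether a timeout is appropriate for the real monitor depends on how long legitimate queries can take; the sketch only shows the mechanism.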
Comment 7 Samuel Wu CLA 2011-09-22 16:21:26 EDT
Created attachment 203865 [details]
Stacktrace
Comment 8 David McKnight CLA 2011-09-23 10:47:10 EDT
Samuel, do you see the stacks you're hitting as the same problem as the one your customer hit?  Is there a way to reproduce this from pure RSE (i.e. without your source lookup mechanism)?
Comment 9 Samuel Wu CLA 2011-09-23 12:17:42 EDT
Hi Dave,
When I tried the Remote Folder source look up with the same directory, it simply returned quickly with the source not found message. But the source file was in a subdirectory of the remote folder. That's why the customer switched to our own source look up.
I also did a file search for the same file in the same directory in RSE. It ended up with a connection drop. I didn't see the out of memory message on the server side when I launched the server in the foreground.
Comment 10 David McKnight CLA 2011-09-26 13:41:07 EDT
(In reply to comment #9)
> Hi Dave,
> When I tried the Remote Folder source look up with the same directory, it
> simply returned quickly with the source not found message. But the source file
> was in a subdirectory of the remote folder. That's why the customer switch to
> the source look up of our own.
> I also did a file search on the same file in the same directory in RSE. It
> ended up with a connection drop. I didn't see the out of memor message on the
> server side when I let the server to launch in the foreground.

Samuel, it looks like you've described a few different problems.  I'm not sure whether this bug is the place for each of these issues.  For whatever is reproducible via RSE, could you provide me with an environment that I could use to hit the problem?
Comment 11 Samuel Wu CLA 2011-10-14 11:23:34 EDT
A bit of further investigation shows that a possible cause of the problem is that the RSE server had run out of memory but the RSE connection didn't drop.  When the user tried to get anything from the GUI thread, it locked up the GUI.

We may want to terminate the dstore server once it runs out of memory.
Comment 12 David McKnight CLA 2011-10-14 11:50:27 EDT
Created attachment 205210 [details]
patch to check for outofmemory errors
Comment 13 David McKnight CLA 2011-10-14 11:52:25 EDT
The attached patch will detect out of memory errors and attempt to exit.  It's still possible that in some of those cases, an out of memory error will be hit during the exit.  I've committed the patch to the HEAD stream.  

Do you need this backported?
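
For context, the general shape of "detect the out of memory error and attempt to exit" might look like the sketch below. It is not the committed patch: the names (OomGuard, runGuarded) are hypothetical, and the exit action is passed in as a parameter so the example can run without killing the JVM. A real server would presumably pass something like () -> Runtime.getRuntime().halt(1).

```java
/** Hypothetical helper; not the committed dstore patch. */
class OomGuard {
    /**
     * Runs the task; if it fails with OutOfMemoryError, invokes onOutOfMemory
     * and returns false. Other errors and exceptions propagate unchanged.
     */
    static boolean runGuarded(Runnable task, Runnable onOutOfMemory) {
        try {
            task.run();
            return true;
        } catch (OutOfMemoryError e) {
            // Keep this path as allocation-free as possible: the heap is already exhausted.
            onOutOfMemory.run();
            return false;
        }
    }

    public static void main(String[] args) {
        boolean[] exitRequested = { false };
        // Simulate the failure by throwing the error directly rather than exhausting the heap.
        boolean ok = runGuarded(
                () -> { throw new OutOfMemoryError("simulated"); },
                () -> exitRequested[0] = true);
        System.out.println(ok + " " + exitRequested[0]);
    }
}
```

As the comment above notes, the exit path itself can still hit another out of memory error, which is why the handler should do as little work as possible.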
Comment 14 Samuel Wu CLA 2011-10-14 12:12:51 EDT
Bug 361000 was opened for backporting. Thanks.
Comment 15 David McKnight CLA 2011-10-19 14:31:42 EDT
Created attachment 205555 [details]
additional patch to deal with other out of memory cases
Comment 16 David McKnight CLA 2011-11-07 10:48:55 EST
There are a couple more cases that can be handled.
Comment 17 David McKnight CLA 2011-11-07 10:49:10 EST
Created attachment 206528 [details]
a couple more cases
Comment 18 David McKnight CLA 2011-11-07 10:50:21 EST
I committed the updated patch.
Comment 19 Martin Oberhuber CLA 2011-11-25 05:25:40 EST
Catching away the OutOfMemoryError seems an odd way of handling this.

Here are a couple thoughts:

1.) Has it ever been analyzed why the OOME occurs ? The Eclipse Memory Analyzer
    makes it fairly easy to analyze a heap dump. On an Oracle VM, just launch
    with "-vmargs -XX:+HeapDumpOnOutOfMemoryError". For other VM's see
    http://wiki.eclipse.org/MemoryAnalyzer#Getting_a_Heap_Dump

2.) OutOfMemoryError is a subclass of "Error" for which the Java API Docs say:
       "An Error is a subclass of Throwable that indicates serious problems 
        that a reasonable application should not try to catch."
    My understanding is that an OOME should terminate the app automatically.
    Unless the dstore server catches Error or Throwable somewhere else ?
    I suggest checking whether dstore server catches away errors, since it
    shouldn't do that. Similar errors (eg ThreadDeath) would otherwise likely
    lead to the same problem we see here.

3.) Note that some VM's allow running a command on OutOfMemoryError. This could
    eg be used to re-start the server ... see -
       -XX:OnError="<cmd args>;<cmd args>"
       -XX:OnOutOfMemoryError="<cmd args>;<cmd args>"
    here:
       http://www.oracle.com/technetwork/java/javase/tech/vmoptions-jsp-140102.html
    This seems to make more sense than just doing a hardcoded exit...

I'm not going to enforce any of these suggestions (Wind River doesn't use dstore) but I'll reopen the bug for comments. Feel free to mark closed again if you think these comments are bogus.
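
To make point 2 concrete: a handler loop that catches Throwable also swallows OutOfMemoryError and ThreadDeath, so the thread keeps running in a broken state, whereas catching only Exception lets any Error propagate to the caller. The sketch below contrasts the two. The class and method names are hypothetical and nothing here is taken from the dstore server code.

```java
import java.util.Arrays;
import java.util.List;

/** Hypothetical command loops; not dstore server code. */
class HandlerLoops {
    /** Swallows everything, including Errors; returns how many commands were attempted. */
    static int runCatchingThrowable(List<Runnable> commands) {
        int attempted = 0;
        for (Runnable c : commands) {
            try {
                c.run();
            } catch (Throwable t) {
                // An OutOfMemoryError is silently lost here and the loop carries on.
            }
            attempted++;
        }
        return attempted;
    }

    /** Catches only Exception, so an Error aborts the loop and reaches the caller. */
    static int runCatchingException(List<Runnable> commands) {
        int attempted = 0;
        for (Runnable c : commands) {
            try {
                c.run();
            } catch (Exception e) {
                // Ordinary failures are tolerated; Errors are deliberately not caught.
            }
            attempted++;
        }
        return attempted;
    }

    public static void main(String[] args) {
        List<Runnable> commands = Arrays.<Runnable>asList(
                () -> {},
                () -> { throw new OutOfMemoryError("simulated"); },
                () -> {});
        System.out.println(runCatchingThrowable(commands));
        try {
            runCatchingException(commands);
        } catch (OutOfMemoryError e) {
            System.out.println("error propagated");
        }
    }
}
```

Note that in a multi-threaded server an uncaught Error only ends the thread that threw it; the JVM keeps running while other non-daemon threads are alive, so propagation alone may not be enough to shut the process down.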
Comment 20 Martin Oberhuber CLA 2012-05-23 19:52:19 EDT
REOPENED doesn't look like a proper state for this, can you look at some of my thoughts and comment ?