| Summary: | FacetedProjectWorkingCopy creating many threads | | |
|---|---|---|---|
| Product: | [WebTools] WTP Common Tools | Reporter: | Chuck Bridgham <cbridgha> |
| Component: | Faceted Project Framework | Assignee: | Konstantin Komissarchik <konstantin> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | normal | | |
| Priority: | P3 | CC: | ccc, deboer, mbaker |
| Version: | 3.0.3 | | |
| Target Milestone: | 3.1 M6 | | |
| Hardware: | PC | | |
| OS: | Windows XP | | |
| Whiteboard: | | | |
| Attachments: | possible patch (attachment 115431), Stack dump (attachment 115592) | | |
Description
Chuck Bridgham
Created attachment 115431 [details]
possible patch
Using a single "Job" instance for the refresh, I no longer run into this issue... Is this a possible solution? Raising this as a hotbug request, we need resolution on this quickly I do not understand the problem scenario here. Do you have a repro? You mention a deadlock, but you didn't attach any thread dumps. I haven't looked at a patch, but using a Job is not necessarily better than using separate threads. It all depends on specifics. Specifics are also necessary to justify the severity and hotbug request. Created attachment 115592 [details]
Stack dump
Hi - sorry, I didn't go into more detail. We have an adopter migration routine that will detect and install additional facets to projects if needed. What we are seeing is the creation of additional threads over time while the migration routine is executing, and they block each other. (It looks like the RuntimeManager listener is being called several times a second, creating a new thread each time.) My suggestion of a "single" Job instance was only to prevent the creation of many threads, plus the convenience of the Job API queuing up execution requests on the single instance. A single Thread instance could be used to do the same thing, although it would need to explicitly call join on the running thread instance.

Ok. I think I understand the scenario, but I think the true cause of the problem you are seeing is elsewhere. Making threading changes to FacetedProjectWorkingCopy would just mask the problem. The key question is what's going on that causes this high rate of events to be generated in the first place. The RuntimeManager only generates these events if it believes that the list of runtimes has changed. Looking at the thread dump, dozens of these events have been generated while the working copy was locked. The question is why? I would guess that this is caused by the runtime bridge. I recommend putting a breakpoint in the RuntimeManagerImpl.bridge() method as a way to determine why it thinks the list of runtimes is changing.

Thanks, I agree some thread contention is going on here besides, and we are pursuing this, but I still think a "single" thread could avoid any out-of-memory issue. I'll remove the hotbug request but leave this as major for 3.0.3.

Ok. Let me know what you find out. Changing threading behavior in a service release is rather risky, so I want to make sure that we understand the problem completely before attempting any changes.

Ok, we have solved some related issues, but coming back to this...
Thread [Thread-9] (Suspended)
FacetedProjectWorkingCopy$1.handle() line: 228
RuntimeManagerImpl.notifyRuntimeListeners() line: 283
RuntimeManagerImpl.bridge() line: 437
RuntimeManagerImpl.getRuntimes() line: 156 [local variables unavailable]
RuntimeManager.getRuntimes() line: 80 [local variables unavailable]
RuntimeManagerImpl$RuntimeValidationThread.run() line: 1189
Here is an example of a thread that spawns a new thread: when it goes through the bridge() call, the modified flag gets set to true because of these blocks:
// Remove the obsolete entries.
for( BridgedRuntime r : existing.values() )
{
if( ! exported.contains( r.getNativeRuntimeId() ) )
{
runtimes.delete( r.getName() );
modified = true;
}
}
and
// Create the new entries.
for( String id : exported )
{
if( ! existing.containsKey( id ) )
{
try
{
final IRuntimeBridge.IStub stub = br.bridge( id );
final BridgedRuntime r
= new BridgedRuntime( brid, id, stub );
r.setName( createUniqueRuntimeName( id ) );
runtimes.add( r.getName(), r );
modified = true;
}
catch( CoreException e )
{
FacetCorePlugin.log( e );
}
}
}
It falls into either of these blocks, and generates another thread because of the notification.
Any idea what's going on here?
That code should only be setting the modified flag if the list of runtimes has changed. Is there anything changing the runtimes in your scenario? If not, do you mind stepping into those blocks of code to figure out why the modified flag is being set?

We verified that adopter code was adding a server vm, but then had problems during install, and then removed it... All of the fiddling with the vm list caused the notifications. This is no longer a stop-ship for us, but I still think changing this design to a single-threaded request queue would be best in the long run...

Ok. Lowering the severity and adding to the 3.1 queue. According to my understanding of the problem, I don't believe there is any relationship to Bug 225207, so removing the dependency. Please correct me if I missed something.
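For illustration, here is a minimal sketch of the kind of single-Job request queue being proposed, using the Eclipse Jobs API. The class name and the refreshTargetableRuntimes() helper below are made up for the example and are not the actual framework code:

import org.eclipse.core.runtime.IProgressMonitor;
import org.eclipse.core.runtime.IStatus;
import org.eclipse.core.runtime.Status;
import org.eclipse.core.runtime.jobs.Job;

// Illustrative sketch: one Job instance that coalesces refresh requests
// instead of starting a new Thread for every runtime event.
final class RuntimeRefreshJob extends Job
{
    RuntimeRefreshJob()
    {
        super( "Refreshing targetable runtimes" );
        setSystem( true );
    }

    protected IStatus run( final IProgressMonitor monitor )
    {
        // The work previously done on a per-event thread would go here
        // (hypothetical helper name).
        refreshTargetableRuntimes();
        return Status.OK_STATUS;
    }

    private void refreshTargetableRuntimes()
    {
        // ...
    }
}

The runtime listener would then just call schedule() on a single shared instance instead of new Thread( ... ).start(). If the job is already running, schedule() causes it to run once more after it finishes, so a burst of events collapses into at most one pending rerun rather than one new thread per event.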
There is a bug in this code that I believe causes all the threads to be created.
In this method the bit that looks like
runtimes.contains( name)
should instead be
runtimes.containsKey( name )
In RuntimeManagerImpl:
private static String createUniqueRuntimeName( final String suggestion )
{
String name = suggestion;
for( int i = 1; runtimes.contains( name ); i++ )
{
name = suggestion + " (" + i + ")"; //$NON-NLS-1$ //$NON-NLS-2$
}
return name;
}
When the bridge code posted above gets called, it tries to generate a unique name for a runtime returned by a bridge. When the bridge returns a name that already exists and the createUniqueRuntimeName() method is called, it ALWAYS fails and returns the same name.
After the bridged runtimes are added, this whole thing gets called once more since the modified flag is set and fires the refresh event, but then fails again because this line in the bridge() method:
if( ! existing.containsKey( id ) )
fails because the unique names are not actually unique, due to the failure of createUniqueRuntimeName().
This causes the call back into the bridge to fetch the bridged runtimes again, which then once again sets the modified flag, which fires the event, which refreshes the runtimes....
I could be wrong in my analysis. But the above method is broken either way.
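For clarity, a sketch of the corrected method with the containsKey() check Matt describes, assuming runtimes is the same name-keyed collection used above:

private static String createUniqueRuntimeName( final String suggestion )
{
    String name = suggestion;

    // Keep appending a counter until the candidate name is not already
    // used as a key; with contains() the check never matched, so the
    // method returned the suggestion unchanged.
    for( int i = 1; runtimes.containsKey( name ); i++ )
    {
        name = suggestion + " (" + i + ")"; //$NON-NLS-1$ //$NON-NLS-2$
    }

    return name;
}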
Thanks, Matt. I don't know if the problem that you describe in Comment #15 is related to the original problem reported here, but that code is clearly wrong.

I released a fix to correct the problem that Matt identified and changed the available runtimes refresh thread to a Job, as Chuck suggested. Change released to the 3.1 M6 code stream.