Bug 163289 - Machines list in parallel launch configuration is empty
Summary: Machines list in parallel launch configuration is empty
Status: RESOLVED FIXED
Alias: None
Product: PTP
Classification: Tools
Component: Core
Version: 1.1
Hardware: All All
Importance: P5 normal
Target Milestone: ---
Assignee: Craig E Rasmussen CLA
QA Contact: Randy Roberts CLA
URL:
Whiteboard:
Keywords:
Depends on:
Blocks:
 
Reported: 2006-11-02 19:47 EST by Greg Watson CLA
Modified: 2011-01-31 07:49 EST

See Also:


Attachments
Fix the java.lang.reflect.InvocationTargetException when the launch preference is set to prompt (891 bytes, patch)
2007-03-16 16:58 EDT, Chengdong Li CLA
g.watson: iplog+
Description Greg Watson CLA 2006-11-02 19:47:57 EST
This problem happens if the PTP runtime has not started before opening an existing or new parallel launch configuration. When the run configuration is opened, the PTP runtime is (correctly) started by a call to refreshRuntimeSystems(). However, it appears that this is happening too late for the machines list in the parallel tab to be populated correctly, since it is empty.
Comment 1 Craig E Rasmussen CLA 2006-11-03 14:41:17 EST
Is this supposed to be populated with the name that appears in Machine->Node Info View, not Machine0?
Comment 2 Greg Watson CLA 2006-11-03 17:46:56 EST
No, Machine0 is probably correct. It should be whatever machines are known by the model. The only other drop down like this is on the menu bar of the machines view.
Comment 3 Craig E Rasmussen CLA 2006-11-07 14:14:32 EST
 The events causing the problem are (starting from ParallelTab.createControl):

1. The parallel tab calls ModelManager.refreshRuntimeSystems
2. ModelManager.refreshRuntimeSystems starts up the Proxy Client Event Thread
       - Proxy Client Event Thread receives the EVENT_RUNTIME_CONNECTED and EVENT_RUNTIME_OK events
3. ModelManager.refreshRuntimeSystems calls initiateDiscovery and finishes
4. ParallelTab calls universe.getSortedMachines while
5. The Proxy Client Event Thread waits for an event, gets EVENT_RUNTIME_NODEATTR, and updates the model.

The bug occurs when 4 occurs before 5 completes (a race).

The fix makes ModelManager.refreshRuntimeSystems wait until the machines list has been updated via EVENT_RUNTIME_NODEATTR. This is a bit of a hack, as Eclipse will hang if the machines list comes back empty.

Need to revisit in version 2.0
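
One way to keep the wait described above without risking a hang is to bound it with a timeout. The following is only a minimal sketch under stated assumptions: BoundedDiscoveryWait, onNodeAttributes and awaitMachines are hypothetical names used for illustration and are not part of PTP's ModelManager. The event thread signals a latch when the first machine arrives, and the caller waits on that latch for a limited time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the model; NOT PTP's actual ModelManager.
class BoundedDiscoveryWait {
    private final List<String> machines = new CopyOnWriteArrayList<>();
    private final CountDownLatch firstMachineSeen = new CountDownLatch(1);

    // Assumed to be called on the event thread when a node-attribute
    // event (such as EVENT_RUNTIME_NODEATTR) updates the model.
    void onNodeAttributes(String machineName) {
        machines.add(machineName);
        firstMachineSeen.countDown();
    }

    // Waits at most timeoutMillis for the first machine, then returns a
    // snapshot of whatever is known. An empty list means discovery did
    // not report anything in time; the caller never blocks forever.
    List<String> awaitMachines(long timeoutMillis) {
        try {
            firstMachineSeen.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt
        }
        return new ArrayList<>(machines);
    }
}
```

With this shape, a caller such as the parallel tab could show an empty or "no machines found" list after the timeout instead of freezing the UI.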
Comment 4 Chengdong Li CLA 2007-03-14 14:45:35 EDT
There is another problem which may be associated with this one.

Assuming the preference PTP > OpenRTE > "Launch ORTE server manually" is unchecked and the ORTE|PTP proxy server file is set correctly, the orted still cannot be guaranteed to start.

Here is the reason:
PTPUIPlugin::refreshRuntimeSystem() is only called when a preference is changed or when the ParallelTab or an AbstractParallelSetView becomes visible.

Case (does NOT work):
The user starts Eclipse in the C/C++ perspective (assuming no AbstractParallelSetView is visible). At this point Eclipse won't start the orted automatically, i.e., PTPUIPlugin::refreshRuntimeSystem() is not called, so the orted is not started.
The user then presses Ctrl+F11 to run a preconfigured program, and PTP pops up a launch error dialog. The error occurs while Eclipse is waiting for the universe to be populated (which never happens, since PTPUIPlugin::refreshRuntimeSystem() is never called before this step).
Comment 5 Chengdong Li CLA 2007-03-14 14:52:25 EDT
One solution may be for the launch delegate to check whether the orted has been started each time a launch begins and, if not, to call PTPUIPlugin::refreshRuntimeSystem() to start it?

Sorry if this is not related to the original defect. I could not find the right way to report a new defect for PTP since the Eclipse bugzilla seems un-friendly. :-(
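
The check suggested in this comment (start the runtime from the launch path if nothing has started it yet) could be sketched roughly as follows. This is an illustration under assumptions, not PTP code: LaunchGuard is a hypothetical class, and the Runnable passed to it merely stands in for a call such as PTPUIPlugin::refreshRuntimeSystem().

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical guard; NOT an actual PTP launch delegate.
class LaunchGuard {
    private final AtomicBoolean runtimeStarted = new AtomicBoolean(false);
    private final Runnable refreshRuntime; // stand-in for refreshRuntimeSystem()

    LaunchGuard(Runnable refreshRuntime) {
        this.refreshRuntime = refreshRuntime;
    }

    // Starts the runtime at most once, even if several launches race.
    // Returns true only for the call that actually started it.
    boolean ensureRuntimeStarted() {
        if (runtimeStarted.compareAndSet(false, true)) {
            refreshRuntime.run();
            return true;
        }
        return false;
    }

    // Every launch goes through the guard before the job itself runs.
    void launch(Runnable job) {
        ensureRuntimeStarted();
        job.run();
    }
}
```

The compareAndSet call makes the check and the state change a single atomic step, so two launches started at the same moment cannot both start the runtime.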
Comment 6 Randy Roberts CLA 2007-03-14 15:57:35 EDT
Comments 4 & 5 refer to a known problem that was thought to be resolved.
I'm not sure which bug number.

Are you using the latest 1.1 release?  If so, then I need to have another
look at this.

PTP 2.0 will be handling this in a completely different way, but we need
to get this problem resolved.

Regards,
Randy
Comment 7 Chengdong Li CLA 2007-03-14 16:35:44 EDT
I use 1.1 RC4, with Eclipse 3.2.2 + CDT 3.1.2, and openmpi-1.2rc2.

BTW, what's the way 2.0 will handle this?
Comment 8 Randy Roberts CLA 2007-03-14 16:53:32 EDT
PTP 2.0 will be using resource managers more heavily.
You will configure and startup a set of resource managers.
These will be responsible for launching jobs.  Resource managers
will be configured based on the type of proxy they run, the cluster
on which they ultimately run, and what the underlying actual resource
manager is.  For example, there will be separate resource managers for
ORTE, and MPICH.  Resource managers for LSF and Moab will be available
for 2.0.  We hope that others can contribute their own resource managers,
via the extension point mechanism.

You may have more than one resource manager of a given type running at
the same time.  You could have two LSF resource managers running at the
same time, one using a cluster at Los Alamos, the other using a cluster
at a university.

Any resource managers that are running when PTP is
shut down will be restarted automatically when PTP starts
running.  This happens when any PTP window is brought up,
or when a PTP-based launch configuration is executed.

If for some reason the specific resource manager associated with
a particular launch configuration is not in an acceptable state, because
it either has not been started or is in an error state, then the launch
configuration will fail.

Any questions?

Randy
Comment 9 Chengdong Li CLA 2007-03-16 16:58:55 EDT
Created attachment 61165
Fix the java.lang.reflect.InvocationTargetException when the launch preference is set to prompt

Because of a race condition, the universe could still be null when the preference Run/Debug>Perspectives>Parallel Application is set to 'prompt'.

The patch adds a null check and waits until the universe is populated or the user cancels.
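
The attached patch itself is not reproduced here. As a rough sketch of the idea it describes (check for null, then wait until the universe appears or the user cancels), something like the following could be used. UniverseWait, waitForUniverse and both supplier parameters are hypothetical names for illustration; in Eclipse the cancellation check would typically come from a progress monitor.

```java
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

// Hypothetical helper; NOT the code from attachment 61165.
class UniverseWait {
    // Polls until the universe is non-null or the user cancels.
    // Returns the universe, or null if cancelled or interrupted,
    // so callers must still handle a null result.
    static <T> T waitForUniverse(Supplier<T> universe,
                                 BooleanSupplier cancelled,
                                 long pollMillis) {
        while (!cancelled.getAsBoolean()) {
            T current = universe.get();
            if (current != null) {
                return current; // populated: safe to dereference
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null; // user cancelled before the universe appeared
    }
}
```

Checking cancellation on every iteration keeps the wait responsive, and returning null on cancel makes the "no universe" case explicit instead of surfacing as a NullPointerException later.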
Comment 10 Chengdong Li CLA 2007-03-16 17:04:31 EDT
Randy, thanks for your explanation. Hope I can get 2.0 soon.

About #9: it is a patch for the race condition that occurs when the launch preference is set to 'prompt' instead of 'always'. See Greg's comments at:
 http://dev.eclipse.org/mhonarc/lists/ptp-user/msg00071.html 

Basically, the java.lang.reflect.InvocationTargetException is caused by NullPointerException in ModelManager::waitForPopulatedUniverse.

Comment 11 Eclipse Webmaster CLA 2009-08-30 02:48:33 EDT
LATER/REMIND bugs are being automatically reopened as P5 because the LATER and REMIND resolutions are deprecated.
Comment 12 Greg Watson CLA 2009-09-16 10:56:46 EDT
Please reopen if this is still a problem.