This problem happens if the PTP runtime has not started before an existing or new parallel launch configuration is opened. When the run configuration is opened, the PTP runtime is (correctly) started by a call to refreshRuntimeSystems(). However, this appears to happen too late for the machines list in the parallel tab to be populated: the list is empty.
Is this supposed to be populated with the name that appears in Machine->Node Info View, not Machine0?
No, Machine0 is probably correct. It should be whatever machines are known by the model. The only other drop down like this is on the menu bar of the machines view.
The events causing the problem are (starting from ParallelTab.createControl):
1. The parallel tab calls ModelManager.refreshRuntimeSystems.
2. ModelManager.refreshRuntimeSystems starts up the Proxy Client Event Thread.
   - The Proxy Client Event Thread receives the EVENT_RUNTIME_CONNECTED and EVENT_RUNTIME_OK events.
3. ModelManager.refreshRuntimeSystems calls initiateDiscovery and finishes.
4. ParallelTab calls universe.getSortedMachines.
5. Meanwhile, the Proxy Client Event Thread waits for an event, gets EVENT_RUNTIME_NODEATTR, and updates the model.

The bug occurs when step 4 runs before step 5 completes (a race). The fix was to wait in ModelManager.refreshRuntimeSystems until the machines list is updated via EVENT_RUNTIME_NODEATTR. This is a bit of a hack, as Eclipse will hang if the machines list comes back empty. Need to revisit in version 2.0.
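For illustration, here is a minimal sketch of how a wait like this could be bounded so that an empty machines list does not hang the caller. It is not the PTP code: MachineModelSketch, onNodeAttrEvent and getMachinesWhenReady are hypothetical names, and the latch-with-timeout approach is an assumption about one possible fix, not what was committed.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

/** Hypothetical stand-in for the model; not the real ModelManager. */
class MachineModelSketch {
    private final List<String> machines = new CopyOnWriteArrayList<>();
    private final CountDownLatch populated = new CountDownLatch(1);

    /** Called from the event thread when a node-attribute event arrives (step 5). */
    void onNodeAttrEvent(String machineName) {
        machines.add(machineName);
        populated.countDown(); // releases any caller waiting in getMachinesWhenReady
    }

    /** Called from the UI side (step 4): waits for the first machine, but only up to the timeout. */
    List<String> getMachinesWhenReady(long timeout, TimeUnit unit) {
        try {
            // await returns false on timeout; either way we return what we have
            populated.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
        return new ArrayList<>(machines);
    }
}
```

With a timeout, the caller gets an empty list back instead of blocking forever, and can then show an error or retry.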
There is another problem which may be associated with this one. Assuming the preference PTP > OpenRTE > "Launch ORTE server manually" is unchecked and the ORTE|PTP proxy server file is set correctly, the orted still cannot be guaranteed to start. Here is the reason: PTPUIPlugin::refreshRuntimeSystem() is only called when the preference is changed or when the ParallelTab or an AbstractParallelSetView becomes visible. Case (does NOT work): The user starts Eclipse in the C/C++ perspective (assuming no AbstractParallelSetView is visible). At this point Eclipse won't start the orted automatically, i.e., PTPUIPlugin::refreshRuntimeSystem() is not called, so the orted is not started. The user then presses Ctrl+F11 to run a preconfigured program, and PTP pops up a launch error dialog. The error occurs while Eclipse is waiting for the universe to be populated (which never happens, since PTPUIPlugin::refreshRuntimeSystem() is never called before this step).
One solution may be to have the launch delegate check whether the orted is started each time a launch begins and, if it is not, call PTPUIPlugin::refreshRuntimeSystem() to start it? Sorry if this is not related to the original defect. I could not find the right way to report a new defect for PTP since the Eclipse bugzilla seems unfriendly. :-(
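A rough sketch of the check suggested above, to make the idea concrete. All names here (RuntimeStarterSketch, InMemoryRuntimeSketch, LaunchDelegateSketch) are hypothetical and do not correspond to the actual PTP or Eclipse launch classes; the start() call only stands in for whatever refreshRuntimeSystem() does.

```java
/** Hypothetical handle on the runtime (e.g. the orted); not a PTP interface. */
interface RuntimeStarterSketch {
    boolean isRunning();
    void start();
}

/** Trivial in-memory runtime used only to exercise the sketch. */
class InMemoryRuntimeSketch implements RuntimeStarterSketch {
    private boolean running;
    private int starts;

    public boolean isRunning() { return running; }
    public void start() { running = true; starts++; }
    int startCount() { return starts; }
}

/** Hypothetical launch delegate that makes sure the runtime is up before launching. */
class LaunchDelegateSketch {
    private final RuntimeStarterSketch runtime;

    LaunchDelegateSketch(RuntimeStarterSketch runtime) {
        this.runtime = runtime;
    }

    /** Starts the runtime on demand, then "launches"; returns a status string. */
    synchronized String launch(String configName) {
        if (!runtime.isRunning()) {
            runtime.start(); // plays the role the comment gives to refreshRuntimeSystem()
        }
        if (!runtime.isRunning()) {
            throw new IllegalStateException("runtime failed to start for " + configName);
        }
        return "launched " + configName;
    }
}
```

The point of the design is that the launch path no longer depends on a PTP view having been visible first: the delegate itself triggers the start, and only once.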
Comments 4 & 5 refer to a known problem, that was thought to be resolved. I'm not sure which bug number. Are you using the latest 1.1 release? If so, then I need to have another look at this. PTP 2.0 will be handling this in a completely different way, but we need to get this problem resolved. Regards, Randy
I use 1.1 RC4, with Eclipse 3.2.2 + CDT 3.1.2, and openmpi-1.2rc2. BTW, what's the way 2.0 will handle this?
PTP 2.0 will be using resource managers more heavily. You will configure and start up a set of resource managers. These will be responsible for launching jobs. Resource managers will be configured based on the type of proxy they run, which cluster they ultimately run on, and what the underlying actual resource manager is. For example, there will be separate resource managers for ORTE and MPICH. Resource managers for LSF and Moab will be available for 2.0. We hope that others can contribute their own resource managers via the extension point mechanism. You may have more than one resource manager of a given type running at the same time. You could have two LSF resource managers running at the same time, one using a cluster at Los Alamos, the other using a cluster at a university. Any resource managers that were started when PTP was shut down will be restarted automatically when PTP starts running. This happens when any PTP window is brought up, or when a PTP-based launch configuration is executed. If for some reason the specific resource manager associated with a particular launch configuration is not in an acceptable state, because it either has not been started or is in an error state, then the launch configuration will fail. Any questions? Randy
Created attachment 61165
Fix the java.lang.reflect.InvocationTargetException when launch preference set to prompt

Because of a race condition, the universe could still be null when the preference Run/Debug > Perspectives > Parallel Application is set to 'prompt'. The patch adds a check and waits until the universe is populated or the user cancels.
Randy, thanks for your explanation. Hope I can get 2.0 soon. About #9, it is a patch for the race condition which happens when the launch preference is set to 'prompt' instead of 'always'. See Greg's comments at: http://dev.eclipse.org/mhonarc/lists/ptp-user/msg00071.html Basically, the java.lang.reflect.InvocationTargetException is caused by a NullPointerException in ModelManager::waitForPopulatedUniverse.
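To illustrate the kind of check the patch describes, here is a minimal sketch of a null-safe wait that ends when the universe is populated or the user cancels. It is not the attached patch: UniverseWaitSketch and its parameters are hypothetical, and only the method name waitForPopulatedUniverse is borrowed from the comment above.

```java
import java.util.List;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

/** Hypothetical helper; not the real ModelManager. */
class UniverseWaitSketch {
    /**
     * Polls until the supplier yields a non-null, non-empty machine list,
     * or until the cancel check returns true. Returns null on cancel or interrupt.
     */
    static List<String> waitForPopulatedUniverse(Supplier<List<String>> universe,
                                                 BooleanSupplier cancelled,
                                                 long pollMillis) {
        while (!cancelled.getAsBoolean()) {
            List<String> machines = universe.get();
            // Null check first: the universe may not exist yet when the prompt is shown.
            if (machines != null && !machines.isEmpty()) {
                return machines;
            }
            try {
                Thread.sleep(pollMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return null;
            }
        }
        return null;
    }
}
```

In an Eclipse setting the cancel check would presumably be backed by a progress monitor, but that wiring is left out here.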
LATER/REMIND bugs are being automatically reopened as P5 because the LATER and REMIND resolutions are deprecated.
Please reopen if this is still a problem.