| Summary: | Launch Error, cannot create routing file, unable to determine process location | | |
|---|---|---|---|
| Product: | [Tools] PTP | Reporter: | chuan.bai |
| Component: | RM.MPICH2 | Assignee: | Project Inbox <ptp-inbox> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | | |
| Priority: | P3 | CC: | chuan.bai, incongruous, swatchpuppy |
| Version: | 4.0 | | |
| Target Milestone: | 4.0.3 | | |
| Hardware: | Macintosh | | |
| OS: | Mac OS X - Carbon (unsup.) | | |
| Whiteboard: | | | |
| Attachments: | | | |

Description
chuan.bai
Created attachment 170558
The binary version of the core with this modification.
Hello,
This error is not restricted to Macs. I'm working on a Sony with Ubuntu 10.04 and have the same problem. Also, it's not caused by the mpdlistjobs output changes in mpich2-1.2.1p1.
Apparently process.getNode() needs some time before it returns something other than null, so I solved it the following way (it's not a real solution because it can cause a lot of problems, but it works for me so far):

    private void writeRoutingFile(IPLaunch launch) throws CoreException {
        DebugUtil.trace(DebugUtil.SDM_MASTER_TRACING, Messages.SDMDebugger_12);
        IProgressMonitor monitor = new NullProgressMonitor();
        OutputStream os = null;
        try {
            os = fRoutingFileStore.openOutputStream(0, monitor);
        } catch (CoreException e) {
            throw newCoreException(e.getLocalizedMessage());
        }
        PrintWriter pw = new PrintWriter(os);
        IPProcess processes[] = launch.getPJob().getProcesses();
        // First line of the routing file: number of processes
        pw.format("%d\n", processes.length); //$NON-NLS-1$
        int base = 50000;
        int range = 10000;
        Random random = new Random();
        for (IPProcess process : processes) {
            String index = process.getProcessIndex();
            /*
             * Make sure getNode() has a value for every process.
             * This is a busy-wait: it spins until the node is set
             * and never returns if it is never set.
             */
            while (process.getNode() == null) {}
            IPNode node = process.getNode();
            if (node != null) {
                String nodeName = node.getName();
                // Random port in [base, base + range)
                int portNumber = base + random.nextInt(range);
                pw.format("%s %s %d\n", index, nodeName, portNumber); //$NON-NLS-1$
            } else {
                throw newCoreException(Messages.SDMDebugger_15);
            }
        }
        pw.close();
        try {
            os.close();
        } catch (IOException e) {
            throw newCoreException(e.getLocalizedMessage());
        }
    }

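The empty `while` loop above keeps one core busy and never returns if the node information never arrives. As a hedged sketch only (this is not the fix that was committed to PTP), a bounded, cancellable wait could look like the following. `NodeWait`, `awaitNonNull`, and the timeout and poll values are hypothetical names chosen for illustration; the `Supplier` stands in for `process.getNode()`, and the `BooleanSupplier` stands in for whatever cancellation signal the caller has.

```java
import java.util.concurrent.CancellationException;
import java.util.concurrent.TimeoutException;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.BooleanSupplier;
import java.util.function.Supplier;

public class NodeWait {
    /**
     * Polls source until it returns a non-null value, the caller cancels,
     * or the timeout elapses. Sleeps between polls instead of spinning.
     */
    public static <T> T awaitNonNull(Supplier<T> source, BooleanSupplier cancelled,
                                     long timeoutMillis, long pollMillis)
            throws InterruptedException, TimeoutException {
        long deadline = System.nanoTime() + timeoutMillis * 1_000_000L;
        while (true) {
            T value = source.get();
            if (value != null) {
                return value;
            }
            if (cancelled.getAsBoolean()) {
                throw new CancellationException("wait cancelled");
            }
            if (System.nanoTime() - deadline >= 0) {
                throw new TimeoutException("no value within " + timeoutMillis + " ms");
            }
            // Thread.sleep throws InterruptedException, so the wait is interruptible.
            Thread.sleep(pollMillis);
        }
    }

    public static void main(String[] args) throws Exception {
        // Simulates process information that is filled in later by another thread.
        AtomicReference<String> node = new AtomicReference<>();
        Thread updater = new Thread(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                return;
            }
            node.set("node0");
        });
        updater.start();

        String name = awaitNonNull(node::get, () -> false, 5_000, 50);
        System.out.println("process is on " + name);
        updater.join();
    }
}
```

In an Eclipse setting, a wait like this would normally run off the UI thread, and the cancellation check could be tied to `IProgressMonitor.isCanceled()` so that the user can abort the launch.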
I've had a little more time to think it through. I believe the thread that reads the node info runs before the thread that writes the node pointers into the array of nodes has written them.
And apparently the solution I gave before is fine for debugging, but it can't determine when the program has finished, which in fact makes a lot of sense.

I'm flooded with work right now, so I hope this can help someone fix this bug.

Best regards, and best of luck,
SwatchPuppy

I tried the code that you gave. During the startup stage, the resource manager shows a good connection to the MPD server. However, once I try to start a debug session, it tells me that it cannot connect to the MPD server. But it's a good starting point. Thanks.

(In reply to comment #3)
> I've had a little more time to think it through. I believe the thread that
> reads the node info runs before the thread that writes the node pointers
> into the array of nodes has written them.
> And apparently the solution I gave before is fine for debugging, but it
> can't determine when the program has finished, which in fact makes a lot
> of sense.
>
> I'm flooded with work right now, so I hope this can help someone fix this
> bug.
>
> Best regards, and best of luck,
>
> SwatchPuppy

It looks like this is a race condition between when the debug job is launched and when the process information is updated. It's more common in the MPICH case because the mpdlistjobs command is only run periodically. I think the solution of waiting for the process information is correct, but it needs to be interruptible and must not block the UI. I'll work on a fix.

I've committed a fix for this to HEAD. I don't have MPICH installed, so I can't test it. If you could test it as soon as possible, that would be appreciated.

I am sorry, but where can I download your fix? Thanks.
BTW, which parallel environment are you working with?

There will be a new build tonight at 10pm. Builds are available from http://wiki.eclipse.org/PTP/builds/4.0.0
I work mainly with Open MPI and PE.

Thanks! When I was installing Eclipse 3.6, the debugger could not be built with "sh BUILD". The "libmi" directory is missing:

    configure: WARNING: You must have XMLTO to compile the XML documentation for libaif.
    configure: creating ./config.status
    config.status: creating Makefile
    config.status: creating doc/Makefile
    config.status: creating config.h
    config.status: config.h is unchanged
    config.status: executing depfiles commands
    Making install in libaif
    make[2]: Nothing to be done for `install-exec-am'.
    make[2]: Nothing to be done for `install-data-am'.
    Making install in libmi
    /bin/sh: line 0: cd: libmi: No such file or directory
    make: *** [install-recursive] Error 1

Sorry about that. Please try the latest build:
http://www.eclipse.org/downloads/download.php?file=/tools/ptp/builds/helios/I.I201006051912/ptp-master-4.0.0-I201006051912.zip

Thanks,
Greg

The problem persists when I use sh BUILD to compile the sdm debugger. Should I use an older version of sdm to debug? Thank you very much for your help.
Max

(In reply to comment #10)
> Sorry about that. Please try the latest build:
> http://www.eclipse.org/downloads/download.php?file=/tools/ptp/builds/helios/I.I201006051912/ptp-master-4.0.0-I201006051912.zip
>
> Thanks,
> Greg

After reinstalling Eclipse entirely, the sdm debugger compiled. The job was launched on the MPD server, but Eclipse can't connect to it. I will investigate this in the meantime. Thank you,
Max

As I said earlier, jobs can be detected by mpdlistjobs from a terminal, but Eclipse can't detect them. The pop-up window shown when I start the debugging job says "waiting for job information..." and stays stuck there.
I've decided to use OpenMPI instead to continue my work, but if there are any further patches I will be pleased to try them. Thank you Greg and Swatch Puppy for your kind support.
Best regards,
Max

(In reply to comment #10)
> Sorry about that. Please try the latest build:
> http://www.eclipse.org/downloads/download.php?file=/tools/ptp/builds/helios/I.I201006051912/ptp-master-4.0.0-I201006051912.zip
>
> Thanks,
> Greg

Fixed in 4.0 and HEAD.