Bug 373455 - Add support for PBS on BG/P
Summary: Add support for PBS on BG/P
Status: RESOLVED FIXED
Alias: None
Product: PTP
Classification: Tools
Component: RM
Version: 5.0.5
Hardware: All / OS: All
Importance: P3 enhancement
Target Milestone: 6.0
Assignee: Greg Watson CLA
 
Reported: 2012-03-06 19:33 EST by Kevin Huck CLA
Modified: 2012-06-20 09:06 EDT
CC: 2 users



Attachments
proposed BG/P PBS batch support (70.43 KB, patch)
2012-03-06 19:40 EST, Kevin Huck CLA (flag g.watson: iplog+)
Screenshot of system monitor view for BGP (220.42 KB, image/png)
2012-03-13 14:19 EDT, Wyatt Spear CLA
Updated patch with BG/Q support and correct copyright. (26.98 KB, patch)
2012-04-26 13:35 EDT, Kevin Huck CLA

Description Kevin Huck CLA 2012-03-06 19:33:29 EST
Build Identifier: Version 3.6.2, Build id 20120216-1857

The ALCF BG/P resources at anl.gov are not supported by the generic PBS resource manager. The ALCF BG/P resources use PBS for job management, but use partlist instead of pbsnodes to get node information. In addition, the intrepid and challenger machines use CryptoCard one-time password logins, and PTP does not play well with CryptoCard/RSA one-time passwords. This second problem can be worked around by using managed connections, so that subsequent connections to the host are tunneled through the first connection (see ssh managed connections for details).

Reproducible: Always
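
For context, the only change on the node-information side is which command is queried: a generic PBS resource manager can call pbsnodes, while the ALCF BG/P systems expose partition information through partlist. A minimal sketch of that distinction (command invocation only; output parsing is omitted because the formats are site-specific, and the function name is made up for illustration):

    import subprocess

    def query_node_info(is_bluegene_p):
        # Generic PBS clusters: "pbsnodes -a" lists every node.
        # ALCF BG/P systems: "partlist" lists the predefined partitions instead.
        command = ["partlist"] if is_bluegene_p else ["pbsnodes", "-a"]
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        return result.stdout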
Comment 1 Kevin Huck CLA 2012-03-06 19:40:03 EST
Created attachment 212180
proposed BG/P PBS batch support

This is a proposed patch adding ALCF BG/P RM support (batch only); it does not include interactive support. The patch generates a script file but does not use it: all qsub parameters are passed on the command line (the standard BG/P PBS method). More robust support would pass only the necessary parameters on the command line and put the rest in the script, which would make it easier to support environment variables.
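
To illustrate the two submission styles, here is a rough sketch; the options shown are generic PBS flags with placeholder values, not necessarily the parameters the patch actually uses:

    import subprocess

    job = {"queue": "prod", "nodes": "64", "walltime": "00:30:00", "executable": "./a.out"}

    # Style used by the patch: every parameter goes on the qsub command line.
    cmdline = ["qsub", "-q", job["queue"],
               "-l", "nodes=" + job["nodes"] + ",walltime=" + job["walltime"],
               job["executable"]]
    subprocess.run(cmdline, check=True)

    # More robust style: required flags stay on the command line, everything
    # else (including environment setup) goes into the generated script.
    script_text = "\n".join([
        "#!/bin/sh",
        "#PBS -q " + job["queue"],
        "#PBS -l walltime=" + job["walltime"],
        "export MY_ENV_VAR=value",   # environment variables are easy to add here
        job["executable"],
    ])
    with open("job.pbs", "w") as script:
        script.write(script_text + "\n")
    subprocess.run(["qsub", "job.pbs"], check=True)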
Comment 2 Greg Watson CLA 2012-03-07 10:22:46 EST
Is this specific to ALCF, or will it work with other BG/P systems using PBS?
Comment 3 Kevin Huck CLA 2012-03-07 10:47:01 EST
(In reply to comment #2)
> Is this specific to ALCF, or will it work with other BG/P systems using PBS?

I haven't tested it anywhere else, but the only system monitoring dependencies are qstat (from PBS) and partlist (which gives partition information). I don't know whether ALCF made any customizations to PBS that this patch depends on (e.g. qsub parameters, qstat output).

One tricky part could be mapping jobs on partitions to jobs on the BG/P row/column/midplane/nodecard/node/core hardware hierarchy.

As far as I know, ALCF names their partitions in a standard BG/P way. Below is the description of the naming conventions that I got from William Scullin at ALCF system support:

<quote>

In the Blue Gene environment, a beowulf cluster / constellation system
style listing of nodes doesn't make a lot of sense. You'd have 40,960
compute node cards to list that are completely identical. Individual
compute node cards are enumerated within the hardware management
section of the control system, but are never addressed individually
elsewhere (except within an HTC mode block). We allocate and manage
nodes only as members of predefined blocks which fall along hardware
boundaries.

For our own graphical system monitor, the Gronkulator
(http://status.alcf.anl.gov/ or
http://status.alcf.anl.gov/intrepid/activity for Intrepid), we're
using bits of the Cobalt API to get output from qstat, showres, and
partlist. We provide a secure and semi-processed feed of the same data
in JSON. (The URI for the JSON feed follows the form
http://status.alcf.anl.gov/$lcf_system_name/activity.json; for
Intrepid it'd be http://status.alcf.anl.gov/intrepid/activity.json.)

If you're looking for an individual listing of nodes to visualize
performance statistics, it is possible to map nodes back to hardware
locations, though the reported rank-to-hardware-location mapping isn't
always consistent, as the mapping may vary with partition size and the
BG_MAPPING environment variable. The hardware location information
from within the personality is, however, consistent with block names,
as we try to make them follow hardware naming conventions. There used
to be a good description of the block-to-hardware mapping, but as a
quick summary, block names follow these conventions:

For blocks under 512 nodes:
<machine label>-R<row><column>-M<midplane>-N<first node card in
block>-<block size in compute node cards>

For blocks 512 nodes in size:
<machine label>-R<row><column>-M<midplane>-512

For blocks 1,024 nodes in size:
<machine label>-R<row><column>-1024

For blocks greater than 1,024 nodes:
<machine label>-R<starting row><starting column>-R<ending row><ending
column>-<blocksize>

On Challenger there is one addition to block names, -T<the compute
to ION ratio, if not the default 64:1>, which comes directly
before <blocksize>.

Examples:

CHR-R00-M0-N04-T16-64 would be a 64-node block on Challenger, Row 0,
Column 0, Midplane 0, containing all the node cards needed to get to
64 compute nodes starting at node card N04 (N04, N05), with a ratio of
16 compute nodes to 1 I/O node. The individual compute nodes in
that block, if looked at via the personality (via
BGP_Personality_getLocationString), would have names like
R00-M0-N05-J00 (J denotes a node position on the node card, with J00
and J01 being I/O nodes and J04 through J35 being compute nodes).

ANL-R43-M1-N08-256 would be a 256 compute node block on Intrepid, Row
4, Column 3, Midplane 1, containing all the node cards needed to get
to 256 compute nodes starting at node card N08 (and containing
N08, N09, N10, N11, N12, N13, N14, N15).

ANL-R00-M0-512 would be a 512 compute node block on Intrepid or
Surveyor, Row 0, Column 0, Midplane 0 containing all the node cards in
the midplane.

ANL-R20-R37-16384 would be a 16,384 compute node block on Intrepid
starting at Row 2, Column 0, containing all hardware through the very
last node in the rack at Row 3, Column 7.

Hardware naming and getting location information from personality is
covered in far more detail in the IBM Blue Gene/P Application
Developers Handbook in Appendixes A and C
(http://www.redbooks.ibm.com/abstracts/sg247287.html?Open&pdfbookmark).

</quote>
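
To make the mapping mentioned earlier in this comment concrete, here is a minimal sketch of how block names following the quoted conventions might be decoded into row/column/midplane/node card fields. The regular expressions are written directly from the description above and are only an illustration; they are not taken from the attached patch.

    import re

    # One pattern per naming convention quoted above.
    PATTERNS = [
        # e.g. CHR-R00-M0-N04-T16-64 or ANL-R43-M1-N08-256 (blocks under 512 nodes)
        re.compile(r"^(?P<machine>\w+)-R(?P<row>\d)(?P<col>\d)-M(?P<midplane>\d)"
                   r"-N(?P<nodecard>\d+)(?:-T(?P<ratio>\d+))?-(?P<size>\d+)$"),
        # e.g. ANL-R00-M0-512 (single-midplane, 512-node blocks)
        re.compile(r"^(?P<machine>\w+)-R(?P<row>\d)(?P<col>\d)-M(?P<midplane>\d)-(?P<size>512)$"),
        # e.g. ANL-R23-1024 (full-rack, 1,024-node blocks)
        re.compile(r"^(?P<machine>\w+)-R(?P<row>\d)(?P<col>\d)-(?P<size>1024)$"),
        # e.g. ANL-R20-R37-16384 (multi-rack blocks over 1,024 nodes)
        re.compile(r"^(?P<machine>\w+)-R(?P<srow>\d)(?P<scol>\d)-R(?P<erow>\d)(?P<ecol>\d)-(?P<size>\d+)$"),
    ]

    def parse_block_name(name):
        """Return the hardware fields encoded in an ALCF-style block name, or None."""
        for pattern in PATTERNS:
            match = pattern.match(name)
            if match:
                return match.groupdict()
        return None

    # Example: parse_block_name("ANL-R43-M1-N08-256")
    # -> {'machine': 'ANL', 'row': '4', 'col': '3', 'midplane': '1',
    #     'nodecard': '08', 'ratio': None, 'size': '256'}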
Comment 4 Greg Watson CLA 2012-03-07 11:17:35 EST
Ok, thanks.

Can you please add a statement to this bug stating the following:

1) You have authored 100% of the contributed code.

2) You have the right to donate the content to Eclipse.

3) You are donating the contribution under the Eclipse Public License.

Thanks!
Comment 5 Kevin Huck CLA 2012-03-07 12:20:57 EST
I have authored 100% of the contributed code, which is based on modifications to the previously existing generic PBS support.

I have the right to donate the content to Eclipse.

I am donating the contribution under the Eclipse Public License.

Kevin
Comment 6 Wyatt Spear CLA 2012-03-13 14:19:51 EDT
Created attachment 212585
Screenshot of system monitor view for BGP

This is how the system monitor UI presents ALCF's Intrepid.
Comment 7 Kevin Huck CLA 2012-04-19 22:17:01 EDT
What's the status of this patch? Since we submitted the patch, we have debugged the BG/P support and added support for BG/Q test systems at ALCF.
Comment 8 Greg Watson CLA 2012-04-20 10:15:27 EDT
The contribution is approved, provided the copyright headers are appended with "and others" and Kevin Huck, ParaTools, is added to the Contributors sections.

Wyatt, please commit an updated version of the patch to the contrib plugin. Note that the name you use in the extension point must be the same as the name in the configuration XML. Thanks!
Comment 9 Kevin Huck CLA 2012-04-26 13:35:05 EDT
Created attachment 214620
Updated patch with BG/Q support and correct copyright.

Here is an updated patch for the bug. It fixes the copyright information for the new files and adds Kevin Huck as a contributor. It also includes support for BG/Q systems and minor bug fixes for the previous BG/P support.
Comment 10 Wolfgang Frings CLA 2012-05-29 11:09:15 EDT
Driver scripts are now integrated into LML_da 1.17.

The driver name was changed to COBALT_BG to match the batch system name and platform.
Comment 11 Greg Watson CLA 2012-06-20 09:06:12 EDT
Closing as fixed.