Bug 164132

Summary: Intermittent CVS pserver access issues
Product: Community
Component: CVS
Status: RESOLVED FIXED
Severity: major
Priority: P3
Version: unspecified
Target Milestone: ---
Hardware: PC
OS: Linux
Whiteboard:
Reporter: Neil Skrypuch <ns03ja>
Assignee: Eclipse Webmaster <webmaster>
QA Contact:
CC: Ed.Merks, give.a.damus, Kenn.Hussey, nboldt, pelder.eclipse, sonia_dimitrov

Description Neil Skrypuch CLA 2006-11-10 10:57:54 EST
Since at least early this morning, CVS pserver access has been hit or miss. Sometimes it's fine, other times it fails with an error like this:

cvs [update aborted]: unrecognized auth response from dev.eclipse.org:   -> main: Session ID is 71c045549e6b4567
cvs [log aborted]: unrecognized auth response from dev.eclipse.org:   -> main: Session ID is 71c145549e6b4567

Often the same operations will work minutes later, only to stop working again minutes after that.
Comment 1 Denis Roy CLA 2006-11-10 11:16:26 EST
This is an issue with our SLES 10 upgrade, as node1 (upgraded) was put into production with other not-yet-upgraded servers.  I'll look into this immediately.

D.
Comment 2 Nick Boldt CLA 2006-11-10 11:48:12 EST
*** Bug 164137 has been marked as a duplicate of this bug. ***
Comment 3 Nick Boldt CLA 2006-11-10 12:06:13 EST
In the short term, would it help to use nodeN.eclipse.org (where N > 1 and N <= 4) instead of dev.eclipse.org when accessing cvs?

Comment 4 Nick Boldt CLA 2006-11-10 12:32:30 EST
Another related problem in bug 164143.
Comment 5 Denis Roy CLA 2006-11-10 13:38:05 EST
(In reply to comment #3)
> In the short term, would it help to use nodeN.eclipse.org (where N > 1 and N <=
> 4) instead of dev.eclipse.org when accessing cvs?
> 

Absolutely not.  We're working on the problem, but if we cannot solve it in a timely fashion, we have the ability to remove specific servers from the load balancer.

Please, never ever ever ever ever ever ever ever ever connect to a specific node (not that you can, because they're inaccessible from the firewall, but it's worth stating).  We need the ability to pull any node offline at any given time.
Comment 6 Denis Roy CLA 2006-11-10 13:42:46 EST
I believe we caught this... can you please try again?
Comment 7 Nick Boldt CLA 2006-11-10 15:33:16 EST
Sure, but I have a question.

During our promote, at one point, I scp a script to *.eclipse.org, then ssh in to run it; at other points I scp a zipfile to *.eclipse.org, then ssh in to unpack it. Over the last couple weeks, we've had problems with this since the scp and ssh steps hit different nodes and thus I get file not found errors trying to unpack or run in the second part of the steps. Worse, if I scp the build folder to nodeA and then ssh in to verify the md5sums all work and that the scp was done completely, I get promote failures because the check is looking at nodeB but the data is sitting on nodeA.

Is there a way to guarantee that I scp and ssh to the same node within a set of script steps? 

NOTE:

When we move our builds to build.eclipse.org, we may still have this problem but there, I'll be able to pull instead of push, since the promote script will be running on dev.eclipse and scping FROM build.eclipse (instead of the current scenario where it pushes from the build server to download1/dev/nodeX.eclipse.org). In that new setup, all the code and artefacts will be in the same ssh session, and there's only ONE build.eclipse.org (AFAIK).
Comment 8 Denis Roy CLA 2006-11-10 15:40:20 EST
(In reply to comment #7)
> scp and ssh steps hit different nodes and thus I get file not found errors
> trying to unpack or run in the second part of the steps. Worse, if I scp the
> build folder to nodeA and then ssh in to verify the md5sums all work and that
> the scp was done completely, I get promote failures because the check is
> looking at nodeB but the data is sitting on nodeA.

Surely you jest. All the locations you have write access to (except for /tmp and /shared on build) are NFS mounted and common across all nodes and build.  Where are you scp/ssh'ing your files?


> Is there a way to guarantee that I scp and ssh to the same node within a set of
> script steps? 

There shouldn't be any need to, because if you SSH anywhere to the /home or your user directory, it's common across all servers. If you scp/ssh to a specific node, your script will break when we take said node down for whatever maintenance.
Comment 9 Nick Boldt CLA 2006-11-10 15:54:14 EST
Well, *temporary* stuff (zips to be unpacked, scripts to be run and then deleted) has traditionally been scp'd to /tmp, which worked seamlessly until about 3 weeks ago. I can change the scripts to use ~/tmp instead, since the node-switcher no longer gives me the same node from one minute to the next (and thus I get a different /tmp folder).
Comment 10 Denis Roy CLA 2006-11-10 15:58:42 EST
Ah, I see. Because we're doing server upgrades, we've been taking nodes out of service here-and-there, hence the load balancer is being a bit more "random".  It really is a smart machine.

Your plan to scp to ~/tmp is what will guarantee success no matter which node is in or out of service.
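
A minimal sketch of that plan, for illustration only: stage the file under ~/tmp (on the NFS-mounted home directory, so visible from every node) rather than the node-local /tmp, and always address the load-balanced hostname. The function name `build_promote_commands`, the file name `emf-build.zip`, and the use of `unzip` on the remote side are assumptions made for this example and are not taken from the actual promote scripts. It also assumes ~/tmp already exists on the server. The function only builds the command lines and does not run anything.

```python
import os.path
import posixpath
import shlex


def build_promote_commands(local_zip, host, remote_dir="tmp"):
    """Return (scp_cmd, ssh_cmd) argument lists; nothing is executed here.

    remote_dir is relative, so scp and ssh both resolve it against the
    login user's home directory (shared across nodes), not the per-node /tmp.
    """
    name = os.path.basename(local_zip)
    remote_zip = posixpath.join(remote_dir, name)

    # Step 1: copy the zip to ~/tmp via the load-balanced host name.
    scp_cmd = ["scp", local_zip, f"{host}:{remote_zip}"]

    # Step 2: unpack it in a separate ssh session. Even if this session
    # lands on a different node, ~/tmp has the same contents.
    remote_script = (
        f"cd {shlex.quote(remote_dir)} && unzip -o {shlex.quote(name)}"
    )
    ssh_cmd = ["ssh", host, remote_script]
    return scp_cmd, ssh_cmd


if __name__ == "__main__":
    # Hypothetical file name; the host is the load-balanced name, never nodeN.
    for cmd in build_promote_commands("out/emf-build.zip", "dev.eclipse.org"):
        print(" ".join(shlex.quote(part) for part in cmd))
```

To actually run the two steps, each list could be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes it easy to check that no step points at /tmp or at a specific node.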

But all this is unrelated to comment 0, isn't it?
Comment 11 Neil Skrypuch CLA 2006-11-10 16:20:20 EST
CVS access seems to be better now; I haven't seen the original issue for a while. Resolving, I'll reopen if I notice the problem again.

...and for some reason, I didn't get email notification for comments #5 or #6, plus a few CC changes around that time.
Comment 12 Nick Boldt CLA 2006-11-10 17:21:52 EST
Denis:

I've updated all our scripts, and will try a promote shortly. Thanks for the info about /tmp and node mirroring. And yes, this is all tangentially related to the original bug as reported. Our builds use dev.eclipse.org for cvs but were using node1.eclipse.org for promoting builds, so it's all related to mirroring and node1's eccentric performance over the last couple days. ;-)
Comment 13 Denis Roy CLA 2006-11-10 19:42:37 EST
This is great, thanks guys.  Let me know if I can do anything else to help.

D.