| Summary: | Intermittent CVS pserver access issues | ||
|---|---|---|---|
| Product: | Community | Reporter: | Neil Skrypuch <ns03ja> |
| Component: | CVS | Assignee: | Eclipse Webmaster <webmaster> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | major | ||
| Priority: | P3 | CC: | Ed.Merks, give.a.damus, Kenn.Hussey, nboldt, pelder.eclipse, sonia_dimitrov |
| Version: | unspecified | ||
| Target Milestone: | --- | ||
| Hardware: | PC | ||
| OS: | Linux | ||
| Whiteboard: | |||
Description
Neil Skrypuch
This is an issue with our SLES 10 upgrade, as node1 (upgraded) was put into production alongside other not-yet-upgraded servers. I'll look into this immediately. D.

*** Bug 164137 has been marked as a duplicate of this bug. ***

In the short term, would it help to use nodeN.eclipse.org (where N > 1 and N <= 4) instead of dev.eclipse.org when accessing cvs?

Another related problem in bug 164143.

(In reply to comment #3)
> In the short term, would it help to use nodeN.eclipse.org (where N > 1 and
> N <= 4) instead of dev.eclipse.org when accessing cvs?

Absolutely not. We're working on the problem, but if we cannot solve it in a timely fashion, we have the ability to remove specific servers from the load balancer. Please, never *ever* connect to a specific node (not that you can, because they're inaccessible from outside the firewall, but it's worth stating). We need the ability to pull any node offline at any given time.

I believe we caught this... can you please try again?

Sure, but I have a question. During our promote, at one point I scp a script to *.eclipse.org, then ssh in to run it; at other points I scp a zip file to *.eclipse.org, then ssh in to unpack it. Over the last couple of weeks we've had problems with this, since the scp and ssh steps hit different nodes and I get file-not-found errors in the second part of the steps. Worse, if I scp the build folder to nodeA and then ssh in to verify that the md5sums all match and that the scp completed, I get promote failures because the check is looking at nodeB while the data is sitting on nodeA. Is there a way to guarantee that I scp and ssh to the same node within a set of script steps?
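The verification step described above (copy the build folder, then check that every file arrived intact) is commonly done with a checksum manifest written on the sending side and checked on the receiving side. A minimal local sketch of that pattern; the directory and file names are hypothetical, and in the real promote the `md5sum -c` step would run inside the ssh session against the shared filesystem:

```shell
# Simulate the staging area with a sample artifact (hypothetical names;
# in the real promote this would be the build output directory).
mkdir -p build
printf 'sample artifact' > build/artifact.zip

# Sending side: record a checksum for every file being promoted.
md5sum build/*.zip > MANIFEST.md5

# Receiving side (after the copy): verify every file against the
# manifest. md5sum -c exits non-zero if anything is missing or altered,
# so a promote script can abort on a partial or corrupted copy.
md5sum -c MANIFEST.md5
```

Because the check is driven by the manifest rather than by what happens to be on one node, it fails loudly (instead of silently passing) whenever the copy and the check see different data.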
NOTE: When we move our builds to build.eclipse.org, we may still have this problem, but there I'll be able to pull instead of push, since the promote script will be running on dev.eclipse and scp'ing FROM build.eclipse (instead of the current scenario, where it pushes from the build server to download1/dev/nodeX.eclipse.org). In that new setup, all the code and artefacts will be in the same ssh session, and there's only ONE build.eclipse.org (AFAIK).

(In reply to comment #7)
> scp and ssh steps hit different nodes and thus I get file not found errors
> trying to unpack or run in the second part of the steps. Worse, if I scp the
> build folder to nodeA and then ssh in to verify the md5sums all work and that
> the scp was done completely, I get promote failures because the check is
> looking at nodeB but the data is sitting on nodeA.

Surely you jest. All the locations you have write access to (except for /tmp, and /shared on build) are NFS-mounted and common across all nodes and build. Where are you scp/ssh'ing your files?

> Is there a way to guarantee that I scp and ssh to the same node within a set
> of script steps?

There shouldn't be any need to: if you ssh anywhere and work under /home or your user directory, it's common across all servers. If you scp/ssh to a specific node, your script will break when we take that node down for maintenance.

Well, scp'ing *temporary* stuff (zips to be unpacked, scripts to run and then be deleted) has traditionally gone to /tmp, which worked seamlessly until about three weeks ago. I can change the scripts to use ~/tmp instead, since the node switcher no longer gives me the same node from one minute to the next (and thus I get a different /tmp folder each time).

Ah, I see. Because we're doing server upgrades, we've been taking nodes out of service here and there, so the load balancer is being a bit more "random". It really is a smart machine.
Your plan to scp to ~/tmp is what will guarantee success no matter which node is in or out of service. But all this is unrelated to comment 0, isn't it?

CVS access seems to be better now; I haven't seen the original issue for a while. Resolving; I'll reopen if I notice the problem again.

...and for some reason, I didn't get email notification for comments #5 or #6, plus a few CC changes around that time.

Denis: I've updated all our scripts and will try a promote shortly. Thanks for the info about /tmp and node mirroring. And yes, this is all tangentially related to the original bug as reported: our builds use dev.eclipse.org for cvs but were using node1.eclipse.org for promoting builds, so it's all related to mirroring and node1's eccentric performance over the last couple of days. ;-)

This is great, thanks guys. Let me know if I can do anything else to help. D.
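The fix agreed on in this thread — stage temporary files in the NFS-shared ~/tmp rather than node-local /tmp, so the scp and ssh steps need not hit the same node — can be sketched as below. This is a dry-run illustration (it only echoes the commands); the account name and file names are hypothetical:

```shell
# ~/tmp lives on the NFS-mounted home directory, so it is identical on
# every node behind the load balancer; node-local /tmp is not.
HOST="someuser@dev.eclipse.org"   # hypothetical account
RUN="echo"                        # dry run; set RUN="" to actually execute

# Copy the zip and the promote script into the shared directory
# (the remote path is relative to $HOME, i.e. ~/tmp)...
$RUN scp build.zip promote.sh "$HOST:tmp/"

# ...then unpack and run. Even if this ssh lands on a different node
# than the scp did, ~/tmp is the same NFS mount, so the files are there.
$RUN ssh "$HOST" "cd ~/tmp && unzip -o build.zip && sh promote.sh"
```

The same pattern keeps working when a node is pulled from the load balancer for maintenance, which is exactly why connecting to a specific nodeN was ruled out earlier in the thread.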