Given the outcome of bug 361707, especially comment 361707#c38, I'm wondering if there should be some improved "test of the backup system". Please take this as a minor suggestion ... I do not mean to add to anyone's workload or exaggerate one incident ... but ... backups are pretty important. It sounds like the backups may have been missing for a few months? In any case, I was wondering if there should be some simple "test" or "report" of the backups. Perhaps the ISP could provide a one-page summary of "number of files backed up by server name or top-level directory" or something, and it could be "published" on http://archive.eclipse.org/arch/, or somewhere. Do you think anomalies would be noticeable from that? If that's no good, maybe just a weekly, monthly, or quarterly "test of the emergency broadcast system"? (I say that because, here in the States anyway, TV stations are required to run those tests once a month or so, which you normally only see if you stay up till 3 AM like I do, and half the time they fail :/ ... like no sound, or the sound is too low, or the message flashes for just 3 seconds, or something.) Anyway ... just suggestions ... feel free to close as "won't fix" if you think there is nothing worthwhile to do here. (At home, I back up frequently, and have for years, but I have never once tested whether the backups are accurate. The one time I really needed one of those backups, I had made the mistake of buying Vista Home Edition, which, by design, did not even back up JSP (and other) files ... I guess I should have been testing it! ... but I'm not sure I could ever have thought to devise a test to spot that weirdness.)
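The per-directory summary suggested above could be produced by something as simple as the following Python sketch. It assumes the backup (or a test restore of it) is reachable as an ordinary directory tree; the function names are hypothetical and not part of any existing Eclipse or ISP tooling.

```python
import os
from collections import Counter
from pathlib import Path


def count_files_by_top_dir(root: str) -> Counter:
    """Count files (non-directory entries reported by os.walk) per top-level directory.

    Files sitting directly in ``root`` are counted under the key ".".
    Top-level directories that contain no files still appear, with a count of 0.
    """
    root_path = Path(root)
    counts: Counter = Counter()
    for dirpath, _dirnames, filenames in os.walk(root_path):
        rel = Path(dirpath).relative_to(root_path)
        # The first path component is the top-level directory; "." for root itself.
        top = rel.parts[0] if rel.parts else "."
        # "+= 0" still creates the key, so empty directories are reported.
        counts[top] += len(filenames)
    return counts


def format_summary(counts: Counter) -> str:
    """Render one "directory<TAB>count" line per directory, sorted by name."""
    return "\n".join(f"{name}\t{counts[name]}" for name in sorted(counts))
```

For example, `print(format_summary(count_files_by_top_dir("/path/to/restore")))` (the path is a placeholder) would print one line per top-level directory. Empty top-level directories are listed with a count of 0, which is exactly the kind of anomaly such a report is meant to surface.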
I never remember how to type in short-hand comment references, such as for https://bugs.eclipse.org/bugs/show_bug.cgi?id=361707#c38. Should it perhaps be prefaced with "bug" instead of "comment", such as bug 361707#c38?
Yes, and yes. And yes. We've been using the same managed backup system since 2005. Since then, we've received an email each month reminding us to test our restores. For the first few years, we did. We even brought in a blank box to simulate the case where a fire had destroyed all our hardware. But as the years passed, testing our restores went to the back burner. What came out was always what went in. Even in September our restores were tested; they just didn't include code in the SCMs. This broke when we recently swapped the NFS servers and changed some mount points. We receive a nightly report that shows us the size of the backup. Since the daily diffs in the code repos are very, very small compared to the diffs of everything else, no red flags were raised. There were still a ton of files being backed up on dev.eclipse.org. Anyway, we've been doing everything you've suggested, more or less religiously, and that has obviously failed, so I'm open to suggestions. Perhaps in our test restore we should select files that we know should always be available in a restore (such as one directory in each SCM).
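The restore check proposed above, verifying that a few known paths (such as one directory per SCM) come back from every test restore, might look roughly like this Python sketch. The canary directory names and the function name are hypothetical placeholders, not the actual layout of dev.eclipse.org.

```python
from pathlib import Path
from typing import Iterable, List

# Hypothetical canary list: paths (relative to the restore root) that should
# always be present in any restore, e.g. one directory per SCM.
CANARY_PATHS = ["gitroot", "cvsroot", "svnroot"]


def missing_canaries(restore_root: str, canaries: Iterable[str]) -> List[str]:
    """Return the canary paths that are absent, or are empty directories, in a test restore."""
    root = Path(restore_root)
    missing = []
    for rel in canaries:
        path = root / rel
        # An existing but empty directory is treated as missing too: it suggests
        # the directory was captured but its contents were not.
        if not path.exists() or (path.is_dir() and not any(path.iterdir())):
            missing.append(rel)
    return missing
```

A monthly test restore would then fail loudly whenever `missing_canaries(restore_dir, CANARY_PATHS)` returns a non-empty list, rather than relying on someone noticing that the total size looks a little off.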
Sounds like you have it covered, and this was just a hard-to-catch case that slipped through. I think you are right to add some specific tests to your test restore that would have caught this particular case ... just like we do with our code: if a regression slips in, we try to add a JUnit test that would have caught it, just to make sure it doesn't happen again. (And, honestly, we are not that good about doing that. :) Thanks for the info.
(In reply to comment #1)
> I never remember how to type in short-hand comment references, such as for
>
> https://bugs.eclipse.org/bugs/show_bug.cgi?id=361707#c38
>
> perhaps prefaced with bug instead of comment, such as
>
> bug 361707#c38

bug 12345 comment 4 should work.
> Sounds like you have it covered, and this was just a hard-to-catch case that
> slipped through.

David, I'll close this as WORKSFORME. We'll be deploying a brand-new backup solution, one that will give us more control, so I'm hoping to write some automated backup tests.
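One possible automated backup test, assuming per-directory file counts are recorded for each nightly backup, is to compare consecutive reports and flag any directory that vanished or shrank sharply. This is a hypothetical Python sketch; the function name and the 50% threshold are arbitrary illustrative choices.

```python
from typing import Dict, List


def flag_drops(previous: Dict[str, int], current: Dict[str, int],
               max_drop: float = 0.5) -> List[str]:
    """Return directories that vanished or lost more than ``max_drop`` of their files.

    ``previous`` and ``current`` map a top-level directory name to its file count
    in two consecutive backup reports. ``max_drop`` is a fraction (0.5 = 50%).
    """
    flagged = []
    for name, old in sorted(previous.items()):
        new = current.get(name, 0)  # a directory missing from the new report counts as 0
        if old > 0 and (old - new) / old > max_drop:
            flagged.append(name)
    return flagged
```

With illustrative numbers, `flag_drops({"git": 1000, "home": 50000}, {"home": 50010})` returns `["git"]`, even though the total file count changes by only about 2%. That is the weakness of a size-only nightly report described earlier in the thread: a per-directory comparison makes a vanished code repository stand out instead of disappearing into the noise.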