Bug 573567

Summary: signing service unreliable
Product: Community Reporter: Christian Dietrich <christian.dietrich.opensource>
Component: Servers Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: denis.roy, mikael.barbero
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X   
Whiteboard:

Description Christian Dietrich CLA 2021-05-17 01:06:28 EDT
In recent days, many of our nightly Xtext sign-and-deploy jobs have failed with problems in the signing service:

https://ci.eclipse.org/xtext/job/releng/job/sign-and-deploy/992/console
https://ci.eclipse.org/xtext/job/releng/job/sign-and-deploy/990/console
https://ci.eclipse.org/xtext/job/releng/job/sign-and-deploy/983/console

with

Process 'command 'curl'' finished with non-zero exit value 92

Restarting the job often helps.

Is there anything that can be done to make this more stable?

Thanks
Comment 1 Christian Dietrich CLA 2021-05-20 00:54:10 EDT
Any update here?
Comment 2 Mikaël Barbero CLA 2021-05-20 02:41:05 EDT
I once again see errors from the TSA (timestamping authority):

Server returned HTTP response code: 400 for URL: http://sha256timestamp.ws.symantec.com/sha256/timestamp

One way to mitigate this would be to implement https://github.com/eclipse-cbi/org.eclipse.cbi/issues/27. This item is not planned yet.
Comment 3 Christian Dietrich CLA 2021-05-20 02:42:14 EDT
As I don't see the problems in reruns: do we run the job at a bad time of night, or do the outages happen throughout the day?
Comment 4 Mikaël Barbero CLA 2021-05-20 08:39:28 EDT
Here is the list of previously recorded TSA downtime (UTC)

2021-05-03 06:38:48
2021-05-03 06:51:36
2021-05-05 06:29:52
2021-05-05 between 17:12:21 and 17:17:04
2021-05-06 09:29:05
2021-05-06 15:15:36
2021-05-06 22:55:30
2021-05-06 23:02:29
2021-05-07 00:33:39
2021-05-07 03:17:40
2021-05-07 08:20:39
2021-05-07 08:48:15
2021-05-09 06:49:05
2021-05-10 09:09:04
2021-05-10 15:11:19
2021-05-11 06:53:38
2021-05-11 07:41:09
2021-05-11 08:46:27
2021-05-11 11:15:57
2021-05-11 14:49:59
2021-05-11 15:13:58
2021-05-11 15:17:05
2021-05-11 15:25:57
2021-05-11 22:41:17
2021-05-12 06:27:14
2021-05-12 08:18:05
2021-05-12 13:14:54
2021-05-12 14:17:43
2021-05-12 20:55:16
2021-05-13 22:14:46
2021-05-14 01:02:27
2021-05-14 04:17:34
2021-05-14 11:37:46
2021-05-15 23:12:00
2021-05-16 20:23:38
2021-05-16 22:29:23
2021-05-17 04:23:30
2021-05-17 16:33:36
2021-05-17 between 22:26:33 and 22:38:19
2021-05-20 between 01:53:39 and 01:55:16

(Note that the timestamps in your Jenkins logs, e.g. https://ci.eclipse.org/xtext/job/releng/job/sign-and-deploy/983/console, are in EDT, i.e. UTC-4.)
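To line up the UTC downtime list with EDT timestamps in a Jenkins console log, a small conversion helper can be handy. This is only a sketch: the function name `utc_to_edt` is made up for illustration, and it assumes a fixed UTC-4 offset (valid for the May 2021 dates above, not for dates outside daylight saving time).

```python
from datetime import datetime, timedelta, timezone

# Assumption: fixed UTC-4 offset, as stated for the Jenkins logs in May 2021.
EDT = timezone(timedelta(hours=-4), "EDT")


def utc_to_edt(stamp):
    """Convert a 'YYYY-MM-DD HH:MM:SS' UTC timestamp to the same format in EDT."""
    utc = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
    return utc.astimezone(EDT).strftime("%Y-%m-%d %H:%M:%S %Z")


# Start of the longest recorded downtime in the list above.
print(utc_to_edt("2021-05-17 22:26:33"))  # 2021-05-17 18:26:33 EDT
```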

As you can see, most of the downtimes are episodic, and a single retry usually makes them unnoticeable to projects. The 3 longer downtimes match the issues you've been experiencing.

Again, we may be able to mitigate this with https://github.com/eclipse-cbi/org.eclipse.cbi/issues/27, and/or you could increase your curl retry count (currently --retry 3) or increase the delay between retries (currently 10 seconds, --retry-delay 10). I'd advise going one step further: remove the --retry-delay parameter entirely and let curl use its exponential backoff algorithm. From the curl man page:

    When curl is about to retry a transfer, it will first wait one second and then for all forthcoming retries it will double the waiting time until it reaches 10 minutes which then will be the delay between the rest of the retries. By using --retry-delay you disable this exponential backoff algorithm. See also --retry-max-time to limit the total time allowed for retries.

With a --retry count set to 8 or 10, you should be able to alleviate those issues altogether without any penalty.
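To get a feel for how much waiting the different retry counts buy, the backoff schedule quoted above can be modelled in a few lines. This is a rough sketch of the documented behaviour (start at one second, double each time, cap at 10 minutes), not curl's actual implementation. The function name `curl_backoff_delays` is hypothetical, and the model ignores the time spent on the requests themselves.

```python
def curl_backoff_delays(retries, cap_seconds=600):
    """Model the waits (in seconds) before each retry: 1, 2, 4, ... capped at 10 minutes."""
    delays = []
    delay = 1
    for _ in range(retries):
        delays.append(min(delay, cap_seconds))
        delay *= 2
    return delays


for retries in (3, 8, 10):
    delays = curl_backoff_delays(retries)
    print(f"--retry {retries}: waits {delays}, total {sum(delays)} s")
```

Under this model, 3 retries wait 7 seconds in total, 8 retries wait 255 seconds, and 10 retries wait 1023 seconds. For comparison, the current configuration (3 retries, 10 seconds apart) waits 30 seconds, while the longest recorded downtime in the list above spans roughly 12 minutes.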
Comment 5 Christian Dietrich CLA 2021-05-20 10:40:22 EDT
OK, I have increased the count to 10 and removed the delay config.
Let's see what happens.
Comment 6 Denis Roy CLA 2021-12-20 11:58:18 EST
Assuming fixed. Please reopen if not.