Bug 571802

Summary: open vsx is slow again
Product: Community Reporter: Anton Kosyakov <anton>
Component: Infrastructure Assignee: Eclipse Webmaster <webmaster>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: denis.roy, mikael.barbero
Version: unspecified   
Target Milestone: ---   
Hardware: PC   
OS: Mac OS X   
Whiteboard:

Description Anton Kosyakov CLA 2021-03-09 04:35:19 EST
Gitpod cannot connect to Open VSX. The status page, again, does not report any slowness.
This is quite disruptive, since users cannot install extensions for new workspaces.
Comment 1 Mikaël Barbero CLA 2021-03-09 05:09:26 EST
Could you please elaborate on the "cannot connect to it"? Thanks.
Comment 2 Anton Kosyakov CLA 2021-03-09 05:21:22 EST
Around one hour ago, users were getting timeout errors while trying to search Open VSX from the VS Code integration in Gitpod. They ended up with the user-facing error `We cannot connect to the Extensions Marketplace at this time, please try again later.`
Comment 3 Mikaël Barbero CLA 2021-03-09 05:46:16 EST
Do you have an idea of how long and how many requests were "slow"? I see half a dozen timeouts in the logs, but nothing more.

Also, this was during searches, correct? Not direct queries for extensions?
Comment 4 Anton Kosyakov CLA 2021-03-09 06:42:58 EST
> Do you have an idea of how long and how many requests were "slow"? I see half a dozen timeouts in the logs, but nothing more.

Unfortunately, we don't collect such information yet. It is only logged in the client browser for now, as the client talks directly to Open VSX.

> Also, this was during searches, correct? Not direct queries for extensions?

Yes, it is about using the Open VSX API. If a user manages to get a link to Azure Storage, then it works.
Comment 5 Mikaël Barbero CLA 2021-03-09 07:51:35 EST
I noticed that the ES nodes were spending a lot of time in GC. I've increased the memory allocated to the nodes (see https://github.com/EclipseFdn/open-vsx.org/commit/31fbf071dd4073bbb89b5d83e866c98ad9c4b3b6). 

Let's keep this one open for a week. Feel free to report any other issue you may have during this time. If we don't see the issue during this period, I'll then close the ticket.
Comment 6 Anton Kosyakov CLA 2021-03-09 10:03:44 EST
We are still seeing errors like 

Failed to find extension 'tyriar.sort-lines@1.9.0:39TPtnzbLWepVuVCVtlO3g==' in 'https://open-vsx.org' registry:","error":"Error: ESOCKETTIMEDOUT\n    at ClientRequest.<anonymous> (/app/node_modules/request/request.js:816:19)

Configured timeout is 5 mins.
Comment 7 Mikaël Barbero CLA 2021-03-09 10:21:58 EST
I've opened https://github.com/eclipse/openvsx/issues/256 as this is now the only log remaining.
Comment 8 Mikaël Barbero CLA 2021-03-09 10:23:22 EST
@webmaster, 

meanwhile, could you please have a look at the nginx side of open-vsx.org?
Comment 9 Anton Kosyakov CLA 2021-03-09 10:32:13 EST
Sorry, I was wrong: the timeout on our side is 5s, not 5 mins: https://github.com/gitpod-io/gitpod/blob/48dfd9faae8ed6b21684365eb26837065e1adc79/components/server/src/theia-plugin/theia-plugin-service.ts#L244
Comment 10 Denis Roy CLA 2021-03-09 10:32:35 EST
Could the two issues not be related? If the server is exhausting a 30s timeout to the database (possibly because of connection pool exhaustion), wouldn't that lead a client to a timeout?
Comment 11 Denis Roy CLA 2021-03-09 10:34:20 EST
> Sorry, I was wrong: the timeout on our side is 5s, not 5 mins:

Thanks; while we strive to get sub-second response times in all cases, 5 seconds isn't very forgiving in the odd case of congestion or other contention.
Comment 12 Anton Kosyakov CLA 2021-03-09 10:35:33 EST
> Thanks; while we strive to get sub-second response times in all cases, 5 seconds isn't very forgiving in the odd case of congestion or other contention.

What timeout would you recommend? Is 30s enough?
Comment 13 Denis Roy CLA 2021-03-10 09:51:21 EST
30s should be plenty. If >1s becomes normal, we'll investigate the cause.
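
To illustrate the timeout suggested above, here is a minimal client-side sketch in Python. It is not the Gitpod implementation (which lives in the TypeScript file linked in comment 9); the function names, the retry count, and the exponential backoff policy are assumptions made for illustration only. The URL is the one used in the wget examples in comment 15.

```python
import socket
import time
import urllib.error
import urllib.request

REGISTRY_URL = "https://open-vsx.org/api/rust-lang/rust"  # endpoint from comment 15
TIMEOUT_SECONDS = 30  # value suggested in comment 13


def fetch(url, timeout=TIMEOUT_SECONDS):
    """Return the response body; raises on HTTP errors, network errors, or timeout."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read()


def fetch_with_retry(url, attempts=3, backoff=1.0, fetcher=fetch, sleep=time.sleep):
    """Retry timeouts and network errors with exponential backoff (hypothetical policy)."""
    last_error = None
    for attempt in range(attempts):
        try:
            return fetcher(url)
        except urllib.error.HTTPError:
            # The server answered with an error status; retrying is unlikely to help.
            raise
        except (socket.timeout, urllib.error.URLError) as error:
            last_error = error
            if attempt < attempts - 1:
                # Wait 1s, 2s, 4s, ... between attempts.
                sleep(backoff * 2 ** attempt)
    raise last_error
```

Calling `fetch_with_retry(REGISTRY_URL)` would then tolerate the occasional slow response instead of failing after 5 seconds. The `fetcher` and `sleep` parameters exist only so the retry logic can be exercised without network access.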
Comment 14 Mikaël Barbero CLA 2021-03-16 09:23:55 EDT
After deeper analysis, the issue is on the app side (too many heavy SQL requests).

There is an effort to denormalize the DB schema to lighten the load. See https://github.com/eclipse/openvsx/issues/261

We're still seeing a very low cache hit rate (as reported in https://github.com/eclipse/openvsx/issues/214), but we still have no Docker image to deploy with the latest changes.
Comment 15 Denis Roy CLA 2021-03-16 10:11:50 EDT
(In reply to Denis Roy from comment #13)
> 30s should be plenty. If >1s becomes normal, we'll investigate the cause.

Indeed, we are seeing 9s ~ 14s response times from anything that cannot be cached/accelerated.


A cacheable call responds very quickly:

$ time wget -S https://open-vsx.org/api/rust-lang/rust
HTTP request sent, awaiting response... 
  HTTP/1.1 200 
  Server: nginx
  [snip]
  X-Frame-Options: DENY
  X-Proxy-Cache: HIT
real    0m0.281s


A call with an Origin header bypasses the cache and is much slower:

$ time wget -S --header="Origin: x" https://open-vsx.org/api/rust-lang/rust
HTTP request sent, awaiting response... 
  HTTP/1.1 200 
  Server: nginx
  [snip]
  X-Frame-Options: DENY
  X-Proxy-Cache: BYPASS
real    0m7.538s

Thx for the summary, Mikaël.
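
The wget comparison in this comment can also be scripted. The following Python sketch sends the same request with and without an Origin header and reports the `X-Proxy-Cache` response header along with the elapsed time. The function name `probe` and its `opener` and `clock` parameters are hypothetical, added so the logic can be checked offline; the sketch also assumes the server still emits `X-Proxy-Cache`, which may no longer be the case.

```python
import time
import urllib.request

API_URL = "https://open-vsx.org/api/rust-lang/rust"  # same endpoint as the wget calls


def probe(url, origin=None, timeout=30,
          opener=urllib.request.urlopen, clock=time.perf_counter):
    """Return (X-Proxy-Cache value, elapsed seconds) for one GET request."""
    # An Origin header is what made the request bypass the cache in the wget test.
    headers = {"Origin": origin} if origin is not None else {}
    request = urllib.request.Request(url, headers=headers)
    start = clock()
    with opener(request, timeout=timeout) as response:
        response.read()  # include body transfer in the measurement
        cache_status = response.headers.get("X-Proxy-Cache", "unknown")
    return cache_status, clock() - start
```

Running `probe(API_URL)` followed by `probe(API_URL, origin="x")` corresponds to the two wget invocations above; the timings will depend on current server load and network conditions.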
Comment 16 Denis Roy CLA 2021-12-21 12:06:55 EST
We have resolved this.