| Summary: | Weird "too many files open" error editing file on orion.eclipse.org | ||
|---|---|---|---|
| Product: | [ECD] Orion | Reporter: | Michael Rennie <Michael_Rennie> |
| Component: | Client | Assignee: | Silenio Quarti <Silenio_Quarti> |
| Status: | RESOLVED FIXED | QA Contact: | |
| Severity: | blocker | ||
| Priority: | P2 | CC: | curtis.windatt.public, Michael_Rennie, remy.suen, snorthov, xinyij |
| Version: | 13.0 | ||
| Target Milestone: | 15.0 | ||
| Hardware: | PC | ||
| OS: | Mac OS X | ||
| Whiteboard: | |||
| Bug Depends on: | |||
| Bug Blocks: | 514343 | ||
Description
Michael Rennie
I also saw this error on my page before reloading and finding orion.eclipse.org down. Bumping the priority on this one. It is a failure that brought down the server (loss of data and all). We need to investigate how it could have happened.

There is a library available that can make fs more resilient to this kind of failure: https://www.npmjs.com/package/graceful-fs. It's too late to add this in 14.0, but it would be worth investigating in 15.0.

This happens rarely, and as advised by Michael, we will try to use graceful-fs after Orion 14.

We need to understand whether graceful-fs will fix us. It seems like it won't. Instead, it seems like we need "ulimit to the rescue": http://stackoverflow.com/questions/19981065/nodejs-error-emfile-too-many-open-files-on-mac-os. The crash was on the server. Let's ensure that ulimit is huge there, close the bug and move on. SSQ can you make this change or should Mike?

Ulimit is already huge. This is how we start orion.eclipse.org:

    ulimit -n 2000
    ulimit -v 20000000
    ulimit -c unlimited
    pm2 start server.js --name $HOST --log-date-format "YYYY-MM-DD HH:mm Z" -- -p $PORT -w $WORKSPACEHOME

(In reply to Steve Northover from comment #5)
> We need to understand whether graceful-fs will fix us. It seems like it
> won't. Instead, it seems like we need "ulimit to the rescue":
>
> http://stackoverflow.com/questions/19981065/nodejs-error-emfile-too-many-open-files-on-mac-os
>
> The crash was on the server. Let's ensure that ulimit is huge there, close
> the bug and move on.
>
> SSQ can you make this change or should Mike?

We could easily increase it, but we should be tolerant of these kinds of failures in general. graceful-fs will help us here because one of the many things it does is queue file requests when there are no more file handles available, and then flush that queue once the app starts releasing them.

I did a little more digging. These are the default values for ulimit (ulimit -a):

    > ulimit -a
    core file size (blocks, -c) 0
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 155961
    max locked memory (kbytes, -l) 64
    max memory size (kbytes, -m) unlimited
    open files (-n) 1024
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 155961
    virtual memory (kbytes, -v) unlimited
    file locks (-x) unlimited

And these are the values when running orion.eclipse.org:

    core file size (blocks, -c) unlimited
    data seg size (kbytes, -d) unlimited
    scheduling priority (-e) 0
    file size (blocks, -f) unlimited
    pending signals (-i) 155961
    max locked memory (kbytes, -l) 64
    max memory size (kbytes, -m) unlimited
    open files (-n) 2000
    pipe size (512 bytes, -p) 8
    POSIX message queues (bytes, -q) 819200
    real-time priority (-r) 0
    stack size (kbytes, -s) 8192
    cpu time (seconds, -t) unlimited
    max user processes (-u) 155961
    virtual memory (kbytes, -v) 20000000
    file locks (-x) unlimited

We already increase the open files limit from 1024 to 2000. I will increase this limit to 8096 to see whether this problem goes away.
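After raising the limit, it may be worth reading it back from the running server process rather than from a shell, since pm2 launches apps from its own daemon and the limit an app receives may depend on how that daemon was started. The sketch below assumes a Linux host with `/proc` (the `ulimit -a` output above, with entries such as `POSIX message queues`, suggests Linux). `parseMaxOpenFiles` and `openFilesStatus` are hypothetical helper names, and the descriptor count is approximate.

```javascript
// Sketch, assuming a Linux host with /proc mounted: report the open-files
// limit a running process actually received and how many descriptors it holds.
const fs = require("fs");

// Parse the "Max open files" row of /proc/<pid>/limits text into
// { soft, hard }, mapping "unlimited" to Infinity. Returns null if absent.
function parseMaxOpenFiles(limitsText) {
  const label = "Max open files";
  for (const line of limitsText.split("\n")) {
    if (!line.startsWith(label)) continue;
    const [soft, hard] = line.slice(label.length).trim().split(/\s+/);
    const toNumber = (v) => (v === "unlimited" ? Infinity : Number(v));
    return { soft: toNumber(soft), hard: toNumber(hard) };
  }
  return null;
}

// Hypothetical helper: limits plus an approximate count of open descriptors
// (listing /proc/<pid>/fd may itself briefly add one) for pid, default "self".
function openFilesStatus(pid = "self") {
  const limits = parseMaxOpenFiles(fs.readFileSync(`/proc/${pid}/limits`, "utf8"));
  const inUse = fs.readdirSync(`/proc/${pid}/fd`).length;
  return { ...limits, inUse };
}

module.exports = { parseMaxOpenFiles, openFilesStatus };
```

Run as the same user as the server (or root), `openFilesStatus(<server pid>)` would show whether the soft limit is the expected 2000 (or 8096), and comparing `inUse` against it over time could help tell a descriptor leak apart from a limit that is simply too low.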
Did you increase the limit? If so, close this bug. I will open another one to make use of graceful-fs.

Yes, the limit is increased.
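The queue-and-flush behaviour attributed to graceful-fs earlier in this bug can be illustrated with a short Node.js sketch. This is not graceful-fs's implementation: `queuedOpen`, `queuedClose` and the injectable `openImpl`/`closeImpl` parameters are hypothetical names chosen for the example. It only retries when a descriptor is released through `queuedClose`, so descriptors closed by other code would leave queued requests waiting.

```javascript
// Minimal sketch of the queue-and-retry idea (NOT graceful-fs's actual code):
// when open() fails because the process is out of file descriptors
// (EMFILE/ENFILE), park the request and retry it after a descriptor is closed.
const fs = require("fs");

// Queued [path, flags, callback, openImpl] entries waiting for a free fd.
const pending = [];

// Hypothetical wrapper around fs.open; openImpl is injectable for testing.
function queuedOpen(path, flags, callback, openImpl = fs.open) {
  openImpl(path, flags, (err, fd) => {
    if (err && (err.code === "EMFILE" || err.code === "ENFILE")) {
      pending.push([path, flags, callback, openImpl]); // wait for a release
      return;
    }
    callback(err, fd);
  });
}

// Hypothetical wrapper around fs.close: once a descriptor is released,
// retry the oldest queued open (FIFO), then report the close result.
function queuedClose(fd, callback, closeImpl = fs.close) {
  closeImpl(fd, (err) => {
    const next = pending.shift();
    if (next) queuedOpen(...next);
    callback(err);
  });
}
```

To try the real library instead, graceful-fs can be required in place of `fs`, or its `gracefulify` function can be applied to the built-in `fs` module to patch it for the whole process.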