
Bug 335364

Summary: [server] better structure for serverworkspace files
Product: [ECD] Orion
Reporter: Denis Roy <denis.roy>
Component: Client
Assignee: John Arthorne <john.arthorne>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: nathan, wayne.beaton
Version: 0.2   
Target Milestone: 0.2   
Hardware: PC   
OS: Linux   
Whiteboard:

Description Denis Roy CLA 2011-01-25 13:55:47 EST
When an Orion server has many users, each user's directories are created as subdirectories directly under serverworkspace, like so:

orion:serverworkspace/ # ls -l
drwxr-xr-x  2 user users 4096 Jan 19 17:22 -
drwxr-xr-x  3 user users 4096 Jan 18 08:23 .metadata
drwxr-xr-x  2 user users 4096 Jan 14 07:42 7    <- belongs to user A
drwxr-xr-x  2 user users 4096 Jan  6 07:19 B    <- belongs to user B
drwxr-xr-x  2 user users 4096 Jan 20 09:24 BB   <- belongs to user A
drwxr-xr-x  2 user users 4096 Jan 20 18:53 BC   <- belongs to user A
drwxr-xr-x  2 user users 4096 Jan 24 11:43 BD   <- belongs to user C
drwxr-xr-x  2 user users 4096 Jan 24 11:58 BE   <- belongs to user D
drwxr-xr-x  2 user users 4096 Jan  6 10:43 F    <- belongs to user A
drwxr-xr-x  2 user users 4096 Jan 14 04:42 G    <- belongs to user D
drwxr-xr-x  2 user users 4096 Jan 14 05:26 H    <- belongs to user E
drwxr-xr-x  3 user users 4096 Jan 14 07:23 K    <- belongs to user F
drwxr-xr-x  9 user users 4096 Jan 14 07:12 L    <- belongs to user A
drwxr-xr-x  3 user users 4096 Jan 14 07:11 M    <- belongs to user B
... and so on

From an administration standpoint, this is unmanageable because:

a. All of one user's data is spread across multiple directories under a common "root". This makes it difficult to isolate a specific user's data (without parsing .prefs files), to restrict disk space, and to perform backups and restores.

b. It is not scalable. If a million users each have 3 directories, the above serverworkspace would contain 3 million entries in a single directory.


I suggest creating one container directory per user, and grouping users according to the first character of their ID:

serverworkspace/
|---------- 0
|---------- 1
|---------- 2
|---------- 3
..
|---------- a
|---------- b
|           `----- boris
|                  |----- MyStuff
|                  |----- Some Directory
|                  `----- Another Directory
|---------- c
|---------- d
|           `----- denisroy
|                  |----- folderA
|                  |----- folderB
|                  `----- folderC
..
|---------- A
|---------- B
|---------- C
...


The above strategy will facilitate storage expansion, scalability, backup, restore and restriction directly at the OS level.
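
The mapping proposed above can be sketched in Python. This is only an illustration of the suggested layout, not Orion code: the names SERVER_WORKSPACE, user_container and user_folder are hypothetical, and the sketch assumes user IDs are non-empty and contain no path separators.

```python
from pathlib import PurePosixPath

# Hypothetical root; the report only names the directory "serverworkspace".
SERVER_WORKSPACE = PurePosixPath("serverworkspace")


def user_container(user_id: str) -> PurePosixPath:
    """Return the per-user container directory for the proposed layout.

    Users are grouped by the first character of their ID, so "denisroy"
    lands in serverworkspace/d/denisroy.
    """
    if not user_id:
        raise ValueError("user ID must not be empty")
    return SERVER_WORKSPACE / user_id[0] / user_id


def user_folder(user_id: str, folder: str) -> PurePosixPath:
    """Return the path of one of a user's folders inside their container."""
    return user_container(user_id) / folder


print(user_folder("denisroy", "folderA"))  # serverworkspace/d/denisroy/folderA
print(user_folder("boris", "MyStuff"))     # serverworkspace/b/boris/MyStuff
```

With one container per user, a backup, restore or quota can target a single directory instead of a set of directories recovered from .prefs files.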
Comment 1 John Arthorne CLA 2011-01-25 15:40:47 EST
To give some background, the default layout was initially kept "unaware" of users to allow for sharing scenarios. For example if 100,000 users all want to use Dojo it would be nice if they could share the same copy. This not only reduces disk usage, but allows us to optimize search indexing and other kinds of server-side analysis. Think for example of Google Docs where each document can have any number of authors if desired. We haven't exposed any UI to allow sharing of projects yet, but that's the reason the relationship between files and users isn't "baked in" to the design.

Having said that, our disk layout isn't set in stone and we should be able to come up with ways to offer that kind of customization. We actually support storing projects at arbitrary paths in the server file system if the server configuration allows it. However, for what you're looking for, we would need some configurable policy settings on the server that dictate the layout of projects. The right configuration for 1,000,000 users is likely different from a "self-hosting" install where I'm running the server on my local machine.

Also just to set expectations, the current server implementation is not designed to scale to many thousands of users. There are some metadata files on the server that are shared across users today, which would need to migrate to some kind of database to get deep scalability. We could certainly split projects into a tree so there are a fixed number of projects per directory, but this isn't the only place we don't currently scale.
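
One common way to split projects into a tree, independent of users, is to bucket them by a hash of the project ID. The Python sketch below is an assumption about how that could look, not something the Orion server does: the function name bucketed_path, the choice of SHA-1, and the two-level depth are all hypothetical.

```python
import hashlib
from pathlib import PurePosixPath


def bucketed_path(project_id: str, levels: int = 2) -> PurePosixPath:
    """Place a project under `levels` directories derived from a hash of its ID.

    Each directory name is two hex characters, so every level of the tree
    has at most 256 subdirectories regardless of how many projects exist.
    """
    digest = hashlib.sha1(project_id.encode("utf-8")).hexdigest()
    # Consecutive two-character slices of the digest name the buckets.
    parts = [digest[2 * i : 2 * i + 2] for i in range(levels)]
    return PurePosixPath(*parts, project_id)


path = bucketed_path("BB")
print(path)  # <2 hex chars>/<2 hex chars>/BB
```

Hashing bounds the fan-out of each directory level and spreads projects roughly evenly; it does not give a strictly fixed number of projects per leaf directory, and it does nothing for the shared metadata files mentioned above.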
Comment 2 John Arthorne CLA 2011-02-07 17:00:41 EST
Done. There is now a configuration setting called "orion.file.layout" to alter the default project layout. Supported values so far:

"flat" - Same as today. All projects in one directory. This is suitable for a single-user installation of the server.

"userTree" - Projects organized into a tree according to the user that created the project. The tree is of the form:

<first two letters of user id>/<user id>/<project id>

For more details see:

http://wiki.eclipse.org/Orion/Server_admin_guide#Configuring_project_layout
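
As a rough illustration of the two layout values, here is a Python sketch of how a project's relative path could be derived from the setting. It is not the server's implementation: the function name project_path is hypothetical, and the handling of user IDs shorter than two characters (the sketch simply uses whatever prefix exists) is an assumption, since the comment above does not specify it.

```python
from pathlib import PurePosixPath


def project_path(layout: str, user_id: str, project_id: str) -> PurePosixPath:
    """Return a project's location relative to the workspace root.

    `layout` mirrors the values described for "orion.file.layout".
    """
    if layout == "flat":
        # All projects live in one directory, regardless of owner.
        return PurePosixPath(project_id)
    if layout == "userTree":
        if not user_id:
            raise ValueError("userTree layout needs a non-empty user id")
        # <first two letters of user id>/<user id>/<project id>
        return PurePosixPath(user_id[:2], user_id, project_id)
    raise ValueError(f"unsupported layout: {layout!r}")


print(project_path("flat", "denisroy", "BB"))      # BB
print(project_path("userTree", "denisroy", "BB"))  # de/denisroy/BB
```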
Comment 3 John Arthorne CLA 2011-02-07 17:01:10 EST
Changes pushed to git.eclipse.org.