
Bug 315773

Summary: content.xml is too big (20 MB+)
Product: [Eclipse Project] Equinox
Component: p2
Reporter: Krzysztof Daniel <krzysztof.daniel>
Assignee: P2 Inbox <equinox.p2-inbox>
Status: RESOLVED FIXED
QA Contact:
Severity: normal
Priority: P3
CC: irbull, jeffmcaffer, john.arthorne, krzysztof.daniel, pascal, sptaszkiewicz
Version: 3.4.2
Target Milestone: 3.7 M3
Hardware: PC
OS: Windows XP
Whiteboard:
Bug Depends on:
Bug Blocks: 324906, 328826
Attachments:
Description                                                    Flags
A support for compressing all content.xml's                    none
Patch for 3.4                                                  none
Support for compressing ExtensionLocationMetadataRepository    none
untested patch                                                 none
patch                                                          none

Description Krzysztof Daniel CLA 2010-06-04 10:47:10 EDT
The path to the file is usually like this:

configuration\org.eclipse.osgi\bundles\74\data\1175256916\content.xml.

If the product is installed from an update site on first launch, this file can grow quite big. Its content is very similar to that of the .profile files, so I believe it should be gzipped like them.
Comment 1 Krzysztof Daniel CLA 2010-06-07 06:00:30 EDT
Created attachment 171243
A support for compressing all content.xml's
Comment 2 Krzysztof Daniel CLA 2010-06-07 06:05:29 EDT
Created attachment 171244
Patch for 3.4
Comment 3 Pascal Rapicault CLA 2010-06-07 08:55:18 EDT
Not all repositories should be compressed. You need to make sure that only the repository you are interested in is compressed.
Comment 4 Krzysztof Daniel CLA 2010-06-07 08:56:55 EDT
Why not all? I would like to compress all big repositories... Is there any way to distinguish them? Which part of the code should I check?
Comment 5 Szymon Ptaszkiewicz CLA 2010-06-21 03:30:20 EDT
(In reply to comment #3)
> Not all repositories should be compressed. You need to make sure that only the
> repository you are interested in is compressed.

Pascal, why shouldn't we compress all repositories?
Comment 6 Pascal Rapicault CLA 2010-06-21 21:59:32 EDT
Because p2's ability to produce non compressed repo is relied upon by people and having compressed repo all the time would be a breaking API change.
Comment 7 Szymon Ptaszkiewicz CLA 2010-06-22 04:58:56 EDT
(In reply to comment #6)
> Because p2's ability to produce non compressed repo is relied upon by people
> and having compressed repo all the time would be a breaking API change.

Pascal, how can we distinguish which repos can be compressed and which cannot? If we cannot compress all repos, how can we reduce the size of the content.xml files that are not going to be compressed?
Comment 8 Pascal Rapicault CLA 2010-06-22 08:43:12 EDT
Repositories can be compressed upon their creation. In this case, this is likely to be the repository created by the dropins. I suggest you fix the bug only there. Be aware, though, that the file will then have to be uncompressed whenever it is read, which could affect performance.
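To make the performance remark concrete, here is a minimal Java sketch of what reading a compressed repository involves: the loader has to open content.jar and inflate its content.xml entry before any parsing can start, and only falls back to a plain content.xml when no jar is present. The jar-first preference order and the names `RepositoryFileLoader` and `loadContentXml` are assumptions for illustration; this is not p2's actual loader code.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

// Hypothetical helper, not p2 API: returns the raw content.xml bytes of a
// metadata repository directory, preferring a compressed content.jar.
final class RepositoryFileLoader {
    private RepositoryFileLoader() {}

    static byte[] loadContentXml(Path repoDir) throws IOException {
        Path jar = repoDir.resolve("content.jar");
        if (Files.exists(jar)) {
            // Compressed case: scan the archive for the content.xml entry and
            // inflate it. This extra work is the speed cost of compression.
            try (ZipInputStream zin = new ZipInputStream(Files.newInputStream(jar))) {
                ZipEntry entry;
                while ((entry = zin.getNextEntry()) != null) {
                    if ("content.xml".equals(entry.getName())) {
                        return zin.readAllBytes(); // stops at the end of this entry
                    }
                }
            }
            throw new IOException("no content.xml entry in " + jar);
        }
        // Uncompressed case: the XML is read straight from disk.
        return Files.readAllBytes(repoDir.resolve("content.xml"));
    }
}
```

Calling `RepositoryFileLoader.loadContentXml(dir)` on a directory that holds both files returns the jar's content, so a stale content.xml next to a content.jar would be ignored under this assumed ordering.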
Comment 9 John Arthorne CLA 2010-06-22 09:04:44 EDT
I believe these are extension location repositories - see ExtensionLocationMetadataRepositoryFactory#create.

I'm not convinced we really need to compress these. If you have an extension location repository that is 20MB, then the install is likely quite large anyway - the size of the metadata is quite small compared to the size of the actual bundle content. Unlike the profile files, we don't create a new unique copy of these repositories on every install - we always reuse the same repository so it is not growing over time.

As Pascal said, compressing is a speed/space tradeoff. Compression will make loading/storing/reconciling these repositories slower.
Comment 10 Szymon Ptaszkiewicz CLA 2010-06-23 04:43:05 EDT
(In reply to comment #9)
> I believe these are extension location repositories - see
> ExtensionLocationMetadataRepositoryFactory#create.
> 
> I'm not convinced we really need to compress these. If you have an extension
> location repository that is 20MB, then the install is likely quite large anyway
> - the size of the metadata is quite small compared to the size of the actual
> bundle content. Unlike the profile files, we don't create a new unique copy of
> these repositories on every install - we always reuse the same repository so it
> is not growing over time.
> 
> As Pascal said, compressing is a speed/space tradeoff. Compression will make
> loading/storing/reconciling these repositories slower.

The problem is more serious when we have a shared Eclipse installation. Each user has his own configuration folder inside .eclipse, and content.xml files are created and stored for each user separately. In that case, with many users, the total size of all repositories may be significant. I would like to reduce the size of those files, either by compressing them or by excluding some information from them. I think compressing would be better. What do you think?
Comment 11 Pascal Rapicault CLA 2010-06-24 23:22:01 EDT
> The problem is more serious when we have shared Eclipse installation. Each user
> has his own configuration folder inside .eclipse and content.xml files are
> created and stored for each user separately. 
   Could you please mention the path where you see this file?
   How does this file come to be?
Comment 12 Szymon Ptaszkiewicz CLA 2010-07-08 04:17:38 EDT
(In reply to comment #11)
> > The problem is more serious when we have shared Eclipse installation. Each user
> > has his own configuration folder inside .eclipse and content.xml files are
> > created and stored for each user separately. 
>    Could you please mention the path where you see this file?
>    How does this file come to be?

Pascal, the path given in comment 1 is correct but incomplete. The full path for my installation is:

C:\Documents and Settings\normal_user\.eclipse\org.eclipse.platform_3.5.0_2146121555\configuration\org.eclipse.osgi\bundles\74\data\-151330471\content.xml

I prepared steps that can be used to reproduce this problem:
1. Create a user account in your OS with admin privileges (referred to as admin).
2. Create a user account in your OS with limited privileges (referred to as normal_user).
3. Log in as admin.
4. Install the product from an update site (e.g. to c:\eclipse, referred to as ECLIPSE_HOME).
5. Change the security rights for ECLIPSE_HOME as follows:
   a. allow normal_user to read, traverse and execute any file or subfolder in ECLIPSE_HOME
   b. deny normal_user permission to create files/folders, write data and delete files/folders in ECLIPSE_HOME
6. Log out and log in as normal_user.
7. Start the product from ECLIPSE_HOME -> a .eclipse folder is created in normal_user's home folder, containing the file mentioned above.

This content.xml file has the following first three lines:

<?xml version='1.0' encoding='UTF-8'?>
<?metadataRepository version='1.1.0'?>
<repository name='extension location metadata repository: file:/C:/eclipse/.eclipseextension' type='org.eclipse.equinox.internal.p2.metadata.repository.LocalMetadataRepository' version='1'>

The file will be created for every user who starts the product. If we have many additional plugins in ECLIPSE_HOME and many users, we will get big (over 20 MB) content.xml files for each user. I think compressing would significantly reduce the disk space used by those files. Pascal, do you agree?
Comment 13 Szymon Ptaszkiewicz CLA 2010-07-08 04:19:33 EDT
(In reply to comment #12)
Of course, the path was mentioned previously in comment 0, not 1.
Comment 14 Szymon Ptaszkiewicz CLA 2010-07-08 06:40:06 EDT
Created attachment 173758
Support for compressing ExtensionLocationMetadataRepository
Comment 15 Szymon Ptaszkiewicz CLA 2010-07-27 03:56:31 EDT
Pascal, John, what do you think about the patch?
Comment 16 Szymon Ptaszkiewicz CLA 2010-09-01 04:53:09 EDT
There is no target set for this bug. Is it possible to fix this in 3.7?
Comment 17 DJ Houghton CLA 2010-09-01 10:38:22 EDT
The patch compresses all extension location repositories (when a System property is set), and I believe Pascal wasn't convinced this is a good idea. Also, this requires the System property to be set by the client or the product.

I think a better approach would be to change the code in the Activator class of the reconciler.dropins bundle. There are helper methods for creating both types of repositories and we could set the "compressed" flag there.
Comment 18 DJ Houghton CLA 2010-09-01 10:48:08 EDT
Created attachment 177963
untested patch
Comment 19 Pascal Rapicault CLA 2010-09-01 11:07:34 EDT
DJ's observations are right on
Comment 20 Szymon Ptaszkiewicz CLA 2010-09-01 11:14:27 EDT
(In reply to comment #19)
> DJ's observations are right on

Great. I will test if the patch provided by DJ works in my case.
Comment 21 DJ Houghton CLA 2010-09-01 11:16:00 EDT
FYI, one small change I would make to the patch: set the compressed property before adding the properties that the user has passed in, in case the user overrode that property in their map.
Comment 22 Szymon Ptaszkiewicz CLA 2010-09-01 11:19:33 EDT
Thanks for the hint. I will update the patch accordingly.
Comment 23 DJ Houghton CLA 2010-09-01 11:44:44 EDT
The patch didn't work on my first test. I'll have to step through the code to determine why.
Comment 24 DJ Houghton CLA 2010-09-01 12:12:34 EDT
I've looked at the code a bit and the repositories are created by RepositoryListener#initializeMetadataRepository. A patch would probably have to be in the DropinsRepositoryListener subclass and involve a bit of refactoring, perhaps pass some default repository properties to the superclass constructor, etc. I'll work on a new patch.
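The refactoring idea (the subclass hands default repository properties to the superclass constructor) can be sketched in Java as below. It also applies the ordering from comment 21: defaults go into the map first, so a value the caller passes explicitly wins. `BaseRepositoryListener`, `DropinsStyleListener` and `repositoryProperties` are hypothetical stand-ins for RepositoryListener and DropinsRepositoryListener, not their real signatures, and the `"p2.compressed"` key is assumed to be the value of p2's compressed-repository property.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for RepositoryListener; not the real p2 class.
class BaseRepositoryListener {
    private final Map<String, String> defaultRepositoryProperties;

    // Old-style constructor: no default repository properties.
    BaseRepositoryListener() {
        this(Collections.emptyMap());
    }

    BaseRepositoryListener(Map<String, String> defaultRepositoryProperties) {
        this.defaultRepositoryProperties = new HashMap<>(defaultRepositoryProperties);
    }

    // Properties to use when this listener creates its metadata repository.
    // Defaults are copied in first so that caller-supplied values override them.
    Map<String, String> repositoryProperties(Map<String, String> callerProperties) {
        Map<String, String> props = new HashMap<>(defaultRepositoryProperties);
        if (callerProperties != null) {
            props.putAll(callerProperties);
        }
        return props;
    }
}

// Hypothetical stand-in for DropinsRepositoryListener: requests compression.
class DropinsStyleListener extends BaseRepositoryListener {
    // Assumed property key for compressed repositories.
    static final String PROP_COMPRESSED = "p2.compressed";

    DropinsStyleListener() {
        super(Map.of(PROP_COMPRESSED, "true"));
    }
}
```

Keeping the no-argument constructor means other subclasses behave exactly as before and still get uncompressed repositories; only the dropins-style listener opts in.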
Comment 25 DJ Houghton CLA 2010-09-01 15:37:09 EDT
Created attachment 178000
patch

New patch which has modifications in the DropinsRepositoryListener and RepositoryListener. The RepositoryListener class is in a provisional API package but we discussed it and came to the agreement that it is ok to change things. If necessary we can add the old constructor back but we should be fine.

Give this new patch a try and see how it works for you.
Comment 26 Szymon Ptaszkiewicz CLA 2010-09-09 09:44:54 EDT
(In reply to comment #25)
> Created an attachment (id=178000)
> patch
> 
> New patch which has modifications in the DropinsRepositoryListener and
> RepositoryListener. The RepositoryListener class is in a provisional API
> package but we discussed it and came to the agreement that it is ok to change
> things. If necessary we can add the old constructor back but we should be fine.
> 
> Give this new patch a try and see how it works for you.

Sorry for late response.

Thanks DJ, the new patch works fine. I have tested it: p2 created content.jar instead of content.xml, reducing disk space usage by over 90%.
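A saving of that order is what one would expect from highly repetitive XML. The Java sketch below writes the same metadata either as a plain content.xml or as a content.jar holding a single deflated content.xml entry, so the two sizes can be compared. The helper names and the synthetic `<unit>` elements are hypothetical, and the real ratio depends on the actual repository content, so the sketch does not claim to reproduce the 90% figure.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.zip.ZipEntry;
import java.util.zip.ZipOutputStream;

// Hypothetical helper, not p2 API.
final class RepositoryFileWriter {
    private RepositoryFileWriter() {}

    // Writes content.xml as-is, or wraps it in content.jar when compressed is true.
    static Path write(Path repoDir, byte[] contentXml, boolean compressed) throws IOException {
        Files.createDirectories(repoDir);
        if (!compressed) {
            return Files.write(repoDir.resolve("content.xml"), contentXml);
        }
        Path jar = repoDir.resolve("content.jar");
        try (ZipOutputStream zout = new ZipOutputStream(Files.newOutputStream(jar))) {
            zout.putNextEntry(new ZipEntry("content.xml")); // DEFLATED by default
            zout.write(contentXml);
            zout.closeEntry();
        }
        return jar;
    }

    // Builds repetitive, unit-heavy XML loosely resembling a metadata repository.
    static byte[] sampleContentXml(int units) {
        StringBuilder sb = new StringBuilder("<?xml version='1.0' encoding='UTF-8'?>\n<repository>\n");
        sb.append("<units size='").append(units).append("'>\n");
        for (int i = 0; i < units; i++) {
            sb.append("  <unit id='org.example.bundle").append(i)
              .append("' version='1.0.0'><provides size='1'><provided namespace='osgi.bundle' name='org.example.bundle")
              .append(i).append("' version='1.0.0'/></provides></unit>\n");
        }
        sb.append("</units>\n</repository>\n");
        return sb.toString().getBytes(StandardCharsets.UTF_8);
    }
}
```

Writing `sampleContentXml(1000)` into two temporary directories, once with `compressed` set to false and once with it set to true, and comparing `Files.size` of the results shows how much smaller the jar is for this synthetic input.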
Comment 27 DJ Houghton CLA 2010-09-09 10:38:45 EDT
Patch released.
Comment 28 DJ Houghton CLA 2010-09-16 12:18:49 EDT
There was a problem with tagging HEAD so this fix won't appear in integration builds until the first build after 3.7 M2.
Comment 29 Jeff McAffer CLA 2010-10-27 20:51:35 EDT
(In reply to comment #6)
> Because p2's ability to produce non compressed repo is relied upon by people
> and having compressed repo all the time would be a breaking API change.

A little off topic, but since the format of repo files (even the existence of repo files) is not API, we should be able to compress them, swizzle them, invert them, fold and staple them as we see fit.