Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 224269

Summary: Startup takes 20 times longer with P2 enabled
Product: [Eclipse Project] Equinox Reporter: David Williams <david_williams>
Component: p2Assignee: P2 Inbox <equinox.p2-inbox>
Status: RESOLVED FIXED QA Contact:
Severity: blocker    
Priority: P3 CC: david_williams, deboer, jeffmcaffer, philippe_mulet, simon_kaegi
Version: unspecified   
Target Milestone: 3.4 M6   
Hardware: PC   
OS: Linux   
Whiteboard:
Bug Depends on:    
Bug Blocks: 223238    
Attachments:
Description Flags
patch
none
zip of replacement bundles none

Description David Williams CLA 2008-03-26 22:05:40 EDT
We in WTP are having trouble getting our JUnit tests to run on build.eclipse.org (PPC arch) because they take a LOT longer for each suite to startup. Below are some "before and after" measurements. If I'm looking at the numbers right, this adds 4 minutes to each test suite ... we have about 80 

Hence, we add at least at hour, and the testing cycle times out. (currently set at 2 hours). 

... and many suites themselves take less than a second to run ... many less than a minute ... a few a long time. 

The URLs below point to whole lists of consolelogs captured during the tests, if you wanted to look at several. This happened to be the "DTD UI" suite, picked arbitrarily. 

We don't seem to see this effect on Linux i86 (we don't do unit tests on windows). 

Is this to be expected? Or a sign of a bug in the PPC distribution? 


pre p2 (I20080305-1100)
http://build.eclipse.org/webtools/committers/wtp-R3.0-I/20080324125023/I-I20080324125023-20080324125023/testResults/consolelogs/testLogs/logIndex.php

Time to load bundles: 87
Starting application: 12804
Application Started: 21333


p2 (I20080325-2233)
http://build.eclipse.org/webtools/committers/wtp-R3.0-I/20080326171340/I-I20080326171340-20080326171340/testResults/consolelogs/testLogs/logIndex.php

Time to load bundles: 4445
Starting application: 261347
Application Started: 269453
Comment 1 David Williams CLA 2008-03-26 23:59:31 EDT
I take that back about not being such a difference on linux x86 ... I hadn't moved up to most recent P2 SDK on that box. Once I moved up, the slowdown there was significant also. 



Comment 2 David Williams CLA 2008-03-27 00:46:18 EDT
If not obvious, we run our JUnit's with WTP and all our pre-reqs extracted to 'dropins' folder. Anything simple we can do on our end to speed this up? Remove dependancy versions from features? I realize that if we install (once) with P2 operations, we'd not have this problem ... but ... am hoping for some quick and easy way to get through this week's JUnit tests (which is needed on Thursday for our weekly declared build). 

 
Comment 3 John Arthorne CLA 2008-03-27 00:51:41 EDT
This is largely due to support we added this week for performing "best-effort" installation from the dropins folder, with resilience to unsatisfied dependencies, multiple versions, etc. We haven't had time to optimize this yet (bug 224270). 
Comment 4 John Arthorne CLA 2008-03-27 01:01:33 EDT
Note that it is only the first startup after adding content to dropins that is slow, but unfortunately when running tests you typically add new things to dropins for each test run so this isn't much reassurance.
Comment 5 David Williams CLA 2008-03-27 01:09:06 EDT
(In reply to comment #4)
> Note that it is only the first startup after adding content to dropins that is
> slow, but unfortunately when running tests you typically add new things to
> dropins for each test run so this isn't much reassurance.
> 

Yes, actually we don't ... we dump it all and then run through all the tests (we've always intended to break them up, make them more precise, but haven't yet). 

But, we do run each test in a new workspace, if that matters. 

Comment 6 John Arthorne CLA 2008-03-27 01:16:06 EDT
Ok, that offers some hope - there may be an optimization we're missing and it's doing too much work on subsequent starts. With the Eclipse platform test framework each set of tests is dropped into a clean, new Eclipse install to avoid the possibility of old tests contaminating future runs.
Comment 7 David Williams CLA 2008-03-27 10:13:50 EDT
(In reply to comment #6)
> Ok, that offers some hope - there may be an optimization we're missing and it's
> doing too much work on subsequent starts. With the Eclipse platform test
> framework each set of tests is dropped into a clean, new Eclipse install to
> avoid the possibility of old tests contaminating future runs.
> 

I just confirmed, the very first test that runs has roughly same numbers as subsequent runs. So ... either I'm completely mistaken about how our tests run, or the basic resolution info isn't being cached. (or, is being cached in the workspace, which wouldn't seem right). 

First test (releng Test Build): 

Time to load bundles: 4703
Starting application: 288143

Pre P2
Time to load bundles: 105
Starting application: 215367

So ... think if I follow the instructions at 
http://wiki.eclipse.org/Equinox_p2_reverting_to_UM
(first three steps) that the startup time will also revert to previous levels? 
Comment 8 David Williams CLA 2008-03-27 11:13:03 EDT
Just another thought ... I'm sure you've already thought of it, but just in case your heads are spinning from lack of sleep :) ... I wonder if the increased time could be related to re-reading the manifests a lot, which are now signed, and hence take a long time to read? If so, some "manifest cache" might help. 

Comment 9 Simon Kaegi CLA 2008-03-27 15:20:05 EDT
Created attachment 93848 [details]
patch

This patch should reduce the impact on startup time for the second and later runs. In this patch we check the repository timestamps and profiles timestamps to see if we have to re-synch the profile in the ProfileSynchronizer. This lets us avoid the plan creation and executions steps for the typical re-use use-cases.

I'll attach the replacement bundles to go along with the I20080327-0100 build.
Comment 10 Simon Kaegi CLA 2008-03-27 15:21:21 EDT
Created attachment 93849 [details]
zip of replacement bundles
Comment 11 David Williams CLA 2008-03-27 17:33:55 EDT
I tried this fix in my local system, and the startup was substantially faster, 10 times faster, roughly. 

before fix
Time to load bundles: 1562
Starting application: 331938
Application Started:  369172

after fix
Time to load bundles:  125
Starting application:  17797
Application Started:   37203

Which makes me think our Junit's will be able to complete in a reasonable time. 

But, I won't know for sure until late tonight or early AM. 
Comment 12 Simon Kaegi CLA 2008-03-27 18:18:38 EDT
I've released the patch to HEAD. We can leave this bug open until we get back the result of your (DW) tests.
Comment 13 David Williams CLA 2008-03-27 22:43:00 EDT
Tests will run soon, but wanted to change severity, since this bug's "blocks" bug was closed. I consider this bug as blocking, since obviously if we can't complete our JUnit tests, we couldn't declare a milestone ... and, we never did get them to run all the way through with the unfixed code. (Not to mention what a pain it is to wait 360 seconds for my workspace to come up, instead of 30 seconds!). 

I'm optimistic, but ... just wanted to do this bookkeeping. 

Also, I've noticed, and IM'd briefly with Simon about ... if -clean is specified, the extra very very long start up time happens again. Obviously, we'd want some way to "clean" (reset) this cached timestamp data, but I wonder if it should be some other parameter, in order to maintain similar backwards compatible behavior. 
As it it ... I've had to remove -clean from all our JUnit test scripts (and am hoping it doesn't introduce a lot of errors :( 

Should these be a new bug? Or, can it be consider part of this same issue? 


Comment 14 David Williams CLA 2008-03-28 03:21:43 EDT
Applying these patches (along with the patch in bug 224289) did allow our JUnit tests to complete, so I consider this bug fixed ... at least, the blocking nature of the slow down. 

It is still slower, even after I removed -osgi.clean=true (which the comparison runs below did not have ... we used to always use osgi.clean-true, to minimize influence between tests. 

To give some very rough data, before P2, our tests took about 73 minutes. With P2, the total time was about 99 minutes. So ... I'm guessing your startup performance tests, even with this fix, with show about a 35% regression in startup time. 

But ... at least that's containable and allows us to run. Thanks very much for this fix! 

Comment 15 David Williams CLA 2008-03-28 04:23:17 EDT
(In reply to comment #14)
> ... so I consider this bug fixed ... 
> 

Just to be explicit ... I mean fixed, as long as it gets in M6 :) 

Comment 16 John Arthorne CLA 2008-03-28 08:24:26 EDT
Thanks David! We should be able to find more opportunities for improvement in M7. Our fixes were released for the build last night.
Comment 17 John Arthorne CLA 2008-03-28 08:38:16 EDT
I have opened bug 224579 to track further performance enhancements in M7 on first startup after installing new bundles.
Comment 18 David Williams CLA 2008-03-28 08:58:45 EDT
(In reply to comment #16)
> Thanks David! We should be able to find more opportunities for improvement in
> M7. Our fixes were released for the build last night.
> 

I assume that's the I20080327-2251 build? We'll try that, since we need a fix for 224423 anyway.