Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 335208

Summary: [ds] component instantiated twice (static, immediate, servicefactory=false)
Product: [Eclipse Project] Equinox Reporter: Ronny Völker <ronny.voelker>
Component: CompendiumAssignee: Stoyan Boshev <s.boshev>
Status: RESOLVED FIXED QA Contact:
Severity: normal    
Priority: P3 CC: s.boshev, tjwatson
Version: unspecified   
Target Milestone: Juno M5   
Hardware: All   
OS: All   
Whiteboard:
Attachments:
Description Flags
proposed patch
none
proposed patch 2 none

Description Ronny Völker CLA 2011-01-24 11:07:34 EST
Build Identifier: 1.2.0.v20100507

A static immediate component is sometimes instantiated twice at startup.
It is providing one service with servicefactory=false.
The configuration-policy is required and the ConfigAdminService is providing one configuration for its pid.


Reproducible: Sometimes

Steps to Reproduce:
I haven't found a test case to reproduce it reliably, but when the problem occurred last in my development environment, I was able to deduce the general sequence of actions which lead to this behaviour (a ConfigurationEvent occurs before a BundleEvent.STARTED):

* SCRManager.configurationEvent() // BundleEvent.STARTED
  * SCRManager.addEvent() // put into SCRManager.queue
* SCRManager.bundleChanged(BundleEvent.STARTED)
  * SCRManager.startedBundle()
    * comp.enabled = true
    * SCRManager.enqueueWork() // put into SCRManager.queue

* SCRManager.processConfigurationEvent()
  * SCRManager.resolver.enableComponents(components) // because 
    of comp.enabled = true
    * Resolver.map() // creates a new ServiceComponentProp
      and adds it to SCRManager.scpEnabled
* SCRManager.QueuedJob().dispatch()
  * SCRManager.resolver.enableComponents(components)
    * Resolver.map() // creates a second ServiceComponentProp 
      for the same component configuration and adds it to SCRManager.scpEnabled

* any subsequent call to SCRManager.buildNewlySatisfied() will find two ServiceComponentProps for the component in SCRManager.scpEnabled and will try to create one instance for each SCP, when all dependencies are available.
Comment 1 Stoyan Boshev CLA 2012-01-10 12:02:49 EST
Created attachment 209273 [details]
proposed patch

Ronny, Thanks for reporting this hard to reproduce bug.

I made a thorough code review for the possible reasons. I found two possible reasons. (1) is very alike to what you described.

1. One possible reason is when the Configuration Admin bundle is started after the DS bundle. Then there were two possibilities in method SCRManager.configAdminRegistered() that could lead to thread unsafe management of DS components based on their configurations. In the attached patch I have fixed this by moving these actions in the dispatcher thread that is meant to do this kind of processing.

2. Another quite possible reason is double processing of the components of a bundle. The method SCRManager.startedBundle() is not thread safe now. It can be invoked twice for one and the same bundle by methods SCRManager.startIt() and SCRManager.bundleChanged(). The attached patch solves this situation as well.

Could you verify the provided patch in your environment if it is still reproducible?
Comment 2 Ronny Völker CLA 2012-01-12 05:41:26 EST
Sorry, it is still reproducible, but it seems to happen less often.

Without the patch, the bug occurs every 5-10 equinox-starts. After applying the patch, it occurred only once until now.

The Problem is, that I'm able to reproduce it only in my Eclipse-IDE, when I'm manually starting Equinox using an OSGi-Framework Launch-Configuration. I failed to reproduce the bug using a script, which perpetually starts and stops an independent Equinox-Instance. In production we have a workaround, which prevents this bug from occurring.

Stoyan, I think, you should try to reproduce the bug yourself in a more unit-test-like manner, using a more controlled environment, with only the minimum number of classes involved and minimum concurrency, where you are in control of the execution order.
I don't know how I can help you further with my current approach, to debug a real application running in a real Equinox-platform with all it concurrent flows of execution.

Btw. my workaround for the bug is, to avoid ConfigurationEvents (by not creating or updating Configurations in my application), until FrameworkEvent.STARTED occurred.
Comment 3 Stoyan Boshev CLA 2012-01-12 09:43:30 EST
Thanks for your feedback Ronny.
This is a timing problem and I know from experience I may loose lots of time to reproduce it. In this case the variables that model the behavior are: OS, PC hardware, set of bundles, order of bundles, config Admin after DS bundle, DS components specifics.

After looking again at the code I found another possible case that you are probably experiencing. I think in your setup the ConfigAdmin bundle starts after the DS bundle. In this case the method SCRManager.configAdminRegistered() is innoked and it may register some components to be enabled and they might already be in the queue for enablement (e.g. when the component's bundle has started a while ago). I thought a bit how to handle this situation and I think the simplest way would be to avoid mapping new components in a central place i.e. the Resolver.enableComponents() method. This fix should cover many similar situations with double enablement of components. I will add an updated patch in a while. Could you please verify in your environment? I will much appreciate it.
Comment 4 Stoyan Boshev CLA 2012-01-12 09:44:53 EST
Created attachment 209381 [details]
proposed patch 2
Comment 5 Ronny Völker CLA 2012-01-13 04:54:07 EST
During lot of start and stop clicks in Eclipse, the bug didn't occur again :). 

We should take this result with a grain of salt. 
However, I think that your latest scenario is the one, I tried to describe in the original bug description, and I agree that your new patch should solve this scenario.

With unit-test-like manner I meant, that you just use a single-threaded unit-test and mock up the method-calls you would expect. In this way, you can abstract away all the variables, you mentioned. But I see, that even this is hard with the current code base and is also not applicable to all scenarios. Some of the scenarios, which are real concurrency-problems, can only be reproduced with integration-tests.
Comment 6 Stoyan Boshev CLA 2012-01-13 08:10:51 EST
Thanks a lot for your time and support Ronny!
I have released the patch in HEAD.