Some Eclipse Foundation services are deprecated, or will be soon. Please ensure you've read this important communication.

Bug 307587

Summary: NatureManager is NOT threadsafe causing incorrect responses to isNatureEnabled() (and others)
Product: [Eclipse Project] Platform Reporter: Min Idzelis <min123>
Component: ResourcesAssignee: Szymon Brandys <Szymon.Brandys>
Status: VERIFIED FIXED QA Contact:
Severity: critical    
Priority: P3 CC: browe, john.arthorne, kholdaway, malgorzata.tomczyk, rtaniwa, sptaszkiewicz
Version: 3.6.1   
Target Milestone: 3.7 M2   
Hardware: PC   
OS: Windows XP   
Whiteboard:
Bug Depends on:    
Bug Blocks: 325224, 325226, 325227    
Attachments:
Description Flags
Proposal v01
none
Proposal v02
none
Proposal v03
none
Fix v04
none
Concurrent isNatureEnabled test
none
Test of parallel importing project and checking isNatureEnabled.
none
Same test but in patch
none
Importer code
none
Fixed concurrent natures
none
Fixed concurrent natures none

Description Min Idzelis CLA 2010-03-30 17:32:05 EDT
Build Identifier: 

The class NatureManager is NOT threadsafe at all, causing incorrect values to be return for isNatureEnabled(). 

Scenerio: 2 (or more) job are trying are simultaneously asking if a project has a nature (isNatureEnabled()) soon after the project was created for the first time. This leads to a race condition in getEnabledNatures(). 

The real fix would make it so that all (not just isNatureEnabled) methods in that class should be thread-safe. 

Reproducible: Always
Comment 1 Szymon Brandys CLA 2010-08-26 07:19:18 EDT
Created attachment 177513 [details]
Proposal v01
Comment 2 Min Idzelis CLA 2010-08-26 11:37:20 EDT
I don't think the patch goes far enough. There are still a few places where unsynchronized access to variables is made. 

The read of buildersToNatures in #findNatureForBuilder. 
The read of descriptors in #lazyInitialize. 

It also seems to me that #configureNatures should be synchronized, since it performs several operations that should be atomic (configureNature/deconfigureNature pairs)

Possibly more, but that's at first glance.
Comment 3 Szymon Brandys CLA 2010-08-27 05:49:13 EDT
Created attachment 177601 [details]
Proposal v02
Comment 4 Szymon Brandys CLA 2010-08-27 08:26:01 EDT
Created attachment 177612 [details]
Proposal v03
Comment 5 John Arthorne CLA 2010-08-27 15:25:34 EDT
Created attachment 177643 [details]
Fix v04

The previous patch synchronized a bit too much. Nature configure/deconfigure is already protected by the workspace lock. All we need to do here is protected the data structures in NatureManager so that API methods that don't acquire the workspace lock don't run into trouble (such as IWorkspace#getNatureDescriptor(s) and the nature methods on IProject. Also, since nature configure/deconfigure calls client code we need to avoid unnecessary extra locks while calling them.
Comment 6 Szymon Brandys CLA 2010-08-30 07:00:12 EDT
Thanks John. I released the patch to HEAD. I made #lazyInitialize private. natureEnablements could be perhaps lazily initialized as other collections in NatureManger.
Comment 7 Malgorzata Janczarska CLA 2010-09-16 07:08:26 EDT
Reviewed by code inspection.
The fix should work fine, I've only added a minor bug 325425 for synchronization of one of the maps.
Comment 8 Robert Taniwa CLA 2010-09-17 00:30:12 EDT
Will this also get released to the 3.6.2 maintenance stream when it becomes available?
Comment 9 Szymon Brandys CLA 2010-09-17 04:47:21 EDT
(In reply to comment #8)
> Will this also get released to the 3.6.2 maintenance stream when it becomes
> available?
Yes.
Comment 10 Szymon Brandys CLA 2010-09-17 05:17:11 EDT
(In reply to comment #8)
> Will this also get released to the 3.6.2 maintenance stream when it becomes
> available?

See bug 325227.
Comment 11 Kevan Holdaway CLA 2010-12-14 08:06:25 EST
We are still seeing a problem in the 3.4.2 stream.  Either this fix doesn't work there or this fix didn't completely fix the thread safety.
Comment 12 Szymon Brandys CLA 2010-12-14 09:32:10 EST
(In reply to comment #11)
> We are still seeing a problem in the 3.4.2 stream.  Either this fix doesn't
> work there or this fix didn't completely fix the thread safety.

Are symptoms on 3.4.2 the same now as they were before the patch was ported back?
Comment 13 Kevan Holdaway CLA 2010-12-14 09:46:20 EST
(In reply to comment #12)
> (In reply to comment #11)
> > We are still seeing a problem in the 3.4.2 stream.  Either this fix doesn't
> > work there or this fix didn't completely fix the thread safety.
> 
> Are symptoms on 3.4.2 the same now as they were before the patch was ported
> back?

Yes basically.  If you import a faceted project this assertion will fail:
Assert.assertTrue(message, project.isNatureEnabled(FacetedProjectNature.NATURE_ID));
Comment 14 Szymon Brandys CLA 2010-12-15 08:11:04 EST
(In reply to comment #13)
> Yes basically.  If you import a faceted project this assertion will fail:
> Assert.assertTrue(message,
> project.isNatureEnabled(FacetedProjectNature.NATURE_ID));

FacetedProjectNature is an WST class, see org.eclipse.wst.common.project.facet.core.internal.FacetedProjectNature

According to Kevan the test looks this way:
- During setUp() a project with FacetedProjectNature is imported to the worksapce
- the test calls project.isNatureEnabled(FacetedProjectNature.NATURE_ID));
- tearDown() deletes the project

It happens sometimes during the test invocation. Moreover I think we could write a test for the Min's case and see it it causes problems.
Comment 15 Malgorzata Janczarska CLA 2010-12-15 13:21:13 EST
Created attachment 185251 [details]
Concurrent isNatureEnabled test

Kevin,
I tried to reflect your case in a jUnit test, but I failed to reproduce the problem.
This is how it works:

1. in setUp it imports a project that has some nature enabled (in my case JavaNature)
2. The test starts a large amount of threads (in my case 500)
3. Threads wait random time and ask for isNatureEnabled

The result was always correct. Can you suggest me what can I modify in my test to come closer to your situation?
How do you import your project? I used the same method that import wizard does. Is it possible that import of your project did not finish before you called isNatureEnabled?
Comment 16 Kevan Holdaway CLA 2010-12-15 15:43:09 EST
No actually, once the problem with natures occurs, it stays until you restart eclipse.  In other words, its a timing issue to get into the bad state, but once you are there, it lasts the duration of the session with eclipse.

Let me continue to watch our tests to see if it occurs again.  Perhaps that will narrow it down a bit more.
Comment 17 Malgorzata Janczarska CLA 2010-12-16 06:05:08 EST
Created attachment 185303 [details]
Test of parallel importing project and checking isNatureEnabled.

This is an updated version of previous test that would be more likely reflect your situation. I modified it so there are multiple imports of the project and straight after it's checking isNatureEnabled. I also used the FacetedProjectNature. It still failed to reproduce your problem, even with 500 parallel threads, so I suppose there are some more conditions that cause the failure. Do you have any other guesses what else is going on in your workspace that may have influence on isNatureEnabled?
Comment 18 Szymon Brandys CLA 2011-02-01 04:00:32 EST
(In reply to comment #17)
> Created attachment 185303 [details]
> Test of parallel importing project and checking isNatureEnabled.
> 
> This is an updated version of previous test that would be more likely reflect
> your situation. I modified it so there are multiple imports of the project and
> straight after it's checking isNatureEnabled. I also used the
> FacetedProjectNature. It still failed to reproduce your problem, even with 500
> parallel threads, so I suppose there are some more conditions that cause the
> failure. Do you have any other guesses what else is going on in your workspace
> that may have influence on isNatureEnabled?

Kevan, any news about the issue?
Comment 19 Kevan Holdaway CLA 2011-02-01 07:05:49 EST
Yes, we have seen the issue after the fix.  Its showed up in our automated tests.
Comment 20 Kevan Holdaway CLA 2011-02-01 07:07:10 EST
I have provided all the information we know as to how to reproduce it.  It occurs in new workspaces.
Comment 21 Szymon Brandys CLA 2011-02-01 07:28:31 EST
(In reply to comment #20)
> I have provided all the information we know as to how to reproduce it.  It
> occurs in new workspaces.

Not sure if you read the Gosia's comment. She created a test according to your description and this test does not fail. We need more details or a suggestion how to modify it, so it could fail like your tests.

My question about "news" was related to your comment 16, which was:
(In reply to comment #16)
> Let me continue to watch our tests to see if it occurs again.  Perhaps that
> will narrow it down a bit more.
Comment 22 Kevan Holdaway CLA 2011-02-01 07:43:00 EST
I saw it and chatted with them.  I suggested using larger projects (like a user would actually use).  Other than, its very difficult to reproduce.  We have only seen it a couple of times in hundreds of automated tests.
Comment 23 Szymon Brandys CLA 2011-02-01 07:57:36 EST
(In reply to comment #22)
> I saw it and chatted with them.  I suggested using larger projects (like a user
> would actually use).

Could we record such information in the bug report next time? Gosia did you applied the Kevan's suggestion in the test somehow?
Comment 24 Malgorzata Janczarska CLA 2011-02-01 08:14:14 EST
(In reply to comment #22)
> I saw it and chatted with them.  I suggested using larger projects (like a user
> would actually use).  Other than, its very difficult to reproduce.  We have only
> seen it a couple of times in hundreds of automated tests.
Kevan, could you attach a conclusion of this chat to this bug? I cannot find any record of it in my chat history.
I will create another version of patch containing larger projects, did you make any other suggestions?
Comment 25 Malgorzata Janczarska CLA 2011-02-02 04:20:18 EST
(In reply to comment #22)
> I suggested using larger projects (like a user would actually use).
I tried the same test with a fairly large project (>3.5MB) and even with 2000 simultaneous threads project.isNatureEnabled(nature) always returned true. I suppose there has to be some additional condition that makes your tests fail and we won't get anywhere before we discover it.
Comment 26 Kevan Holdaway CLA 2011-02-02 07:24:32 EST
(In reply to comment #25)
> (In reply to comment #22)
> > I suggested using larger projects (like a user would actually use).
> I tried the same test with a fairly large project (>3.5MB) and even with 2000
> simultaneous threads project.isNatureEnabled(nature) always returned true. I
> suppose there has to be some additional condition that makes your tests fail
> and we won't get anywhere before we discover it.

I don't have any additional information.  As with most thread safety issues, they are difficult to reproduce due to timing.  Perhaps the best we can do is review the code to make sure state is properly protected.  When you have a fix that you feel is safe based on code inspection, we can use that.  If you feel that your current fix is thread-safe (although it seems there have been exceptions found), we can use that until we find a better way to reproduce it.
Comment 27 Szymon Brandys CLA 2011-02-04 04:08:14 EST
(In reply to comment #26)
> I don't have any additional information.  As with most thread safety issues,
> they are difficult to reproduce due to timing.  Perhaps the best we can do is
> review the code to make sure state is properly protected.  

The added synchronization was already reviewed by two committers. However we could miss something. Gosia, please review the code and see if you can find anything suspected.

> When you have a fix
> that you feel is safe based on code inspection, we can use that.  If you feel
> that your current fix is thread-safe (although it seems there have been
> exceptions found), we can use that until we find a better way to reproduce it.

Of course. Gosia and I will review the current code again. If we can't find any problem with it, we will wait for this better way to reproduce the problem.
Comment 28 Malgorzata Janczarska CLA 2011-02-08 05:37:05 EST
Created attachment 188503 [details]
Same test but in patch

We can add this test to our nightly builds. It will be tested every day in different conditions, maybe it will somewhere in our test environment.
Comment 29 Min Idzelis CLA 2011-02-08 08:57:44 EST
Chiming in on the test. 

I don't think the test is very representative of an actual import that happens when using the import wizards, or of what the flow is in our product. 

Here are some modifications that should be made. 

Importing a project does not use the project.create() method that takes the project description. Rather, a plain project is created using the method that only takes a monitor. Then, (in case of .zip) the ZipLeveledStructureProvider imports the rest of the project. As part of this process, the project is refreshed at the end. This includes the newly import .project file that contains all information about natures. This notification is sent on a different thread than the one importing it. 

The problem lies that in our product, a thread sometimes starts checking the project before the project is fully imported, which implies that the .project may or may not have been processed/refreshed yet. Perhaps the value of the enabled natures was cached incorrectly. 

Modifications that should be made:
1) Use project.create(monitor), then use the ZipLeveledStructuredProvider to import a project. (attached is a utility) 

2) Create a thread that solely checks isNatureEnabled on the project that is being imported. The value should not be asserted until the import is actually complete, but the early calls would be there to ensure thread-safety and cache consistency. 

--
Note on utility. 
Intended to run in a PDE UI plugin test like this:

IRunnableWithProgress wait = ZipImportUtil.createImportRunnable(PROJECT_NAME, BUNDLE_ID);
PlatformUI.getWorkbench().getProgressService().busyCursorWhile(wait);

This will spin off the file import from the main UI thread. 

Note: You do not need to delete the project when running it 2 times in a row. It automatically deletes the project if it is present. 

--
Comment 30 Min Idzelis CLA 2011-02-08 08:58:58 EST
Created attachment 188516 [details]
Importer code
Comment 31 Malgorzata Janczarska CLA 2011-02-22 10:34:55 EST
Created attachment 189502 [details]
Fixed concurrent natures

Min, thank you for your input. I managed to reproduce the problem. It occurred in 3.4 but in 3.7 as well.
We were flushing the natures cache on every product description change, that was correct. The problem was that the event was send before the description update have been finished. If isNatureEnabled() check was made after sending the event but before the description import was finished the previous natures set was cached again.
Attached patch contains fix for 3.7.
Comment 32 Billy Rowe CLA 2011-02-22 11:36:12 EST
Would it be possible to get a fix in earlier streams, e.g. 3.6x?
Comment 33 Malgorzata Janczarska CLA 2011-02-22 12:58:23 EST
Created attachment 189520 [details]
Fixed concurrent natures

Minor correction to previous patch
Comment 34 Billy Rowe CLA 2011-02-23 14:42:14 EST
My request above asking for patches in earlier streams is related to these earlier bugzillas:
3.6.2: https://bugs.eclipse.org/bugs/show_bug.cgi?id=325227
3.4.2: https://bugs.eclipse.org/bugs/show_bug.cgi?id=325224

We could use a patch for the latest 3.4.+ and 3.6.+ streams

Thanks
Billy
Comment 35 Szymon Brandys CLA 2011-02-24 03:43:01 EST
Gosia, please open a new bug for this and attach the patch. This bug is about NatureManager threadsafety which was indeed fixed. The new issue is a separate problem.
Comment 36 Malgorzata Janczarska CLA 2011-02-24 04:03:15 EST
Created Bug 338055 for the seperate problem.